Sampling and bias

Foundations of Inference in Python

Paul Savala

Assistant Professor of Mathematics

Bias

  • Biased sample: a group appears more or less often in the sample than it does in the population

Figure: a population of people wearing shirts of different colors, and a biased sample containing only the people in green shirts.
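To make the definition concrete, here is a minimal sketch that assumes a made-up population of 100 people (20 in green shirts, 40 in blue, 40 in red); `proportions` is a hypothetical helper written for illustration. Green makes up 20% of the population but 100% of the biased sample, while a simple random sample gives every person the same chance of selection.

```python
import random
from collections import Counter

# Hypothetical population of 100 people; shirt colors are made up for illustration
population = ["green"] * 20 + ["blue"] * 40 + ["red"] * 40

def proportions(people):
    """Return the share of each shirt color in a list of people."""
    counts = Counter(people)
    return {color: counts[color] / len(people) for color in sorted(counts)}

# Biased sample: only people in green shirts can be selected
biased_sample = [shirt for shirt in population if shirt == "green"][:10]

# Random sample: every person is equally likely to be selected
random.seed(0)
random_sample = random.sample(population, k=10)

print("population:", proportions(population))
print("biased:    ", proportions(biased_sample))
print("random:    ", proportions(random_sample))
```

The biased sample reports 100% green no matter how many people it contains, which is exactly the over-representation the definition describes.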


Biased samples

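import numpy as np

# Population: every salary (remaining values omitted)
all_salaries = [75000, 82000, ...]
# Biased sample: only our friends' salaries
friends_salaries = [93000, 87000, 103000, 101000]

print(np.mean(friends_salaries))
96000.0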
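One way to see this bias numerically is the sketch below. It assumes a synthetic, hypothetical population of 1,000 normally distributed salaries, since the full `all_salaries` list is not shown. The biased sample takes only the four highest salaries, an exaggerated stand-in for a friend group of high earners; its mean lands above the population mean, while a random sample of the same size has no systematic tilt in either direction.

```python
import numpy as np

rng = np.random.default_rng(seed=42)

# Hypothetical population of 1,000 salaries (synthetic, for illustration only)
population = rng.normal(loc=80000, scale=10000, size=1000).round()

# Biased sample: the 4 highest salaries, mimicking a friend group of high earners
biased_sample = np.sort(population)[-4:]

# Random sample: 4 salaries drawn without replacement, each equally likely
random_sample = rng.choice(population, size=4, replace=False)

print(f"Population mean:    {population.mean():.0f}")
print(f"Biased sample mean: {biased_sample.mean():.0f}")
print(f"Random sample mean: {random_sample.mean():.0f}")
```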

Sampling distribution

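import numpy as np
import matplotlib.pyplot as plt

sampling_distribution = []
for _ in range(100):
    # Draw 10 salaries at random (with replacement) and record their mean
    random_sample = np.random.choice(all_salaries, size=10)
    sample_mean = np.mean(random_sample)
    sampling_distribution.append(sample_mean)

plt.hist(sampling_distribution)
plt.xlabel('Mean salary')
plt.ylabel('Percent of samples')  # 100 samples, so each bin's count equals its percent
plt.title('Sampling distribution of mean salaries')
plt.show()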

Figure: histogram of the sampling distribution of mean salaries. It is a rough bell curve centered at about $82,000, with a minimum around $70,000 and a maximum around $95,000.
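The sampling distribution can be summarized numerically as well as plotted. This sketch assumes a synthetic population (normal, mean 82,000, standard deviation 12,000, chosen only to resemble the histogram) and a hypothetical helper `sampling_distribution`; like `np.random.choice` in the loop, it samples with replacement. The center of the sample means should sit near the population mean, and their spread should shrink roughly like σ/√n as the sample size n grows.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Hypothetical population of 10,000 salaries (synthetic, for illustration only)
population = rng.normal(loc=82000, scale=12000, size=10000)

def sampling_distribution(pop, sample_size, n_samples, rng):
    """Means of n_samples random samples (drawn with replacement) of size sample_size."""
    samples = rng.choice(pop, size=(n_samples, sample_size))
    return samples.mean(axis=1)

for n in (10, 40):
    means = sampling_distribution(population, n, n_samples=5000, rng=rng)
    # Theory: the spread of the sample means is roughly sigma / sqrt(n)
    predicted_se = population.std() / np.sqrt(n)
    print(f"n={n:2d}  center={means.mean():8.0f}  "
          f"spread={means.std():6.0f}  sigma/sqrt(n)={predicted_se:6.0f}")
```

Quadrupling the sample size from 10 to 40 should roughly halve the spread, while the center stays put.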


Depends on the sample

  • Samples affect point estimates
  • Point estimates affect inference
  • Samples affect p-value calculations
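To see how the sample drives both the point estimate and the p-value, here is a minimal sketch using a two-sided z-test for the mean. It assumes a known population standard deviation of 12,000, a null-hypothesis mean of 80,000, and a synthetic population, all made up for illustration; `z_test_p_value` is a hypothetical helper, not a library function. Two random samples from the same population generally give different means and therefore different p-values.

```python
import math
import random

def z_test_p_value(sample, null_mean, sigma):
    """Return the sample mean and the two-sided z-test p-value, assuming known sd sigma."""
    n = len(sample)
    sample_mean = sum(sample) / n
    z = (sample_mean - null_mean) / (sigma / math.sqrt(n))
    # For a standard normal Z, P(|Z| >= |z|) equals erfc(|z| / sqrt(2))
    return sample_mean, math.erfc(abs(z) / math.sqrt(2))

# Hypothetical population of 1,000 salaries (synthetic, for illustration only)
random.seed(1)
population = [random.gauss(82000, 12000) for _ in range(1000)]

# Two different random samples of the same size from the same population
sample_a = random.sample(population, k=10)
sample_b = random.sample(population, k=10)

for name, sample in [("A", sample_a), ("B", sample_b)]:
    mean, p = z_test_p_value(sample, null_mean=80000, sigma=12000)
    print(f"Sample {name}: mean = {mean:.0f}, p-value = {p:.3f}")
```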

Doesn't depend on the sample

  • Population parameter (e.g., the population mean)
    • Is unaffected by the sample chosen
  • Conclusion from a test
    • Given a p-value and a significance level, the conclusion is unaffected by the sample chosen
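Once the p-value is known, reaching a conclusion is a fixed rule that never looks at the data again. A minimal sketch, where `conclusion` is a hypothetical helper and α = 0.05 is an assumed default; conventions differ on whether p = α leads to rejection, and this sketch rejects only when the p-value is strictly below α.

```python
def conclusion(p_value, alpha=0.05):
    """Decision rule for a hypothesis test: compare the p-value to the significance level."""
    if p_value < alpha:
        return "reject the null hypothesis"
    return "fail to reject the null hypothesis"

# The same p-value always yields the same conclusion, whichever sample produced it
print(conclusion(0.03))  # reject the null hypothesis
print(conclusion(0.20))  # fail to reject the null hypothesis
```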

Let's practice!

