Confidence intervals and sampling

Foundations of Inference in Python

Paul Savala

Assistant Professor of Mathematics

What is a confidence interval?

  • Uses a sample to generate a range of values
  • The range of values estimates a population parameter (for example, the population mean)

Example:

  • Sample of 100 employees
  • Mean salary of $80,000
  • Standard deviation of $10,000

Figure: a confidence interval with $78,040 on the left, $81,959 on the right, and $80,000 in the middle.

Calculating a confidence interval

from scipy import stats
import numpy as np


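# The confidence level is passed positionally: older SciPy names this
# argument `alpha`, newer releases name it `confidence`.
ci = stats.norm.interval(0.95,  # Confidence level
                         loc=80000,  # Sample mean
                         scale=10000 / np.sqrt(100))  # Standard error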
print(ci)
(78040.04, 81959.96)
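
To show where those endpoints come from, here is a minimal sketch that rebuilds the same interval by hand as mean ± z × standard error. The variable names (`sample_mean`, `sample_std`, `n`, `z_crit`, `ci_manual`) are illustrative and not from the slides; the numbers are those of the salary example.

```python
import numpy as np
from scipy import stats

# Values from the salary example
sample_mean = 80000
sample_std = 10000
n = 100

# Standard error of the mean
standard_error = sample_std / np.sqrt(n)

# Critical value for a two-sided 95% interval (2.5% in each tail)
z_crit = stats.norm.ppf(0.975)

# Interval = mean plus or minus the margin of error
margin = z_crit * standard_error
ci_manual = (sample_mean - margin, sample_mean + margin)
print(ci_manual)
```

The critical value is roughly 1.96, so the margin of error is about $1,960 on either side of the mean.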

Valid inference requires a normal sampling distribution

Central Limit Theorem

  • Average many independent samples
  • Sampling distribution is approximately normal
import matplotlib.pyplot as plt

population = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
sample_means = []

for i in range(1000):
    # Draw 5 values (with replacement) and record their mean
    sample_5 = np.random.choice(population, size=5)
    sample_means.append(sample_5.mean())

plt.hist(sample_means)
plt.show()

A histogram with "mean of sample" on the x-axis, "occurrences" on the y-axis, a title of "sampling distribution", and the histogram approximating a normal distribution centered near 4.5, the population mean.
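
As a rough numerical check on the histogram, the sketch below compares the simulated sample means against what the Central Limit Theorem predicts: a center at the population mean and a spread of the population standard deviation divided by the square root of the sample size. The seeded generator and the names `rng`, `simulated_means`, and `predicted_se` are assumptions added for reproducibility and illustration.

```python
import numpy as np

rng = np.random.default_rng(0)  # Seeded so the run is repeatable
population = np.arange(10)      # 0 through 9, as in the example

# 1000 samples of size 5 (with replacement), one mean per sample
simulated_means = rng.choice(population, size=(1000, 5)).mean(axis=1)

# What the Central Limit Theorem predicts for the sampling distribution
predicted_center = population.mean()              # 4.5
predicted_se = population.std() / np.sqrt(5)      # Population std / sqrt(n)

print("center:", simulated_means.mean(), "vs", predicted_center)
print("spread:", simulated_means.std(), "vs", predicted_se)
```

The two pairs of numbers should be close but not identical, since the simulation is itself a random sample of sample means.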

A large city with a mix of high rise buildings and smaller rundown houses.

What a confidence interval tells us

(and what it doesn't tell us)

  • The population parameter either is or is not in any one confidence interval
  • Repeated samples -> about 95% of the resulting 95% confidence intervals contain the population parameter
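
The repeated-sampling interpretation can be checked with a small simulation. This is a sketch under stated assumptions: the population is taken to be normal with mean 80,000 and standard deviation 10,000 (borrowed from the salary example, and treated as known), and names such as `true_mean`, `n_repeats`, and `coverage` are hypothetical.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)  # Seeded so the run is repeatable

# Assumed population (illustrative, not from the slides)
true_mean, true_std = 80000, 10000
n, n_repeats = 100, 1000

# Draw many independent samples and take each sample's mean
samples = rng.normal(true_mean, true_std, size=(n_repeats, n))
means = samples.mean(axis=1)

# Build a 95% interval around each sample mean
standard_error = true_std / np.sqrt(n)
z_crit = stats.norm.ppf(0.975)
lower = means - z_crit * standard_error
upper = means + z_crit * standard_error

# Fraction of intervals that contain the true population mean
coverage = np.mean((lower <= true_mean) & (true_mean <= upper))
print(coverage)
```

Each individual interval either contains 80,000 or it does not; the 95% describes the long-run fraction of intervals that do.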

Let's practice!
