Confidence intervals

Sampling in Python

James Chapman

Curriculum Manager, DataCamp

Confidence intervals

  • "Values within one standard deviation of the mean" includes a large number of values from each of these distributions
  • We'll define a related concept called a confidence interval

Predicting the weather

  • Rapid City, South Dakota in the United States has the least predictable weather
  • Our job is to predict the high temperature there tomorrow

A map of the weather, with colors indicating how predictable regions are.

Our weather prediction

  • Point estimate = 47°F (8.3°C)
  • Range of plausible high temperature values = 40 to 54°F (4.4 to 12.2°C)

We just reported a confidence interval!

  • 40 to 54°F is a confidence interval
  • Sometimes written as 47°F (40°F, 54°F) or 47°F [40°F, 54°F]
  • ... or, 47 ± 7°F
  • 7°F is the margin of error
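
The relationship between the point estimate, the margin of error, and the interval endpoints can be checked with a few lines of arithmetic. This is only an illustrative sketch using the numbers from the weather example; the variable names are not from the course.

```python
# Numbers taken from the weather example above
point_estimate = 47.0   # predicted high temperature, in °F
margin_of_error = 7.0   # half the width of the interval, in °F

# The interval endpoints are the point estimate minus/plus the margin of error
lower = point_estimate - margin_of_error
upper = point_estimate + margin_of_error

print(f"{point_estimate:.0f}°F ({lower:.0f}°F, {upper:.0f}°F)")
# 47°F (40°F, 54°F)
```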

Bootstrap distribution of mean flavor

import matplotlib.pyplot as plt
plt.hist(coffee_boot_distn, bins=15)
plt.show()

A histogram of mean coffee flavor.
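
These slides use coffee_boot_distn without showing where it comes from. As a rough sketch, a bootstrap distribution of a mean can be built by resampling with replacement and recording the mean of each resample. The data below is a synthetic stand-in (flavor_sample is a hypothetical name and its values are made up), so the numbers it produces will not match the ones on the slides.

```python
import numpy as np

rng = np.random.default_rng(seed=42)

# Hypothetical stand-in for a sample of coffee flavor scores (synthetic data)
flavor_sample = rng.normal(loc=7.5, scale=0.35, size=500)

# Resample with replacement, keeping the original sample size,
# and record the mean of each resample
coffee_boot_distn = [
    np.mean(rng.choice(flavor_sample, size=len(flavor_sample), replace=True))
    for _ in range(1000)
]

print(len(coffee_boot_distn))
# 1000
```

The resulting list can be passed to plt.hist() exactly as in the slide above.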

Mean of the resamples

import numpy as np
np.mean(coffee_boot_distn)
# 7.513452892

A histogram of mean coffee flavor with the mean indicated by a vertical black bar.

Mean plus or minus one standard deviation

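np.mean(coffee_boot_distn)
# 7.513452892
# One standard deviation below the mean (ddof=1 gives the sample standard deviation)
np.mean(coffee_boot_distn) - np.std(coffee_boot_distn, ddof=1)
# 7.497385709174466
# One standard deviation above the mean
np.mean(coffee_boot_distn) + np.std(coffee_boot_distn, ddof=1)
# 7.529520074825534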

A histogram of coffee flavor means with mean and standard deviations indicated by vertical bars.

Quantile method for confidence intervals

# The 2.5% and 97.5% quantiles bound the middle 95% of the bootstrap distribution
np.quantile(coffee_boot_distn, 0.025)
# 7.4817195
np.quantile(coffee_boot_distn, 0.975)
# 7.5448805

A 95 percent confidence interval line.
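
The quantile method generalizes to other confidence levels by splitting the leftover probability evenly between the two tails. The helper below is a minimal sketch of that idea; quantile_ci is a hypothetical name, not a NumPy or course function, and the demo array is synthetic.

```python
import numpy as np

def quantile_ci(boot_distn, confidence=0.95):
    """Return (lower, upper) bounds covering the middle `confidence` share."""
    alpha = 1 - confidence                 # probability left outside the interval
    lower = np.quantile(boot_distn, alpha / 2)
    upper = np.quantile(boot_distn, 1 - alpha / 2)
    return lower, upper

# Synthetic stand-in for a bootstrap distribution
rng = np.random.default_rng(seed=0)
demo_distn = rng.normal(loc=7.5, scale=0.016, size=5000)

lo95, hi95 = quantile_ci(demo_distn)         # middle 95%
lo90, hi90 = quantile_ci(demo_distn, 0.90)   # middle 90%, a narrower interval
```

A lower confidence level always gives an interval nested inside the higher-confidence one.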

Inverse cumulative distribution function

  • PDF: The bell curve
  • CDF: integrate to get area under bell curve
  • Inv. CDF: flip x and y axes

Implemented in Python with

from scipy.stats import norm
norm.ppf(quantile, loc=0, scale=1)

Inverse cumulative distribution function.
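
A quick way to see that norm.ppf is the inverse of the CDF is to pass a value through norm.cdf and back again. The sketch below uses the standard normal distribution (the default loc=0, scale=1); the variable names are illustrative.

```python
from scipy.stats import norm

# CDF: area under the standard normal curve to the left of x
p = norm.cdf(1.96)

# Inverse CDF: the x value that has area p to its left
x = norm.ppf(p)   # recovers 1.96, up to floating-point error

# The quantiles that bound the middle 95% of the standard normal
z_lower = norm.ppf(0.025)   # approximately -1.96
z_upper = norm.ppf(0.975)   # approximately  1.96
```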

Standard error method for confidence interval

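import numpy as np
from scipy.stats import norm

# The mean of the bootstrap distribution is the point estimate
point_estimate = np.mean(coffee_boot_distn)
# 7.513452892
# The standard deviation of the bootstrap distribution estimates the standard error
std_error = np.std(coffee_boot_distn, ddof=1)
# 0.016067182825533724
# Inverse CDF of a normal centered on the point estimate gives the 95% bounds
lower = norm.ppf(0.025, loc=point_estimate, scale=std_error)
upper = norm.ppf(0.975, loc=point_estimate, scale=std_error)
print((lower, upper))
# (7.481961792328933, 7.544943991671067)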
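
The standard error method can also be read as "point estimate plus or minus a margin of error", where the margin is the standard error scaled by a standard normal quantile. The sketch below plugs in the point estimate and standard error reported on this slide and should reproduce the same bounds up to floating-point rounding; z and margin_of_error are illustrative names.

```python
from scipy.stats import norm

# Values reported on the slide
point_estimate = 7.513452892
std_error = 0.016067182825533724

# Standard normal quantile for a 95% interval (approximately 1.96)
z = norm.ppf(0.975)

# Margin of error: how far the bounds sit from the point estimate
margin_of_error = z * std_error

lower = point_estimate - margin_of_error
upper = point_estimate + margin_of_error
```

This works because a normal distribution with a given loc and scale is the standard normal shifted by loc and stretched by scale.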

Let's practice!
