Bootstrap confidence intervals

Case Studies in Statistical Thinking

Justin Bois

Lecturer, Caltech

EDA is the first step

"Exploratory data analysis can never be the whole story, but nothing else can serve as a foundation stone, as the first step."

--John Tukey

¹ Data courtesy of Avni Gandhi, Grigorios Oikonomou, and David Prober, Caltech

Optimal parameter value: The value of the parameter of a probability distribution that best describes the data

Optimal parameter for the Exponential distribution: Equal to the mean of the data

np.mean(nuclear_incident_times)

87.140350877192986
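
As a quick check on why the mean is the optimal parameter here, the sketch below evaluates the Exponential log-likelihood on a grid of candidate values and confirms that it peaks at the sample mean. The data are synthetic (drawn with an arbitrary scale of 90), not the nuclear incident data, and `exp_log_likelihood` is a hypothetical helper written only for this illustration.

```python
import numpy as np

# Synthetic waiting times for illustration only (NOT the nuclear incident data)
rng = np.random.default_rng(seed=42)
times = rng.exponential(scale=90.0, size=200)


def exp_log_likelihood(tau, data):
    """Log-likelihood of an Exponential distribution with mean tau."""
    return -len(data) * np.log(tau) - np.sum(data) / tau


# Evaluate the log-likelihood on a grid of candidate values centered on the mean
sample_mean = np.mean(times)
taus = np.linspace(0.5 * sample_mean, 1.5 * sample_mean, 2001)
log_like = np.array([exp_log_likelihood(tau, times) for tau in taus])

# The candidate with the highest log-likelihood should sit at the sample mean
tau_best = taus[np.argmax(log_like)]
print(tau_best, sample_mean)
```

The grid search is only for demonstration; in practice the mean is computed directly with `np.mean`, as on the slide.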

¹ Data source: Wheatley, Sovacool, and Sornette, Nuclear Events Database

Bootstrap sample: A resampled array of the data

# Resample nuclear_incident_times with replacement,
# drawing as many values as the original data set has
bs_sample = np.random.choice(
  nuclear_incident_times,
  replace=True,
  size=len(nuclear_incident_times)
)
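
To make resampling with replacement concrete, here is a minimal sketch on a tiny made-up array (the values are invented, not taken from the Nuclear Events Database). It uses NumPy's `default_rng` generator with a fixed seed so the draw is reproducible; that is a choice made for this example, not something the slides prescribe.

```python
import numpy as np

# Made-up measurements, for illustration only
data = np.array([5.0, 17.0, 42.0, 60.0, 133.0])

# Seeded generator so the example is reproducible
rng = np.random.default_rng(seed=1)

# Draw len(data) values from data, allowing repeats
bs_sample = rng.choice(data, size=len(data), replace=True)

# Same length as the original; every value comes from the original array,
# but some values may appear more than once and others not at all
print(bs_sample)
```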

Bootstrap replicate: A statistic computed from a bootstrap sample

Function to draw bootstrap replicates from a dataset

import dc_stat_think as dcst

# Draw 10000 replicates of the mean from
# nuclear_incident_times
bs_reps = dcst.draw_bs_reps(
  nuclear_incident_times, np.mean, size=10000
)
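
For intuition about what such a function does, the sketch below repeats the resample-then-compute step in a loop. `draw_bs_reps_sketch` is a hypothetical stand-in written for this example; it is not the `dcst` implementation, and its `seed` argument is an assumption added for reproducibility. The data are made up.

```python
import numpy as np


def draw_bs_reps_sketch(data, func, size=1, seed=None):
    """Draw `size` bootstrap replicates of the statistic `func` from `data`.

    Hypothetical illustration only; not the dcst implementation.
    """
    rng = np.random.default_rng(seed)
    data = np.asarray(data)
    reps = np.empty(size)
    for i in range(size):
        # One bootstrap sample: same length as data, drawn with replacement
        bs_sample = rng.choice(data, size=len(data), replace=True)
        # One bootstrap replicate: the statistic of that sample
        reps[i] = func(bs_sample)
    return reps


# Made-up data, for illustration only
data = np.array([12.0, 45.0, 3.0, 150.0, 88.0, 27.0, 64.0, 210.0])
bs_reps = draw_bs_reps_sketch(data, np.mean, size=1000, seed=0)
print(bs_reps[:5])
```

Any function that maps a one-dimensional array to a number, such as `np.median` or `np.std`, could be passed in place of `np.mean`.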

If we repeated the measurements over and over again and recomputed the statistic each time, p% of the resulting values would lie within the p% confidence interval

np.percentile(bs_reps, [2.5, 97.5])

array([  73.31505848,  102.39181287])
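
The 2.5 and 97.5 percentiles above bound the central 95% of the replicates. The sketch below generalizes this to any p by cutting (100 - p)/2 percent off each tail. It runs on synthetic exponential data rather than the nuclear incident data, and `percentile_interval` is a hypothetical helper name introduced for this example.

```python
import numpy as np

# Synthetic data for illustration only (NOT the nuclear incident data)
rng = np.random.default_rng(seed=3)
data = rng.exponential(scale=90.0, size=150)

# Bootstrap replicates of the mean
n_reps = 10000
bs_reps = np.empty(n_reps)
for i in range(n_reps):
    bs_reps[i] = np.mean(rng.choice(data, size=len(data), replace=True))


def percentile_interval(reps, p=95):
    """Central p% interval of the replicates (hypothetical helper)."""
    tail = (100 - p) / 2
    return np.percentile(reps, [tail, 100 - tail])


# 95% interval: equivalent to np.percentile(bs_reps, [2.5, 97.5])
low, high = percentile_interval(bs_reps, p=95)
print(low, np.mean(data), high)
```

A smaller p gives a narrower interval, for example `percentile_interval(bs_reps, p=68)`.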
