Bootstrap confidence intervals

Case Studies in Statistical Thinking

Justin Bois

Lecturer, Caltech

EDA is the first step

"Exploratory data analysis can never be the whole story, but nothing else can serve as a foundation stone, as the first step."

--John Tukey


Active bout length ECDFs

1 Data courtesy of Avni Gandhi, Grigorios Oikonomou, and David Prober, Caltech

Optimal parameter value

  • Optimal parameter value: The value of the parameter of a probability distribution that best describes the data

  • Optimal parameter for the Exponential distribution: Computed from the mean of the data
import numpy as np

# Optimal Exponential parameter: the mean of the data
np.mean(nuclear_incident_times)
87.140350877192986

1 Data source: Wheatley, Sovacool, and Sornette, Nuclear Events Database
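A minimal sketch of why the mean is the optimal parameter: for the Exponential distribution, the likelihood of the data is maximized when the mean parameter equals the sample mean. The data here are synthetic (drawn with an assumed true mean of 90.0), not the nuclear incident times, and `times`, `tau_hat`, and `neg_log_likelihood` are illustrative names.

```python
import numpy as np

# Synthetic stand-in data (assumption): NOT the nuclear incident times
rng = np.random.default_rng(seed=42)
times = rng.exponential(scale=90.0, size=500)


def neg_log_likelihood(tau, data):
    """Negative log-likelihood of Exponential data with mean parameter tau."""
    return len(data) * np.log(tau) + np.sum(data) / tau


# Candidate optimal parameter: the sample mean
tau_hat = np.mean(times)

# Scan a grid of parameter values around the sample mean
candidates = np.linspace(0.5 * tau_hat, 1.5 * tau_hat, 1001)
nll = np.array([neg_log_likelihood(tau, times) for tau in candidates])

# The grid value with the lowest negative log-likelihood
best_candidate = candidates[np.argmin(nll)]

print(tau_hat, best_candidate)
```

The grid search is only there to illustrate the point; in practice `np.mean` alone gives the optimal value.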

Bootstrap sample

A resampled array of the data

# Resample nuclear_incident_times with replacement
bs_sample = np.random.choice(
  nuclear_incident_times,
  replace=True,
  size=len(nuclear_incident_times)
)
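For readers without the course data, resampling can be sketched on a small made-up array. The values in `data` are an assumption for illustration, and the sketch uses NumPy's `Generator.choice` with a fixed seed (for reproducibility) in place of `np.random.choice`.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Small made-up dataset (assumption), for illustration only
data = np.array([12.0, 45.0, 3.0, 88.0, 150.0, 27.0])

# Draw len(data) values from data, with replacement
bs_sample = rng.choice(data, size=len(data), replace=True)

print(bs_sample)
```

The bootstrap sample has the same length as the data, but because values are drawn with replacement, some may appear more than once and others not at all.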

Bootstrap replicates

1 Data source: Wheatley, Sovacool, and Sornette, Nuclear Events Database

Bootstrap replicates

Bootstrap replicate: A statistic computed from a bootstrap sample
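As a rough illustration of the definition, any summary statistic of a bootstrap sample is a replicate of that statistic. The sketch below computes a mean replicate and a median replicate from the same bootstrap sample; `data` holds made-up values and the variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(seed=1)

# Made-up values (assumption), for illustration only
data = np.array([12.0, 45.0, 3.0, 88.0, 150.0, 27.0])

# One bootstrap sample: resample the data with replacement
bs_sample = rng.choice(data, size=len(data), replace=True)

# Each statistic of the bootstrap sample is one bootstrap replicate
mean_replicate = np.mean(bs_sample)
median_replicate = np.median(bs_sample)

print(mean_replicate, median_replicate)
```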


dcst.draw_bs_reps()

Function to draw bootstrap replicates from a dataset

# Draw 10000 replicates of the mean from
# nuclear_incident_times
bs_reps = dcst.draw_bs_reps(
  nuclear_incident_times, np.mean, size=10000
)
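A function of this kind could be written along the following lines. This is a hypothetical stand-in, not the actual `dcst.draw_bs_reps()` implementation: the name `draw_bs_reps_sketch`, its `rng` parameter, and the made-up `data` values are all assumptions.

```python
import numpy as np


def draw_bs_reps_sketch(data, func, size=1, rng=None):
    """Hypothetical stand-in: return `size` bootstrap replicates of func(data)."""
    rng = np.random.default_rng() if rng is None else rng
    data = np.asarray(data)

    reps = np.empty(size)
    for i in range(size):
        # Resample with replacement, then compute the statistic
        bs_sample = rng.choice(data, size=len(data), replace=True)
        reps[i] = func(bs_sample)
    return reps


# Made-up values (assumption), for illustration only
data = np.array([12.0, 45.0, 3.0, 88.0, 150.0, 27.0])

bs_reps = draw_bs_reps_sketch(
    data, np.mean, size=1000, rng=np.random.default_rng(2)
)

print(bs_reps[:5])
```

Passing the statistic as `func` means the same loop works for the mean, the median, or any other function that maps an array to a single number.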

The bootstrap confidence interval

1 Data source: Wheatley, Sovacool, and Sornette, Nuclear Events Database

The bootstrap confidence interval

If we repeated the measurements over and over again, p% of the observed values of the statistic (here, the mean) would lie within the p% confidence interval


The bootstrap confidence interval

np.percentile(bs_reps, [2.5, 97.5])
array([  73.31505848,  102.39181287])
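The whole procedure can be sketched end to end on synthetic data. The `times` array below is drawn from an Exponential distribution with an assumed mean of 90.0; it is not the nuclear incident data, so the resulting interval will differ from the one shown above.

```python
import numpy as np

rng = np.random.default_rng(seed=3)

# Synthetic stand-in data (assumption): NOT the nuclear incident times
times = rng.exponential(scale=90.0, size=60)

# Draw 10000 bootstrap replicates of the mean
bs_reps = np.array([
    np.mean(rng.choice(times, size=len(times), replace=True))
    for _ in range(10000)
])

# 95% confidence interval: the 2.5th and 97.5th percentiles of the replicates
conf_int = np.percentile(bs_reps, [2.5, 97.5])
low, high = conf_int

# Fraction of replicates that fall inside the interval (close to 0.95)
frac_inside = np.mean((bs_reps >= low) & (bs_reps <= high))

print(conf_int, frac_inside)
```

Choosing the percentiles 2.5 and 97.5 leaves 2.5% of the replicates in each tail, so the middle 95% of the bootstrap replicates define the interval.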

Let's practice!
