Bootstrap hypothesis tests

Statistical Thinking in Python (Part 2)

Justin Bois

Lecturer at the California Institute of Technology

Pipeline for hypothesis testing

- Clearly state the null hypothesis
- Define your test statistic
- Generate many sets of simulated data assuming the null hypothesis is true
- Compute the test statistic for each simulated data set
- The p-value is the fraction of your simulated data sets for which the test statistic is at least as extreme as for the real data
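
The five steps above can be sketched in a few lines of NumPy. This is a minimal illustration, not code from the slides: the function name `bootstrap_p_value`, its parameters, and the two-sided reading of "at least as extreme" (comparing absolute values) are all assumptions made for the example.

```python
import numpy as np

def bootstrap_p_value(data, null_value, n_reps=10000, seed=0):
    """One-sample bootstrap test of the mean (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)

    # Steps 1-2: the null hypothesis is "the true mean equals null_value";
    # the test statistic is the sample mean minus null_value.
    observed = np.mean(data) - null_value

    # Step 3: simulate data under the null by shifting the sample so its
    # mean equals null_value, then resampling with replacement.
    shifted = data - np.mean(data) + null_value
    samples = rng.choice(shifted, size=(n_reps, len(shifted)), replace=True)

    # Step 4: compute the test statistic for each simulated data set.
    replicates = samples.mean(axis=1) - null_value

    # Step 5: fraction of replicates at least as extreme as the observed
    # value (two-sided here, which is an assumption of this sketch).
    p_value = np.mean(np.abs(replicates) >= abs(observed))
    return observed, p_value
```

A one-sided test would instead compare `replicates <= observed` or `replicates >= observed`, depending on the direction of interest.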

Michelson and Newcomb: speed of light pioneers

ch3-3.008.png

¹ Michelson image: public domain, Smithsonian ² Newcomb image: US Library of Congress

The data we have

ch3-3.011.png

¹ Data: Michelson, 1880

Null hypothesis

The true mean speed of light in Michelson’s experiments was actually Newcomb's reported value

Shifting the Michelson data

newcomb_value = 299860  # km/s
# Shift the data so that its mean equals Newcomb's value
michelson_shifted = michelson_speed_of_light \
           - np.mean(michelson_speed_of_light) + newcomb_value

ch3-3.019.png
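
To see what the shift does, here is a small check on made-up numbers. The array `toy_speeds` is hypothetical and is not Michelson's data; only `newcomb_value` is taken from the slide. The shift moves the mean to Newcomb's value and leaves the spread of the data unchanged, which is what makes the shifted data a stand-in for "data drawn under the null hypothesis".

```python
import numpy as np

# Hypothetical measurements in km/s (NOT Michelson's actual data)
toy_speeds = np.array([299850.0, 299740.0, 299900.0, 300070.0, 299930.0])
newcomb_value = 299860  # km/s, as on the slide

# Same shift as on the slide, applied to the toy array
toy_shifted = toy_speeds - np.mean(toy_speeds) + newcomb_value

mean_after = np.mean(toy_shifted)   # equals newcomb_value up to rounding
std_before = np.std(toy_speeds)     # spread before the shift
std_after = np.std(toy_shifted)     # spread after the shift (unchanged)
```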

Calculating the test statistic

def diff_from_newcomb(data, newcomb_value=299860):
    return np.mean(data) - newcomb_value

diff_obs = diff_from_newcomb(michelson_speed_of_light)

diff_obs

-7.5999999999767169

Computing the p-value

bs_replicates = draw_bs_reps(michelson_shifted,
                             diff_from_newcomb, 10000)

p_value = np.sum(bs_replicates <= diff_obs) / 10000

p_value

0.16039999999999999
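
The call above relies on `draw_bs_reps`, which is not defined in this section. A plausible sketch is shown below, assuming it resamples the data with replacement and applies the given function to each resample. The helper `bootstrap_replicate_1d` and the `seed` parameter are assumptions added for this example, and the course's own definition may differ.

```python
import numpy as np

def bootstrap_replicate_1d(data, func, rng):
    """Apply func to one resample of data (drawn with replacement)."""
    bs_sample = rng.choice(data, size=len(data), replace=True)
    return func(bs_sample)

def draw_bs_reps(data, func, size=1, seed=None):
    """Draw `size` bootstrap replicates of func(data).

    `seed` is an addition for reproducibility in this sketch.
    """
    rng = np.random.default_rng(seed)
    bs_replicates = np.empty(size)
    for i in range(size):
        bs_replicates[i] = bootstrap_replicate_1d(data, func, rng)
    return bs_replicates

# Usage on toy numbers (hypothetical, not Michelson's data)
toy = np.array([1.0, 2.0, 3.0])
toy_reps = draw_bs_reps(toy, np.mean, 100, seed=1)
```

Every replicate is the mean of a resample, so each value lies between the smallest and largest data point.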

One sample test

- Compare one set of data to a single number

Two sample test

- Compare two sets of data
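
For contrast with the one-sample case, a two-sample bootstrap test of equal means might look like the sketch below. It is one possible approach under stated assumptions, not the method from the slides: both samples are shifted to the mean of the combined data, and the function name `two_sample_bootstrap_p` and the two-sided comparison are choices made for this example.

```python
import numpy as np

def two_sample_bootstrap_p(data_a, data_b, n_reps=10000, seed=0):
    """Bootstrap test of the null hypothesis that two means are equal."""
    rng = np.random.default_rng(seed)
    a = np.asarray(data_a, dtype=float)
    b = np.asarray(data_b, dtype=float)

    # Test statistic: difference of the sample means
    observed = np.mean(a) - np.mean(b)

    # Under the null both samples share one mean, so shift each
    # sample to the mean of the combined data
    combined_mean = np.mean(np.concatenate((a, b)))
    a_shifted = a - np.mean(a) + combined_mean
    b_shifted = b - np.mean(b) + combined_mean

    # Resample each shifted array with replacement and take means
    reps_a = rng.choice(a_shifted, size=(n_reps, len(a))).mean(axis=1)
    reps_b = rng.choice(b_shifted, size=(n_reps, len(b))).mean(axis=1)
    replicates = reps_a - reps_b

    # Two-sided p-value (an assumption of this sketch)
    p_value = np.mean(np.abs(replicates) >= abs(observed))
    return observed, p_value
```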

Let's practice!
