Pairs bootstrap

Statistical Thinking in Python (Part 2)

Justin Bois

Lecturer at the California Institute of Technology

Nonparametric inference

Make no assumptions about the model or probability distribution underlying the data

2008 US swing state election results


¹ Data retrieved from Data.gov (https://www.data.gov/)

Pairs bootstrap for linear regression

Resample the data in pairs, with replacement
Compute the slope and intercept from the resampled data
Each slope and intercept is a bootstrap replicate
Compute confidence intervals from percentiles of the bootstrap replicates
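The four steps above can be collected into a single function. This is a minimal sketch, assuming x and y are equal-length 1-D NumPy arrays. The function name `draw_bs_pairs_linreg`, its `size` and `seed` parameters, the made-up toy data, and the use of `np.random.default_rng` in place of the `np.random.choice` call used in the slides are all choices made for illustration, not part of the original material.

```python
import numpy as np


def draw_bs_pairs_linreg(x, y, size=1000, seed=None):
    """Pairs bootstrap replicates of the slope and intercept of a linear fit.

    Hypothetical helper: x and y are assumed to be equal-length 1-D arrays.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    rng = np.random.default_rng(seed)
    inds = np.arange(len(x))

    bs_slopes = np.empty(size)
    bs_intercepts = np.empty(size)
    for i in range(size):
        # Resample indices with replacement so each (x, y) pair stays together
        bs_inds = rng.choice(inds, size=len(inds))
        # A degree-1 polyfit returns [slope, intercept]
        bs_slopes[i], bs_intercepts[i] = np.polyfit(x[bs_inds], y[bs_inds], 1)
    return bs_slopes, bs_intercepts


# Made-up toy data: a noisy line with slope 2 and intercept 1
rng = np.random.default_rng(42)
x_demo = np.linspace(0, 10, 50)
y_demo = 2 * x_demo + 1 + rng.normal(scale=1.0, size=x_demo.size)

bs_slopes, bs_intercepts = draw_bs_pairs_linreg(x_demo, y_demo, size=1000, seed=1)

# 95% confidence intervals from the 2.5th and 97.5th percentiles
slope_ci = np.percentile(bs_slopes, [2.5, 97.5])
intercept_ci = np.percentile(bs_intercepts, [2.5, 97.5])
print("slope 95% CI:", slope_ci)
print("intercept 95% CI:", intercept_ci)
```

With real data, x and y would be total_votes and dem_share. The percentile interval is one common choice of bootstrap confidence interval; other constructions exist.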

Generating a pairs bootstrap sample

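import numpy as np

np.arange(7)

array([0, 1, 2, 3, 4, 5, 6])

# total_votes and dem_share are assumed to be NumPy arrays of equal length
# Indices of all data points
inds = np.arange(len(total_votes))

# Draw len(inds) indices with replacement
bs_inds = np.random.choice(inds, len(inds))

# Index both arrays with the same bs_inds so each (x, y) pair stays together
bs_total_votes = total_votes[bs_inds]
bs_dem_share = dem_share[bs_inds]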
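Indexing both arrays with the same bs_inds is what makes this a pairs bootstrap. A small check on made-up stand-ins for total_votes and dem_share (the values and the seed are assumptions for illustration) confirms that every resampled pair is a pair from the original data:

```python
import numpy as np

# Made-up stand-ins for total_votes and dem_share
total_votes = np.array([100, 200, 300, 400, 500])
dem_share = np.array([40.0, 45.0, 50.0, 55.0, 60.0])

rng = np.random.default_rng(0)
inds = np.arange(len(total_votes))
bs_inds = rng.choice(inds, len(inds))  # indices drawn with replacement

bs_total_votes = total_votes[bs_inds]
bs_dem_share = dem_share[bs_inds]

# Every resampled (votes, share) pair also appears in the original data
original_pairs = set(zip(total_votes.tolist(), dem_share.tolist()))
bs_pairs = list(zip(bs_total_votes.tolist(), bs_dem_share.tolist()))
print(bs_pairs)
print(all(pair in original_pairs for pair in bs_pairs))  # True
```

Resampling each array with its own independent indices would instead scramble the pairing and destroy the relationship the regression is meant to measure.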

Computing a pairs bootstrap replicate

# Degree-1 fit: np.polyfit returns [slope, intercept]
bs_slope, bs_intercept = np.polyfit(bs_total_votes,
                                    bs_dem_share, 1)

bs_slope, bs_intercept

(3.9053605692223672e-05, 40.387910131803025)

np.polyfit(total_votes, dem_share, 1)  # fit of original data

array([  4.03707170e-05,   4.01139120e+01])
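With degree 1, np.polyfit performs an ordinary least-squares fit of a line, so each bootstrap replicate is just the least-squares slope and intercept of the resampled pairs. As a sketch of what that computes, the closed-form formulas below give the same numbers; the helper name `ols_line` and the toy arrays are hypothetical, chosen only for illustration.

```python
import numpy as np


def ols_line(x, y):
    """Closed-form least-squares slope and intercept (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = x.mean(), y.mean()
    # slope = sum((x - x_mean)(y - y_mean)) / sum((x - x_mean)^2)
    slope = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
    # The fitted line passes through the point of means
    intercept = y_mean - slope * x_mean
    return slope, intercept


x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

print(ols_line(x, y))        # slope ≈ 1.99, intercept ≈ 0.05
print(np.polyfit(x, y, 1))   # same values, as [slope, intercept]
```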

2008 US swing state election results



Let's practice!
