Linear regressions and pairs bootstrap

Case Studies in Statistical Thinking

Justin Bois

Lecturer, Caltech

Bacterial growth

¹ Images courtesy of Jin Park and Michael Elowitz, Caltech

Bacterial growth

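import numpy as np
import matplotlib.pyplot as plt

# t (hours) and bac_area (sq. µm) are assumed to be NumPy arrays loaded earlier
_ = plt.semilogy(t, bac_area, marker='.', linestyle='none')
_ = plt.xlabel('time (hr)')
_ = plt.ylabel('area (sq. µm)')
plt.show()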

Linear regression with np.polyfit()

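# Fit a straight line (degree-1 polynomial) to the untransformed areas
slope, intercept = np.polyfit(t, bac_area, 1)

# Evaluate the fitted line at the endpoints of the time axis
t_theor = np.array([0, 14])
bac_area_theor = slope * t_theor + intercept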

_ = plt.plot(t, bac_area, marker='.', linestyle='none')
_ = plt.plot(t_theor, bac_area_theor)
_ = plt.xlabel('time (hr)')
_ = plt.ylabel('area (sq. µm)')
plt.show()

Regression of bacterial growth

Semilog-linear regression with np.polyfit()

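# Fit a line to log(area) versus time; np.log is the natural logarithm
slope, intercept = np.polyfit(t, np.log(bac_area), 1)

# Exponentiate to map the fitted line back to area units
t_theor = np.array([0, 14])
bac_area_theor = np.exp(slope * t_theor + intercept)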

_ = plt.semilogy(t, bac_area, marker='.', linestyle='none')
_ = plt.semilogy(t_theor, bac_area_theor)
_ = plt.xlabel('time (hr)')
_ = plt.ylabel('area (sq. µm)')
plt.show()
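
Why fit the logarithm? If area grows exponentially, area = area0 * exp(rate * t), then log(area) = log(area0) + rate * t, which is a straight line in t. The sketch below checks this on synthetic data. The names t_demo, area_demo, true_rate, and true_area0 are made up for illustration and are not the course data set.

```python
import numpy as np

# Synthetic exponential growth with multiplicative noise (illustrative values only)
rng = np.random.default_rng(42)
t_demo = np.linspace(0, 14, 60)
true_rate = 0.25    # assumed growth rate, 1/hr
true_area0 = 2.0    # assumed initial area, sq. µm
noise = np.exp(rng.normal(0, 0.05, size=t_demo.size))
area_demo = true_area0 * np.exp(true_rate * t_demo) * noise

# Semilog-linear regression: fit a line to log(area) versus time
slope, intercept = np.polyfit(t_demo, np.log(area_demo), 1)

# The slope estimates the growth rate; exp(intercept) estimates the initial area
growth_rate = slope
area0 = np.exp(intercept)
print(growth_rate, area0)
```

The fitted slope and exp(intercept) should land close to the values used to generate the data.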

Regression of bacterial growth

Pairs bootstrap

Resample the (x, y) data in pairs, with replacement
Compute the slope and intercept from the resampled data
Each resampled slope and intercept is a pairs bootstrap replicate
Compute confidence intervals from percentiles of the bootstrap replicates
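
The steps above can be sketched in plain NumPy. This is a minimal illustration, not necessarily how dcst.draw_bs_pairs_linreg() is implemented. The helper name draw_bs_pairs_linreg_sketch and the synthetic x_demo, y_demo arrays are hypothetical.

```python
import numpy as np

def draw_bs_pairs_linreg_sketch(x, y, size=1, seed=None):
    """Hypothetical helper: pairs bootstrap for a degree-1 np.polyfit fit."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    inds = np.arange(len(x))

    slope_reps = np.empty(size)
    int_reps = np.empty(size)
    for i in range(size):
        # Resample indices with replacement so each (x, y) pair stays together
        bs_inds = rng.choice(inds, size=len(inds))
        # Each fit on the resampled pairs gives one bootstrap replicate
        slope_reps[i], int_reps[i] = np.polyfit(x[bs_inds], y[bs_inds], 1)
    return slope_reps, int_reps

# Synthetic data for illustration only (true slope 0.5, true intercept 1.0)
rng = np.random.default_rng(0)
x_demo = np.linspace(0, 14, 50)
y_demo = 0.5 * x_demo + 1.0 + rng.normal(0, 0.2, size=x_demo.size)

slope_reps, int_reps = draw_bs_pairs_linreg_sketch(x_demo, y_demo, size=1000, seed=1)

# 95% confidence interval from the 2.5th and 97.5th percentiles
slope_conf_int = np.percentile(slope_reps, [2.5, 97.5])
print(slope_conf_int)
```

Resampling indices, rather than x and y separately, is what keeps each measurement paired with its own time point.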

Pairs bootstrap

import numpy as np
import dc_stat_think as dcst  # assumed: dcst refers to the dc_stat_think package

# Draw 10000 pairs bootstrap reps
slope_reps, int_reps = dcst.draw_bs_pairs_linreg(
    x_data, y_data, size=10000
)

# Compute 95% confidence interval of slope
slope_conf_int = np.percentile(slope_reps, [2.5, 97.5])

Let's practice!
