Bootstrapping

Foundations of Inference in Python

Paul Savala

Assistant Professor of Mathematics

Bootstrapping

  • Bootstrapping = Sampling with replacement
    1. Randomly choose an observation from the data
    2. Write it down
    3. Put it back in the data (replacement)
    4. Repeat
  • Bootstrapped sample = Sample generated from bootstrapping
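
The four steps above can be sketched in a few lines. This is a minimal illustration, not the course's own code: it assumes NumPy is available, uses the first five values shown later in these slides as stand-in data, and fixes an arbitrary seed so the draw is repeatable.

```python
import numpy as np

# Arbitrary seed, chosen only so the example is reproducible
rng = np.random.default_rng(seed=0)

# Illustrative stand-in data (assumption: not the full dataset)
data = np.array([6, 11, 14, 3, 2])

# Draw as many values as the data contains, putting each one back
# after it is drawn (replace=True), so values may repeat
bootstrapped_sample = rng.choice(data, size=len(data), replace=True)
print(bootstrapped_sample)
```

Because each value is put back before the next draw, a bootstrapped sample can contain some observations several times and others not at all.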

Non-parametric confidence interval

  • Non-parametric analogue of stats.norm.interval
    • Sample with replacement
    • Compute test statistic
    • Record it
    • Repeat
  • Creates an empirical distribution
salaries_df['Years of Employment']
[6, 11, 14, 3, 2, ...]
sample_1 = salaries_df['Years of Employment'].sample(n=10, replace=True)

print(max(sample_1) - min(sample_1))
7
  • Repeat this process many times
  • Middle 95% of outcomes = 95% bootstrapped confidence interval
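
To make "repeat many times, then take the middle 95%" concrete, here is a hand-rolled sketch. The `years` array is randomly generated stand-in data (an assumption, since the real `salaries_df` is not available here), and each resample has the same size as the data rather than 10 values.

```python
import numpy as np

rng = np.random.default_rng(seed=42)  # arbitrary seed for reproducibility

# Hypothetical stand-in for salaries_df['Years of Employment']
years = rng.integers(low=0, high=40, size=200)

def max_min(x):
    """Range of the values: largest minus smallest."""
    return np.max(x) - np.min(x)

n_resamples = 1000
boot_stats = np.empty(n_resamples)
for i in range(n_resamples):
    # Sample with replacement, compute the statistic, record it
    resample = rng.choice(years, size=len(years), replace=True)
    boot_stats[i] = max_min(resample)

# The middle 95% of the recorded statistics lies between the
# 2.5th and 97.5th percentiles of the empirical distribution
low, high = np.percentile(boot_stats, [2.5, 97.5])
print(f"95% bootstrapped CI for the range: ({low}, {high})")
```

This is the simple percentile approach; `stats.bootstrap` uses a bias-corrected method by default, so its interval may differ slightly.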
from scipy import stats

# Statistic function: range of the data
def max_min(x):
    return max(x) - min(x)

# Data must be passed as a tuple of samples
data = (salaries_df['Years of Employment'], )
bootstrap_ci = stats.bootstrap(data, max_min, vectorized=False, n_resamples=1000)
print(bootstrap_ci)
BootstrapResult(confidence_interval=ConfidenceInterval(low=33.0, high=38.0),
standard_error=1.3843971812870597)

Normal confidence intervals

 

  • Requires data to be normally distributed
  • Computed based only on mean and standard error
  • Inference valid only for normal data
  • Very fast to compute

Bootstrap confidence intervals

 

  • Allows for any distribution
  • Computed directly from data by resampling
  • Inference valid for any data
  • Much slower to compute
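
The two approaches can be compared side by side on the same data. The sketch below is an illustration under assumptions: the data are synthetic draws from a skewed (exponential) distribution, the statistic is the mean, and the seed and sample size are arbitrary.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=1)  # arbitrary seed for reproducibility

# Synthetic, right-skewed data (assumption: stands in for real data)
data = rng.exponential(scale=5.0, size=100)

# Normal confidence interval: uses only the mean and standard error
mean = np.mean(data)
sem = stats.sem(data)
normal_ci = stats.norm.interval(0.95, loc=mean, scale=sem)

# Bootstrap confidence interval: computed by resampling the data
res = stats.bootstrap((data,), np.mean, vectorized=True,
                      confidence_level=0.95, n_resamples=1000,
                      random_state=rng)
boot_ci = res.confidence_interval

print("Normal CI:   ", normal_ci)
print("Bootstrap CI:", (boot_ci.low, boot_ci.high))
```

The normal interval is symmetric around the mean by construction, while the bootstrap interval is free to be asymmetric when the data are skewed.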

Use cases for bootstrapping

  • When working with non-normal data
    • Ranked data
    • Skewed data
  • When normal confidence intervals return questionable values
  • When we want a confidence interval for an arbitrary statistic
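
As one example of these use cases, the sketch below bootstraps a confidence interval for the median of skewed data, a statistic with no simple normal-theory interval. The data are synthetic (log-normal draws), and the seed, sample size, and choice of `method='percentile'` are assumptions made for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=7)  # arbitrary seed for reproducibility

# Synthetic, heavily right-skewed data (assumption)
skewed = rng.lognormal(mean=0.0, sigma=1.0, size=150)

# np.median accepts an axis argument, so it can be used as a
# vectorized statistic; method='percentile' takes the middle 95%
# of the bootstrapped medians
res = stats.bootstrap((skewed,), np.median, vectorized=True,
                      n_resamples=1000, method='percentile',
                      random_state=rng)

ci = res.confidence_interval
print("Sample median:", np.median(skewed))
print("95% bootstrap CI:", (ci.low, ci.high))
```

Swapping `np.median` for another function of the data is all that is needed to get an interval for a different statistic.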

Let's practice!
