Resampling as a special type of Monte Carlo simulation

Monte Carlo Simulations in Python

Izzy Weber

Curriculum Manager, DataCamp

Resampling as a special type of Monte Carlo simulation

 

Monte Carlo simulations

  • Sample from probability distributions
  • Distributions either known or assumed
  • Rely on historical data or expertise to choose proper distributions

 

Resampling

  • Sample randomly from existing data
  • Existing data serves as an implicit probability distribution
  • Assume that the data is representative
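
The contrast between the two approaches can be sketched as follows. This is a minimal illustration rather than code from the slides: the `observed` values and the normal distribution with its parameters are made-up assumptions chosen only for demonstration.

```python
import random

random.seed(42)  # fixed seed so the sketch is reproducible

# Hypothetical observed data (illustrative values, not from the course)
observed = [12.1, 9.8, 11.4, 10.3, 13.0, 9.5, 10.9, 11.7]

# Monte Carlo simulation: draw from an assumed distribution.
# Normal(mean=11, sd=1.2) is an assumption made for this example.
mc_draws = [random.gauss(11, 1.2) for _ in range(1000)]

# Resampling: draw from the observed data itself, with replacement,
# treating the data as an implicit probability distribution.
resampled = random.choices(observed, k=1000)

# Resampled values can only be values that were actually observed.
print(set(resampled) <= set(observed))
```

The Monte Carlo draws can take any value the assumed distribution allows, while the resampled values are restricted to what was observed, which is why the data needs to be representative.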

Resampling methods

  1. Sampling without replacement
    • Used to draw a random sample
  2. Sampling with replacement (or bootstrapping)
    • Used to estimate the sampling distribution of almost any statistic
  3. Permutation
    • Often used to compare two groups
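
The three methods listed above map onto three standard calls. The sketch below applies each to a toy list (the `data` values are an assumption for illustration) so the differences are easy to see.

```python
import random
import numpy as np

data = [1, 2, 3, 4, 5]  # toy data, assumed for illustration

# 1. Sampling without replacement: each element appears at most once.
without_replacement = random.sample(data, 3)

# 2. Sampling with replacement (bootstrapping): elements may repeat.
with_replacement = random.choices(data, k=len(data))

# 3. Permutation: every element appears exactly once, in random order.
permuted = np.random.permutation(data)

print(without_replacement, with_replacement, permuted)
```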

Sampling without replacement

Randomly draw two different states from the six New England states

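import random

def two_random_ne_states():
    # The six New England states form the population to sample from
    ne_states = ["Maine", "Vermont", "New Hampshire",
                 "Massachusetts", "Connecticut", "Rhode Island"]
    # random.sample draws without replacement, so the two states always differ
    return random.sample(ne_states, 2)
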
two_random_ne_states()
two_random_ne_states()
['Massachusetts', 'Connecticut']
['New Hampshire', 'Maine']
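
Because sampling without replacement never reuses an item, the sample size cannot exceed the population size. The sketch below illustrates this limit with `random.sample`; the variable names `all_six` and `too_many_allowed` are introduced here for illustration only.

```python
import random

ne_states = ["Maine", "Vermont", "New Hampshire",
             "Massachusetts", "Connecticut", "Rhode Island"]

# Drawing all six states is allowed: the result is a random ordering.
all_six = random.sample(ne_states, len(ne_states))

# Asking for more items than exist is impossible without replacement,
# so random.sample raises ValueError.
try:
    random.sample(ne_states, 7)
    too_many_allowed = True
except ValueError:
    too_many_allowed = False

print(sorted(all_six) == sorted(ne_states), too_many_allowed)
# prints: True False
```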

Bootstrapping

Estimate the 95% confidence interval for the mean height of NBA players

import random
import numpy as np

nba_heights = [196, 191, 198, 216, 188, 185, 211, 201,
               188, 191, 201, 208, 191, 183, 196]
simu_heights = []

for i in range(1000):
    bootstrap_sample = random.choices(nba_heights, k=15)
    simu_heights.append(np.mean(bootstrap_sample))

upper = np.quantile(simu_heights, 0.975)
lower = np.quantile(simu_heights, 0.025)
print([np.mean(simu_heights), lower, upper])
[196.26666666666668, 191.8, 201.2]
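
The same procedure applies to statistics other than the mean. As a hedged sketch (not from the slides), the variant below bootstraps the median and draws all resamples in one call with NumPy's `Generator.choice`; the seed and the choice of the median are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded generator; the seed is arbitrary

nba_heights = np.array([196, 191, 198, 216, 188, 185, 211, 201,
                        188, 191, 201, 208, 191, 183, 196])

# Draw 1000 bootstrap samples at once: each row is one resample
# of the same size as the original data, drawn with replacement.
samples = rng.choice(nba_heights, size=(1000, len(nba_heights)), replace=True)

# Compute the statistic of interest (here the median) for every resample.
simu_medians = np.median(samples, axis=1)

# Percentile interval from the 2.5% and 97.5% quantiles.
lower, upper = np.quantile(simu_medians, [0.025, 0.975])
print(lower, upper)
```

Swapping `np.median` for another function of a row (for example `np.std`) gives an interval for that statistic instead.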

Visualization of bootstrap results

Plotting libraries:

  • seaborn
  • matplotlib
import seaborn as sns
import matplotlib.pyplot as plt

sns.displot(simu_heights)
plt.axvline(191.8, color="red")
plt.axvline(201.2, color="red")
plt.axvline(196.3, color="green")

 

[Figure: distribution plot of the simulated mean heights, with the interval bounds in red and the mean in green]


Permutation

Estimate the 95% confidence interval for the difference in mean height between NBA players and US adult males

us_heights = [165, 185, 179, 187, 193, 180, 178, 179, 171, 176, 
              169, 160, 140, 199, 176, 185, 175, 196, 190, 176]
nba_heights = [196, 191, 198, 216, 188, 185, 211, 201, 188, 191, 201, 208, 191, 183, 196]

all_heights = us_heights + nba_heights
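simu_diff = []

for i in range(1000):
    # Shuffle the pooled heights, then split them into two groups
    # with the original group sizes (15 NBA players, 20 US adults)
    perm_sample = np.random.permutation(all_heights)
    perm_nba, perm_adult = perm_sample[0:15], perm_sample[15:35]
    perm_diff = np.mean(perm_nba) - np.mean(perm_adult)
    simu_diff.append(perm_diff)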

Permutation results

Difference in mean of NBA heights and adult American male heights:

np.mean(nba_heights) - np.mean(us_heights)
18.31666666666669

95% confidence interval for the mean difference between the two randomly permuted groups:

upper = np.quantile(simu_diff, 0.975)
lower = np.quantile(simu_diff, 0.025)
print([lower, upper])
[-10.033333333333331, 10.033333333333331]
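
The comparison behind these results can be sketched end to end: compute the observed difference, build the interval from permuted groups, and check whether the observed value falls outside it. This is a minimal sketch using the data from the slides; the seeded generator and the names `observed_diff` and `outside` are assumptions added for illustration, so the exact interval will differ slightly from the one printed above.

```python
import numpy as np

rng = np.random.default_rng(1)  # arbitrary seed for reproducibility

us_heights = [165, 185, 179, 187, 193, 180, 178, 179, 171, 176,
              169, 160, 140, 199, 176, 185, 175, 196, 190, 176]
nba_heights = [196, 191, 198, 216, 188, 185, 211, 201,
               188, 191, 201, 208, 191, 183, 196]

# Difference in means between the real groups
observed_diff = np.mean(nba_heights) - np.mean(us_heights)

all_heights = np.array(us_heights + nba_heights)
n_nba = len(nba_heights)

simu_diff = []
for _ in range(1000):
    # Shuffle the pooled data and split it into groups of the original sizes
    perm = rng.permutation(all_heights)
    simu_diff.append(perm[:n_nba].mean() - perm[n_nba:].mean())

lower, upper = np.quantile(simu_diff, [0.025, 0.975])

# True when the observed difference lies outside the permutation interval
outside = observed_diff < lower or observed_diff > upper
print(observed_diff, lower, upper, outside)
```

If the group labels carried no information, the observed difference would usually land inside the interval; a value outside it suggests the two groups really differ in mean height.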

Visualizing permutation results

sns.displot(simu_diff)
plt.axvline(-10.03, color="red")
plt.axvline(10.03, color="red")
plt.axvline(18.32, color="green")

[Figure: distribution plot of the simulated mean differences, with the interval bounds in red and the observed difference in green]


Let's practice!
