Hypothesis Testing in Python
James Chapman
Curriculum Manager, DataCamp
The samples are random subsets of larger populations
Each observation (row) in the dataset is independent
The sample is big enough to mitigate uncertainty, so that the Central Limit Theorem applies
$n \ge 30$
$n$: sample size
$n_{1} \ge 30, n_{2} \ge 30$
$n_{i}$: sample size for group $i$
Number of rows in our data $\ge 30$
$n_{i} \ge 30$ for all values of $i$
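The sample-size conditions above all come down to counting rows, either in the whole dataset or within each group. A minimal sketch of that check, assuming the group labels are available as a plain Python list; the function name `groups_meet_min_size` and the example labels are hypothetical and only for illustration:

```python
from collections import Counter

def groups_meet_min_size(labels, min_size=30):
    """Count rows per group and report whether every group has at least min_size rows."""
    counts = Counter(labels)
    all_ok = all(n >= min_size for n in counts.values())
    return counts, all_ok

# Hypothetical group labels: one label per row of the dataset
labels = ["a"] * 35 + ["b"] * 30 + ["c"] * 12

counts, ok = groups_meet_min_size(labels)
for group, n in sorted(counts.items()):
    print(group, n, "ok" if n >= 30 else "too small")
print("all groups large enough:", ok)
```

With a single sample, the same function can be called with one label repeated for every row, so the check is simply whether the row count reaches 30.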
$n \times \hat{p} \ge 10$
$n \times (1 - \hat{p}) \ge 10$
$n$: sample size
$\hat{p}$: proportion of successes in sample
$n_{1} \times \hat{p}_{1} \ge 10$
$n_{2} \times \hat{p}_{2} \ge 10$
$n_{1} \times (1 - \hat{p}_{1}) \ge 10$
$n_{2} \times (1 - \hat{p}_{2}) \ge 10$
$n_{i} \times \hat{p}_{i} \ge 5$ for all values of $i$
$n_{i} \times (1 - \hat{p}_{i}) \ge 5$ for all values of $i$
$n_{i}$: sample size for group $i$
$\hat{p}_{i}$: proportion of successes in sample group $i$
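Since $n \times \hat{p}$ is just the number of successes and $n \times (1 - \hat{p})$ the number of failures, the proportion conditions above can be checked from raw counts. A rough sketch, assuming each group is summarised as a (successes, total) pair; the function name `meets_count_condition` and the counts are made up for illustration:

```python
def meets_count_condition(n_success, n_total, threshold=10):
    """Check n * p_hat >= threshold and n * (1 - p_hat) >= threshold.

    n * p_hat equals the success count and n * (1 - p_hat) equals the
    failure count, so integer counts are compared directly.
    """
    n_failure = n_total - n_success
    return n_success >= threshold and n_failure >= threshold

# Hypothetical (successes, total) counts per group
groups = {"group_1": (18, 60), "group_2": (7, 40)}

for name, (n_success, n_total) in groups.items():
    strict = meets_count_condition(n_success, n_total, threshold=10)
    relaxed = meets_count_condition(n_success, n_total, threshold=5)
    print(name, "threshold 10:", strict, "| threshold 5:", relaxed)
```

Working with counts rather than multiplying $n$ by a floating-point $\hat{p}$ avoids rounding problems when a group sits exactly on the threshold.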
If the bootstrap distribution of the sample statistic doesn't look normal, the assumptions above likely aren't met
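One way to run this sanity check is to resample the data with replacement many times and inspect the distribution of the resampled statistic. The sketch below uses only the standard library and, in place of a histogram, reports skewness as a rough numeric indicator (values near 0 suggest a symmetric shape). The helper names, the simulated data, and the use of skewness are assumptions for illustration, not the course's own method:

```python
import random
import statistics

def bootstrap_means(sample, n_resamples=2000, seed=0):
    """Resample with replacement and return the mean of each resample."""
    rng = random.Random(seed)
    n = len(sample)
    return [statistics.fmean(rng.choices(sample, k=n)) for _ in range(n_resamples)]

def skewness(values):
    """Moment-based skewness: third central moment over variance ** 1.5."""
    mu = statistics.fmean(values)
    m2 = statistics.fmean([(v - mu) ** 2 for v in values])
    m3 = statistics.fmean([(v - mu) ** 3 for v in values])
    return m3 / m2 ** 1.5

# Hypothetical small, right-skewed sample (n = 15, below the n >= 30 guideline)
data_rng = random.Random(42)
sample = [data_rng.expovariate(1.0) for _ in range(15)]

boot_means = bootstrap_means(sample)
print("skewness of bootstrap means:", round(skewness(boot_means), 3))
```

In practice a histogram of `boot_means` is the more common check; the skewness number is only a quick stand-in when no plotting library is at hand.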