Hypothesis Testing in Python
James Chapman
Curriculum Manager, DataCamp
age_first_code_cut classifies when a Stack Overflow user first started programming
"adult" means they started at 14 or older
"child" means they started before 14
A hypothesis is a statement about an unknown population parameter
A hypothesis test is a test of two competing hypotheses
The null hypothesis ($H_{0}$) is the existing idea
The alternative hypothesis ($H_{A}$) is the new "challenger" idea of the researcher
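For our problem:
$H_{0}$: The proportion of data scientists starting programming as children is 35%
$H_{A}$: The proportion of data scientists starting programming as children is greater than 35%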
Significance level is "beyond a reasonable doubt" for hypothesis testing

Hypothesis tests check if the sample statistics lie in the tails of the null distribution
| Test | Tails |
|---|---|
| alternative different from null | two-tailed |
| alternative greater than null | right-tailed |
| alternative less than null | left-tailed |
$H_{A}$: The proportion of data scientists starting programming as children is greater than 35%
This is a right-tailed test

p-values: probability of obtaining a result at least as extreme as the one observed, assuming the null hypothesis is true
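How the tail of the test determines which area under the null distribution counts as the p-value can be sketched as follows. This is a minimal illustration that assumes a standard normal null distribution; the helper name `p_value_from_z` and the tail labels `"left"`, `"right"`, and `"two"` are hypothetical and not part of the course material.

```python
from scipy.stats import norm


def p_value_from_z(z_score, tail):
    """Return the p-value for a z-score under a standard normal null.

    `tail` is one of "left", "right", or "two" (hypothetical labels).
    """
    if tail == "left":
        # Area to the left of the z-score
        return norm.cdf(z_score)
    if tail == "right":
        # Area to the right of the z-score
        return 1 - norm.cdf(z_score)
    if tail == "two":
        # Area in both tails, beyond |z| on either side
        return 2 * (1 - norm.cdf(abs(z_score)))
    raise ValueError(f"unknown tail: {tail!r}")


z = 1.5  # illustrative z-score, not taken from the data
for tail in ("left", "right", "two"):
    print(tail, p_value_from_z(z, tail))
```

For any z-score the left-tailed and right-tailed p-values sum to 1, and the two-tailed p-value is twice the smaller of the two.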
prop_child_samp = (stack_overflow['age_first_code_cut'] == "child").mean()
0.39141972578505085
prop_child_hyp = 0.35
std_error = np.std(first_code_boot_distn, ddof=1)
0.010351057228878566
z_score = (prop_child_samp - prop_child_hyp) / std_error
4.001497129152506
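The standard error above is taken from first_code_boot_distn, which is not defined in this section. A minimal sketch of how such a bootstrap distribution might be built is shown here. It assumes a synthetic boolean sample in place of the real stack_overflow data; the sample size, the random seed, and the number of resamples are illustrative choices, so the resulting numbers will not match the values printed above.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical stand-in for (stack_overflow['age_first_code_cut'] == "child"):
# a boolean array where True marks a user who started programming as a child.
n = 2000
is_child = rng.random(n) < 0.39

# Resample with replacement and record the proportion of "child" each time.
boot_distn = [
    rng.choice(is_child, size=n, replace=True).mean()
    for _ in range(1000)
]

# The standard deviation of the bootstrap proportions estimates the standard error.
std_error_sketch = np.std(boot_distn, ddof=1)
print(std_error_sketch)
```

With ddof=1, np.std() uses the sample standard deviation formula, which matches the call used for std_error above.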
norm.cdf() is the normal CDF from scipy.stats.
Left-tailed test: norm.cdf().
Right-tailed test: 1 - norm.cdf().
from scipy.stats import norm
1 - norm.cdf(z_score, loc=0, scale=1)
3.1471479512323874e-05
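The whole calculation can be reproduced from the printed values alone and compared against a significance level. This is a sketch rather than the course's own code: the threshold alpha = 0.05 is an assumption, since this section does not state one. norm.sf() is SciPy's survival function, which returns the same right-tail area as 1 - norm.cdf().

```python
from scipy.stats import norm

# Values printed earlier in this section
prop_child_samp = 0.39141972578505085
prop_child_hyp = 0.35
std_error = 0.010351057228878566

alpha = 0.05  # assumed significance level; not specified in this section

# Standardize the difference between the sample and hypothesized proportions
z_score = (prop_child_samp - prop_child_hyp) / std_error

# Right-tailed test: area under the standard normal curve to the right of z
p_value = 1 - norm.cdf(z_score, loc=0, scale=1)
p_value_sf = norm.sf(z_score)  # survival function, same quantity as 1 - cdf

# Reject the null hypothesis when the p-value is at or below alpha
reject_null = p_value <= alpha
print(z_score, p_value, reject_null)
```

With a p-value of about 3.1e-05, the null hypothesis would be rejected at this assumed threshold, and at any stricter conventional one such as 0.01.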