p-values

Hypothesis Testing in Python

James Chapman

Curriculum Manager, DataCamp

Criminal trials

Two possible true states:
1. Defendant committed the crime
2. Defendant did not commit the crime
Two possible verdicts:
1. Guilty
2. Not guilty
Initially the defendant is assumed to be not guilty
Prosecution must present evidence "beyond reasonable doubt" for a guilty verdict

age_first_code_cut classifies when a Stack Overflow user first started programming
- "adult" means they started at 14 or older
- "child" means they started before 14
Previous research: 35% of software developers started programming as children
Is there evidence that a greater proportion of data scientists started programming as children?

A hypothesis is a statement about an unknown population parameter

A hypothesis test is a test of two competing hypotheses

The null hypothesis ($H_{0}$) is the existing idea
The alternative hypothesis ($H_{A}$) is the new "challenger" idea of the researcher

For our problem:

$H_{0}$: The proportion of data scientists starting programming as children is 35%
$H_{A}$: The proportion of data scientists starting programming as children is greater than 35%

¹ "Naught" is British English for "zero". For historical reasons, "H-naught" is the international convention for pronouncing the null hypothesis.

Either $H_{A}$ or $H_{0}$ is true (not both)
Initially, $H_{0}$ is assumed to be true
The test ends in either "reject $H_{0}$" or "fail to reject $H_{0}$"
If the evidence from the sample is "significant" that $H_{A}$ is true, reject $H_{0}$, else choose $H_{0}$

The significance level is the hypothesis-testing equivalent of "beyond a reasonable doubt"

Density plot of the pdf of the standard normal distribution with the left and right tails highlighted in red.

Hypothesis tests check if the sample statistics lie in the tails of the null distribution

Test	Tails
alternative different from null	two-tailed
alternative greater than null	right-tailed
alternative less than null	left-tailed

$H_{A}$: The proportion of data scientists starting programming as children is greater than 35%

This is a right-tailed test
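The tail table above can be sketched in code. This is a minimal illustration, assuming the test statistic is a z-score and the null distribution is standard normal; the helper name `p_value_from_z` and the example z-score are hypothetical and not part of the course material.

```python
from scipy.stats import norm


def p_value_from_z(z_score, tail):
    """Return the tail probability of a z-score under a standard normal null."""
    if tail == "right":
        # Alternative greater than null: area to the right of z
        return norm.sf(z_score)
    if tail == "left":
        # Alternative less than null: area to the left of z
        return norm.cdf(z_score)
    if tail == "two":
        # Alternative different from null: area in both tails
        return 2 * norm.sf(abs(z_score))
    raise ValueError("tail must be 'right', 'left', or 'two'")


example_z = 1.5  # illustrative value only
for tail in ("right", "left", "two"):
    print(tail, p_value_from_z(example_z, tail))
```

Here `norm.sf` is the survival function, which equals `1 - norm.cdf`.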

p-values: probability of obtaining a result at least as extreme as the one observed, assuming the null hypothesis is true

Large p-value, weak evidence against $H_{0}$ (the data are consistent with $H_{0}$)
- Statistic likely not in the tail of the null distribution
Small p-value, strong evidence against $H_{0}$
- Statistic likely in the tail of the null distribution
"p" in p-value → probability
"small" means "close to zero"

prop_child_samp = (stack_overflow['age_first_code_cut'] == "child").mean()

0.39141972578505085

prop_child_hyp = 0.35

import numpy as np
# Standard error = standard deviation of the bootstrap distribution
std_error = np.std(first_code_boot_distn, ddof=1)

0.010351057228878566
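The slides use `first_code_boot_distn` without showing how it is built. The sketch below shows one way such a bootstrap distribution could be generated. It assumes a simple resampling loop, and it uses synthetic data as a stand-in for `stack_overflow['age_first_code_cut']`; the sample size, the 39% proportion, and the 1000 replicates are all illustrative assumptions, so the resulting standard error will not match the value above.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical stand-in for stack_overflow['age_first_code_cut']
age_first_code_cut = rng.choice(["child", "adult"], size=2000, p=[0.39, 0.61])

# Resample with replacement and record the proportion of "child" each time
first_code_boot_distn = []
for _ in range(1000):
    resample = rng.choice(
        age_first_code_cut, size=len(age_first_code_cut), replace=True
    )
    first_code_boot_distn.append(np.mean(resample == "child"))

# The standard deviation of the bootstrap distribution estimates the standard error
std_error = np.std(first_code_boot_distn, ddof=1)
print(std_error)
```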

z_score = (prop_child_samp - prop_child_hyp) / std_error

4.001497129152506
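Written out with the numbers shown on these slides (rounded for readability), the z-score is the gap between the sample proportion $\hat{p}$ and the hypothesized proportion $p_{0}$, measured in standard errors. The symbols $\hat{p}$, $p_{0}$, and $\mathrm{SE}$ are notation introduced here for illustration.

```latex
z = \frac{\hat{p} - p_{0}}{\mathrm{SE}(\hat{p})}
  \approx \frac{0.3914 - 0.35}{0.01035}
  \approx 4.00
```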

from scipy.stats import norm
1 - norm.cdf(z_score, loc=0, scale=1)

3.1471479512323874e-05
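The calculation can be reproduced end to end from the values printed above. This sketch hard-codes those values instead of loading the dataset, and it also computes the same right-tail area with `norm.sf` as a cross-check; the variable names `p_value_cdf` and `p_value_sf` are introduced here for illustration.

```python
from scipy.stats import norm

# Values copied from the slides above
prop_child_samp = 0.39141972578505085
prop_child_hyp = 0.35
std_error = 0.010351057228878566

# Standardize the sample statistic
z_score = (prop_child_samp - prop_child_hyp) / std_error

# Right-tailed p-value, computed two equivalent ways
p_value_cdf = 1 - norm.cdf(z_score, loc=0, scale=1)
p_value_sf = norm.sf(z_score)  # survival function, equal to 1 - cdf

print(z_score, p_value_cdf, p_value_sf)
```

For z-scores far into the tail, `norm.sf` tends to be the more numerically accurate of the two, because it avoids subtracting a number very close to 1 from 1.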
