Statistical significance

Hypothesis Testing in Python

James Chapman

Curriculum Manager, DataCamp

p-value recap

p-values measure how compatible the observed data are with the null hypothesis
Large p-value → fail to reject null hypothesis
Small p-value → reject null hypothesis
Where is the cutoff point?

Significance level

The significance level of a hypothesis test ($\alpha$) is the threshold point for "beyond a reasonable doubt"

Common values of $\alpha$ are 0.2, 0.1, 0.05, and 0.01
If $p \le \alpha$, reject $H_{0}$, else fail to reject $H_{0}$
$\alpha$ should be set prior to conducting the hypothesis test
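
The decision rule above can be sketched as a small helper. The function name `decide` and the example p-value of 0.03 are illustrative assumptions, not part of the course code; the point is only that the same p-value can lead to different decisions depending on the $\alpha$ chosen beforehand.

```python
def decide(p_value: float, alpha: float = 0.05) -> str:
    """Apply the rule: reject H0 when p <= alpha, otherwise fail to reject."""
    if not 0 < alpha < 1:
        raise ValueError("alpha must be strictly between 0 and 1")
    return "reject H0" if p_value <= alpha else "fail to reject H0"


# The same (hypothetical) p-value judged against three common thresholds
for alpha in (0.1, 0.05, 0.01):
    print(alpha, decide(0.03, alpha))
```

With a p-value of 0.03, the test rejects $H_{0}$ at $\alpha = 0.1$ and $\alpha = 0.05$ but not at $\alpha = 0.01$, which is why $\alpha$ should be fixed before looking at the result.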

Calculating the p-value

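import numpy as np
from scipy.stats import norm

alpha = 0.05

# Sample proportion of respondents who first coded as children
prop_child_samp = (stack_overflow['age_first_code_cut'] == "child").mean()
prop_child_hyp = 0.35

# Standard error estimated from the bootstrap distribution
std_error = np.std(first_code_boot_distn, ddof=1)

z_score = (prop_child_samp - prop_child_hyp) / std_error

# Right-tailed test: probability of a z-score at least this large under H0
p_value = 1 - norm.cdf(z_score, loc=0, scale=1)
print(p_value)

3.1471479512323874e-05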
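
To make the arithmetic concrete without the `stack_overflow` data, here is a minimal sketch using made-up inputs. The values 0.39 and 0.01 are illustrative assumptions, not the course's actual sample proportion or standard error. It also compares `1 - norm.cdf(z)` with `norm.sf(z)`, SciPy's survival function, which computes the same upper-tail probability directly.

```python
from scipy.stats import norm

# Illustrative values only (assumptions, not the course's estimates)
prop_child_samp = 0.39   # hypothetical sample proportion
prop_child_hyp = 0.35    # hypothesized proportion under H0
std_error = 0.01         # hypothetical standard error

# Standardize the difference between sample and hypothesized proportions
z_score = (prop_child_samp - prop_child_hyp) / std_error

# Two equivalent ways to get the right-tailed p-value
p_value_cdf = 1 - norm.cdf(z_score)
p_value_sf = norm.sf(z_score)

print(z_score, p_value_cdf, p_value_sf)
```

Both expressions agree here; `norm.sf` tends to retain more precision when the z-score is very large and the tail probability is tiny.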

Making a decision

alpha = 0.05

print(p_value)

3.1471479512323874e-05

p_value <= alpha

True

Reject $H_{0}$ in favor of $H_{A}$

Confidence intervals

For a significance level of $\alpha$, it's common to choose a confidence interval level of $1 - \alpha$

$\alpha=0.05$ → $95\%$ confidence interval

import numpy as np
lower = np.quantile(first_code_boot_distn, 0.025)
upper = np.quantile(first_code_boot_distn, 0.975)
print((lower, upper))

(0.37063246351172047, 0.41132242370632466)
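
The cutoffs 0.025 and 0.975 come from splitting $\alpha$ evenly between the two tails. A self-contained sketch of that link follows; the simulated sample, its size, and the number of bootstrap replicates are all assumptions standing in for `first_code_boot_distn`, which is built elsewhere in the course.

```python
import numpy as np

rng = np.random.default_rng(seed=42)

# Hypothetical data: 1 = first coded as a child, 0 = otherwise
sample = rng.binomial(n=1, p=0.39, size=2000)

# Bootstrap distribution of the sample proportion (resample with replacement)
boot_distn = np.array([
    rng.choice(sample, size=sample.size, replace=True).mean()
    for _ in range(1000)
])

# A (1 - alpha) interval leaves alpha / 2 in each tail
alpha = 0.05
lower, upper = np.quantile(boot_distn, [alpha / 2, 1 - alpha / 2])
print((lower, upper))
```

Changing `alpha` to 0.01 would widen the interval to the 0.005 and 0.995 quantiles, giving a 99% confidence interval.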

Types of errors

	Truly didn't commit crime	Truly committed crime
Verdict not guilty	correct	they got away with it
Verdict guilty	wrongful conviction	correct

	actual $H_{0}$	actual $H_{A}$
chosen $H_{0}$	correct	false negative
chosen $H_{A}$	false positive	correct

False positives are Type I errors; false negatives are Type II errors.

Possible errors in our example

If $p \le \alpha$, we reject $H_{0}$:

If that decision is wrong, it's a false positive (Type I) error: in reality, data scientists didn't start coding as children at a higher rate

If $p > \alpha$, we fail to reject $H_{0}$:

If that decision is wrong, it's a false negative (Type II) error: in reality, data scientists did start coding as children at a higher rate
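
One way to see both error types is to simulate many studies where the truth is known. The sketch below is an illustration under stated assumptions: the sample size, the alternative proportion of 0.39, and the number of simulations are made up, and the standard error uses the analytic formula $\sqrt{p_0(1 - p_0)/n}$ rather than the bootstrap estimate used earlier.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(seed=0)
alpha = 0.05
p_null = 0.35      # proportion stated by H0
p_alt = 0.39       # hypothetical true proportion when H_A holds
n = 1000           # hypothetical sample size per study
n_sims = 10_000    # number of simulated studies


def rejection_rate(p_true):
    """Share of simulated right-tailed z-tests that reject H0."""
    p_hat = rng.binomial(n, p_true, size=n_sims) / n
    std_error = np.sqrt(p_null * (1 - p_null) / n)  # analytic SE under H0
    p_values = norm.sf((p_hat - p_null) / std_error)
    return np.mean(p_values <= alpha)


# H0 true: every rejection is a false positive (Type I error)
type_1_rate = rejection_rate(p_null)

# H_A true: every failure to reject is a false negative (Type II error)
type_2_rate = 1 - rejection_rate(p_alt)

print(type_1_rate, type_2_rate)
```

When $H_{0}$ is true, the false positive rate should land close to $\alpha$; the false negative rate depends on how far the truth is from $H_{0}$ and on the sample size.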

Let's practice!
