Normality tests

Foundations of Inference in Python

Paul Savala

Assistant Professor of Mathematics

Height of US males

A histogram that is approximately normally distributed, with a mean height of 180 centimeters, a minimum height of 160 centimeters, and a maximum height of 200 centimeters.

Model residuals

A scatter plot with years of employment on the x-axis, annual salary on the y-axis, and a generally positive linear trend. A red line of best fit is drawn through the data.

If the line fits well, we expect the points to be spread evenly above and below the prediction.
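A minimal sketch of what "spread evenly above and below the prediction" means in numbers. It fits a straight line with NumPy's `np.polyfit` and reports how the residuals split around it. The helper `residual_split`, the variables `years` and `salary`, and all the numbers are hypothetical, simulated for illustration rather than taken from the data in the slides.

```python
import numpy as np

def residual_split(x, y):
    """Fit y = slope * x + intercept by least squares and return the
    residuals plus the fraction of points lying above the fitted line."""
    slope, intercept = np.polyfit(x, y, deg=1)
    predicted = slope * x + intercept
    residuals = y - predicted            # positive means the point is above the line
    frac_above = np.mean(residuals > 0)
    return residuals, frac_above

# Hypothetical data: salary grows with years of employment plus symmetric noise
rng = np.random.default_rng(0)
years = rng.uniform(0, 30, size=500)
salary = 55_000 + 1_200 * years + rng.normal(0, 5_000, size=500)

residuals, frac_above = residual_split(years, salary)
print(f"mean residual: {residuals.mean():.2f}")
print(f"fraction above the line: {frac_above:.2f}")
```

Least squares with an intercept always makes the mean residual essentially zero. A split close to 50/50 is what symmetric noise produces, while a lopsided or multi-peaked residual histogram suggests the model is missing something.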

Model residuals

A histogram with "residual (error)" on the x-axis, "count" on the y-axis, and a bimodal distribution with a mode around negative ten thousand, and another mode around positive thirty thousand.

Applications of normal distributions

  • Parametric tests: hypothesis tests that assume normality
  • t-test for comparing means:
    • Assumes sample means are normally distributed
    • If not, conclusions may be invalid
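A minimal sketch of checking the normality assumption before running a two-sample t-test, using `scipy.stats.anderson` and `scipy.stats.ttest_ind`. The helper names `looks_normal` and `checked_ttest`, the 5% level, and the simulated salary groups are assumptions for illustration. Testing each raw sample is only a rough practical proxy: strictly, what matters is that the sample means are approximately normal, which large samples can achieve even when the raw data are not.

```python
import numpy as np
from scipy import stats

def looks_normal(x, alpha_pct=5.0):
    """True if the Anderson-Darling test does NOT reject normality at alpha_pct (%)."""
    res = stats.anderson(x, dist='norm')
    # Use the critical value whose significance level is closest to alpha_pct
    idx = int(np.argmin(np.abs(res.significance_level - alpha_pct)))
    return bool(res.statistic < res.critical_values[idx])

def checked_ttest(a, b, alpha_pct=5.0):
    """Check each sample for normality, then run a two-sample t-test."""
    normal_a = looks_normal(a, alpha_pct)
    normal_b = looks_normal(b, alpha_pct)
    p_value = stats.ttest_ind(a, b).pvalue
    return normal_a, normal_b, p_value

rng = np.random.default_rng(3)
group_a = rng.normal(70_000, 5_000, size=200)  # hypothetical salaries, group A
group_b = rng.normal(72_000, 5_000, size=200)  # hypothetical salaries, group B

normal_a, normal_b, p = checked_ttest(group_a, group_b)
print(normal_a, normal_b, round(p, 4))
if not (normal_a and normal_b):
    print("Normality check failed; treat the t-test p-value with caution.")
```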

A histogram with salaries between sixty thousand and ninety-five thousand on the x-axis and frequency on the y-axis. The histogram is relatively close to normal.

Anderson-Darling test for normality

  • Tests the assumption that the data come from a normal distribution

$H_0$: The data are normally distributed

$H_a$: The data are not normally distributed
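The test statistic compares the sorted data $Y_1 \le \dots \le Y_n$ with a fitted normal CDF $F$: $A^2 = -n - \frac{1}{n}\sum_{i=1}^{n}(2i-1)\left[\ln F(Y_i) + \ln\left(1 - F(Y_{n+1-i})\right)\right]$. Larger values of $A^2$ are stronger evidence against $H_0$. The sketch below computes it by hand, estimating the mean and the sample standard deviation (`ddof=1`) from the data; this is intended to mirror how `scipy.stats.anderson` computes its statistic for `dist='norm'`, so the two printed values should agree. The function name `anderson_darling_normal` and the simulated data are hypothetical.

```python
import numpy as np
from scipy import stats

def anderson_darling_normal(x):
    """Anderson-Darling A^2 statistic for normality, with the mean and
    sample standard deviation (ddof=1) estimated from the data."""
    y = np.sort(np.asarray(x, dtype=float))
    n = len(y)
    z = (y - y.mean()) / y.std(ddof=1)      # standardize with estimated parameters
    i = np.arange(1, n + 1)
    log_cdf = stats.norm.logcdf(z)          # ln F(Y_i)
    log_sf_rev = stats.norm.logsf(z)[::-1]  # ln(1 - F(Y_{n+1-i}))
    return -n - np.sum((2 * i - 1) / n * (log_cdf + log_sf_rev))

rng = np.random.default_rng(1)
sample = rng.normal(loc=70_000, scale=8_000, size=300)  # hypothetical salaries

print(anderson_darling_normal(sample))
print(stats.anderson(sample).statistic)
```

Using `logcdf` and `logsf` instead of taking `np.log` of the CDF avoids `log(0)` for points far out in the tails.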

Anderson-Darling test in SciPy

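from scipy import stats

# police_df: DataFrame of police salaries loaded earlier in the course
# dist='norm' is the default, so this tests the salaries for normality
result = stats.anderson(police_df['Annual Salary'])

result.statistic
27.41
result.critical_values
[0.574, 0.654, 0.784, 0.915, 1.088]
# Significance levels (%) at which H0 is rejected (statistic > critical value)
result.significance_level[result.statistic > result.critical_values]
[15.  10.   5.   2.5  1. ]

The statistic of 27.41 is larger than every critical value, so we reject $H_0$ at all five levels, down to 1%: the salaries do not appear to be normally distributed.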
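`stats.anderson` reports critical values rather than a p-value, so the decision is made one significance level at a time. A small sketch that turns the result into a reject / fail-to-reject table; the helper name `anderson_decisions` and the exponential data are hypothetical, chosen because they are clearly non-normal.

```python
import numpy as np
from scipy import stats

def anderson_decisions(x):
    """Map each significance level (%) to True if normality is rejected there."""
    result = stats.anderson(x, dist='norm')
    return {
        float(level): bool(result.statistic > crit)
        for level, crit in zip(result.significance_level, result.critical_values)
    }

rng = np.random.default_rng(42)
skewed = rng.exponential(scale=10_000, size=1_000)  # hypothetical, clearly non-normal

for level, rejected in anderson_decisions(skewed).items():
    verdict = "reject" if rejected else "fail to reject"
    print(f"{level:>4}% level: {verdict} H0")
```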
Fitting a normal distribution

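from scipy import stats

# Maximum-likelihood estimates of the mean and standard deviation
mu, std = stats.norm.fit(police_df['Annual Salary'])

# Share of salaries below $70,000 predicted by the fitted normal model
estimated_pct_under_70k = stats.norm.cdf(70000, loc=mu, scale=std)
print(estimated_pct_under_70k)
0.27
# Share of salaries actually below $70,000
actual_under_70k = police_df[police_df['Annual Salary'] < 70000]
print(len(actual_under_70k) / len(police_df))
0.20

The fitted normal model predicts that 27% of salaries fall below $70,000, but only 20% actually do, which is consistent with the Anderson-Darling test rejecting normality.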
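`stats.norm.fit` returns maximum-likelihood estimates, which for the normal distribution are the sample mean and the standard deviation with divisor $n$ (NumPy's default `ddof=0`). The sketch below repeats the slide's model-versus-data comparison at several thresholds at once. The helper `normal_vs_empirical` is hypothetical, and the salaries are simulated right-skewed (lognormal) data, not the police dataset.

```python
import numpy as np
from scipy import stats

def normal_vs_empirical(data, thresholds):
    """For each threshold, compare the share of data below it predicted by a
    fitted normal distribution with the share actually observed."""
    data = np.asarray(data, dtype=float)
    mu, std = stats.norm.fit(data)  # maximum-likelihood mean and std
    rows = []
    for t in thresholds:
        model = stats.norm.cdf(t, loc=mu, scale=std)
        empirical = np.mean(data < t)
        rows.append((t, model, empirical))
    return mu, std, rows

rng = np.random.default_rng(7)
# Hypothetical right-skewed salaries (lognormal), not the police data
salaries = rng.lognormal(mean=np.log(75_000), sigma=0.15, size=2_000)

mu, std, rows = normal_vs_empirical(salaries, [60_000, 70_000, 80_000, 90_000])
for t, model, empirical in rows:
    print(f"< {t:,}: normal model {model:.2f}, observed {empirical:.2f}")
```

Large, systematic gaps between the two columns are the same kind of evidence as the 27% versus 20% comparison: the normal model is not describing the data's shape well.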
Let's practice!
