The normal distribution

Introduction to Statistics in Python

Maggie Matsui

Content Developer, DataCamp

What is the normal distribution?

Density function of normal distribution


Symmetrical

Dashed vertical line down the middle of normal distribution
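A minimal sketch of what symmetry means numerically, assuming SciPy is installed and using the standard normal (mean 0): the density is equal at points the same distance on either side of the mean, and exactly half of the area lies to the left of the middle.

```python
import math
from scipy.stats import norm

# Density at two points equally far from the mean (0), one on each side
left = norm.pdf(-1.5)
right = norm.pdf(1.5)
print(left, right)  # equal values: the curve is a mirror image around the mean

# Half of the total area lies on each side of the middle
print(norm.cdf(0))
print(math.isclose(left, right))
```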


Area = 1

Normal distribution with area underneath curve shaded
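To check the "Area = 1" property, one option (a sketch, assuming SciPy's `quad` numerical integrator) is to integrate the standard normal density over the whole real line.

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

# Integrate the standard normal density from minus infinity to plus infinity
area, abs_err = quad(norm.pdf, -np.inf, np.inf)
print(area)  # very close to 1.0; the tiny difference is numerical error
```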


Curve never hits 0

Normal distribution with arrows pointing to tails on either side
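A short sketch of the "never hits 0" property, assuming SciPy: the density keeps shrinking as you move into the tails, but stays strictly positive. (In double-precision floating point it does eventually underflow to exactly 0, somewhere beyond about 38 standard deviations; mathematically it never reaches 0.)

```python
from scipy.stats import norm

# Density of the standard normal at 3, 5 and 10 standard deviations from the mean
tail_densities = [norm.pdf(z) for z in (3, 5, 10)]
for z, d in zip((3, 5, 10), tail_densities):
    print(z, d)  # smaller and smaller, but still greater than 0
```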


Described by mean and standard deviation

Mean: 20, standard deviation: 3

Normal distribution with mean 20 and sd 3

Standard normal distribution

Mean: 0, standard deviation: 1

Normal distribution with mean 0 and sd 1
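A minimal sketch, assuming SciPy, of how the mean and standard deviation pin down a normal distribution: `scipy.stats.norm` takes them as `loc` and `scale`. Any value can be moved onto the standard normal with the z-score z = (x - mean) / sd, and the probabilities match. The value x = 26 is just an illustrative choice.

```python
from scipy.stats import norm

# A normal distribution is fully specified by loc (mean) and scale (standard deviation)
dist = norm(loc=20, scale=3)
print(dist.mean(), dist.std())  # 20.0 3.0

# Convert x = 26 to the standard normal: z = (26 - 20) / 3 = 2.0
x = 26
z = (x - 20) / 3
print(dist.cdf(x), norm.cdf(z))  # the same probability either way
```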


Areas under the normal distribution

68% falls within 1 standard deviation

Normal distribution with area between -1 and 1 highlighted, labeled with 68%


Areas under the normal distribution

95% falls within 2 standard deviations

Normal distribution with area between -2 and 2 highlighted, labeled with 95%


Areas under the normal distribution

99.7% falls within 3 standard deviations

Normal distribution with area between -3 and 3 highlighted, labeled with 99.7%
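The 68%, 95% and 99.7% figures can be reproduced (a sketch, assuming SciPy) by subtracting the standard normal CDF at -k from the CDF at +k. The exact areas are about 68.27%, 95.45% and 99.73%, so the slide values are rounded.

```python
from scipy.stats import norm

# Area between -k and +k standard deviations under the standard normal curve
areas = {k: norm.cdf(k) - norm.cdf(-k) for k in (1, 2, 3)}
for k, area in areas.items():
    print(f"within {k} sd: {area:.4f}")
# within 1 sd: 0.6827
# within 2 sd: 0.9545
# within 3 sd: 0.9973
```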


Lots of histograms look normal

Normal distribution

Standard normal distribution

Women's heights from NHANES

Histogram of women's heights

Mean: 161 cm, standard deviation: 7 cm


Approximating data with the normal distribution

Normal curve drawn over the histogram of women's heights
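Approximating data with a normal curve comes down to estimating its mean and standard deviation from the data. The sketch below assumes NumPy and SciPy, and uses a hypothetical array of 1,000 simulated heights as a stand-in; it is not the NHANES survey data.

```python
import numpy as np
from scipy.stats import norm

# Hypothetical stand-in for the survey: 1,000 simulated heights in cm (NOT the NHANES data)
rng = np.random.default_rng(42)
heights = rng.normal(loc=161, scale=7, size=1000)

# Estimate the two parameters that describe the normal curve
mean_est = np.mean(heights)
sd_est = np.std(heights, ddof=1)
print(round(mean_est, 1), round(sd_est, 1))

# Evaluate the fitted normal density on a grid, e.g. to draw it over a histogram
grid = np.linspace(heights.min(), heights.max(), 200)
density = norm.pdf(grid, mean_est, sd_est)
```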


What percent of women are shorter than 154 cm?

Normal curve drawn over the histogram of women's heights with area less than 154 shaded

Using the normal approximation, about 16% of women in the survey are shorter than 154 cm

from scipy.stats import norm
norm.cdf(154, 161, 7)
0.158655
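Why is the answer about 16%? A sketch, assuming SciPy: 154 cm is exactly one standard deviation below the mean, so the answer is the standard normal CDF at z = -1. By the 68% rule, the remaining 32% of the area outside plus or minus 1 sd splits evenly between the two tails, giving roughly 16% in each.

```python
from scipy.stats import norm

mean, sd = 161, 7
z = (154 - mean) / sd  # -1.0: 154 cm is one standard deviation below the mean
p_short = norm.cdf(z)  # standard normal CDF at z
print(p_short)         # same value as norm.cdf(154, 161, 7)
```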

What percent of women are taller than 154 cm?

Normal curve drawn over the histogram of women's heights with area to the right of 154 shaded

from scipy.stats import norm
1 - norm.cdf(154, 161, 7)
0.841345
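SciPy also offers the survival function `sf`, which returns the upper-tail area P(X > x) directly. A sketch, assuming SciPy: it gives the same result as `1 - norm.cdf(...)` here, and it tends to be more accurate when the upper-tail probability is extremely small.

```python
from scipy.stats import norm

# Survival function: area to the right of 154 under a normal with mean 161, sd 7
p_tall = norm.sf(154, 161, 7)
print(p_tall)  # matches 1 - norm.cdf(154, 161, 7)
```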


What percent of women are 154-157 cm?

Area less than 157 minus area less than 154 equals area between 154 and 157

norm.cdf(157, 161, 7) - norm.cdf(154, 161, 7)
0.1252
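When several questions use the same mean and standard deviation, one option (a sketch, assuming SciPy) is to "freeze" the distribution once so that 161 and 7 aren't repeated in every call. `heights` is just an illustrative variable name.

```python
from scipy.stats import norm

# Freeze a normal distribution with mean 161 cm and sd 7 cm
heights = norm(loc=161, scale=7)

# Area between 154 and 157: area below 157 minus area below 154
p_between = heights.cdf(157) - heights.cdf(154)
print(round(p_between, 4))  # 0.1252
```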

What height are 90% of women shorter than?

Area less than 170 shaded, labeled 90%

norm.ppf(0.9, 161, 7)
169.97086
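`norm.ppf` is the inverse of `norm.cdf`: `cdf` maps a height to the area on its left, while `ppf` maps an area back to a height. A quick round-trip check, assuming SciPy:

```python
from scipy.stats import norm

h90 = norm.ppf(0.9, 161, 7)   # height with 90% of the area to its left
back = norm.cdf(h90, 161, 7)  # applying the CDF undoes the PPF
print(h90, back)              # back equals 0.9 up to floating-point error
```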

What height are 90% of women taller than?

Area greater than about 152 shaded, labeled 90%

norm.ppf((1-0.9), 161, 7)
152.029
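Instead of computing `1 - 0.9` by hand, SciPy's inverse survival function `isf` takes the area to the right directly. A sketch, assuming SciPy, where `isf(q)` matches `ppf(1 - q)`:

```python
from scipy.stats import norm

# Height with 90% of the area to its RIGHT under a normal with mean 161, sd 7
h_low = norm.isf(0.9, 161, 7)
print(h_low)  # same as norm.ppf(1 - 0.9, 161, 7)
```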

Generating random numbers

# Generate 10 random heights
norm.rvs(161, 7, size=10)
array([155.5758223 , 155.13133235, 160.06377097, 168.33345778,
       165.92273375, 163.32677057, 165.13280753, 146.36133538,
       149.07845021, 160.5790856 ])
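The numbers above will differ on every run. A sketch, assuming SciPy and NumPy, of two follow-ups: passing `random_state` makes the draws reproducible, and with many draws the sample statistics settle near the parameters 161 and 7. The seed value 1 is arbitrary.

```python
import numpy as np
from scipy.stats import norm

# The same random_state gives the same draws
sample_a = norm.rvs(161, 7, size=10, random_state=1)
sample_b = norm.rvs(161, 7, size=10, random_state=1)
print(np.array_equal(sample_a, sample_b))  # True

# With many draws, the sample mean and sd approach 161 and 7
big = norm.rvs(161, 7, size=100_000, random_state=1)
print(big.mean(), big.std())

# The share of simulated heights below 154 cm is close to norm.cdf(154, 161, 7)
print((big < 154).mean())
```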

Let's practice!
