The central limit theorem

Introduction to Statistics in R

Maggie Matsui

Content Developer, DataCamp

Rolling the dice 5 times

die <- c(1, 2, 3, 4, 5, 6)

# Roll 5 times
sample_of_5 <- sample(die, 5, replace = TRUE)
sample_of_5
1 3 4 1 1
mean(sample_of_5)
2.0

 

six-sided die

Rolling the dice 5 times

library(magrittr)  # provides the %>% pipe (dplyr also re-exports it)

# Roll 5 times and take mean
sample(die, 5, replace = TRUE) %>% mean()
4.4
sample(die, 5, replace = TRUE) %>% mean()
3.8

Rolling the dice 5 times, repeated 10 times

Repeat 10 times:

  • Roll 5 times
  • Take the mean

 

sample_means <- replicate(10, sample(die, 5, replace = TRUE) %>% mean())

sample_means
3.8 4.0 3.8 3.6 3.2 4.8 2.6 3.0 2.6 2.0
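To make the repeat-and-average step concrete, here is a minimal sketch of what the replicate() call does, written out as an explicit loop in base R. The seed value and the names n_repeats, loop_means, and replicate_means are illustrative choices, not from the slides; set.seed() is only there so the random rolls can be reproduced.

```r
# Illustrative sketch: the seed and variable names are assumptions.
die <- c(1, 2, 3, 4, 5, 6)
n_repeats <- 10

# Explicit loop: roll 5 times, take the mean, store it.
set.seed(123)
loop_means <- numeric(n_repeats)
for (i in seq_len(n_repeats)) {
  rolls <- sample(die, 5, replace = TRUE)
  loop_means[i] <- mean(rolls)
}

# The same procedure with replicate(), reusing the seed.
set.seed(123)
replicate_means <- replicate(n_repeats, mean(sample(die, 5, replace = TRUE)))

# Compare the two vectors of 10 sample means.
all.equal(loop_means, replicate_means)
```

Both versions draw 5 rolls per repetition in the same order, so with the same seed they should produce the same 10 sample means. The sketch calls mean() directly rather than piping, so it runs without loading a pipe package.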

Sampling distributions

Sampling distribution of the sample mean

histogram of 10 sample means

100 sample means

replicate(100, sample(die, 5, replace = TRUE) %>% mean())
2.8 3.2 1.8 4.6 4.0 2.8 4.4 2.4 3.4 2.8 4.2 3.4 ... 2.2 3.8 3.6 3.8 4.4 4.8 2.4

histogram of 100 sample means

1000 sample means

sample_means <- replicate(1000, sample(die, 5, replace = TRUE) %>% mean())

histogram of 1000 sample means

Central limit theorem

The sampling distribution of a statistic becomes closer to the normal distribution as the number of trials increases.

histograms of 10, 100, and 1000 sample means; the more sample means there are, the more bell-shaped the distribution looks

 

* Samples should be random and independent
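As a rough check on the statement above, the following sketch compares a simulated sampling distribution of the mean with the values that probability theory gives for a fair six-sided die: an expected value of 3.5 and a variance of 35/12, so the mean of 5 rolls has a standard deviation of sqrt(35/12) / sqrt(5). The seed and the names n_rolls, true_mean, true_sd, and standard_error are illustrative assumptions, not part of the slides.

```r
# Illustrative sketch: the seed and helper names are assumptions.
set.seed(2024)

die <- c(1, 2, 3, 4, 5, 6)
n_rolls <- 5

# Simulate 1000 sample means of 5 rolls each.
sample_means <- replicate(1000, mean(sample(die, n_rolls, replace = TRUE)))

# Theoretical values for a fair die.
true_mean <- mean(die)                        # expected value, 3.5
true_sd <- sqrt(mean((die - true_mean)^2))    # population sd, sqrt(35/12)
standard_error <- true_sd / sqrt(n_rolls)     # sd of the mean of 5 rolls

# Simulated versus theoretical centre and spread.
c(simulated = mean(sample_means), theoretical = true_mean)
c(simulated = sd(sample_means), theoretical = standard_error)

# Bin counts of the sampling distribution, without drawing a plot.
hist(sample_means, plot = FALSE)$counts
```

If the simulation behaves as expected, the simulated mean and standard deviation should land close to the theoretical ones, and the bin counts should rise toward the middle and fall off on both sides. Calling hist(sample_means) without plot = FALSE draws the histogram itself.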

Standard deviation and the CLT

replicate(1000, sample(die, 5, replace = TRUE) %>% sd())

Distribution of 1000 sample standard deviations of 5 die rolls

Proportions and the CLT

sales_team <- c("Amir", "Brian", "Claire", "Damian")

sample(sales_team, 10, replace = TRUE)
"Claire" "Brian"  "Brian"  "Brian"  "Damian" "Damian" "Brian"  "Brian" 
"Amir"   "Amir"
sample(sales_team, 10, replace = TRUE)
"Amir"   "Amir"   "Claire" "Amir"   "Amir"   "Brian"  "Amir"   "Claire" 
"Claire" "Claire"

Sampling distribution of proportion

Distribution of sample proportions also looks normal
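The slides do not show how the sample proportions are computed, so the following is only one plausible sketch. It assumes each sample proportion is the share of "Claire" in 10 draws from sales_team, and that the procedure is repeated 1000 times; the seed and the helper one_prop() are hypothetical names introduced for illustration.

```r
# Illustrative sketch: the seed, the 1000 repetitions, and one_prop()
# are assumptions, not taken from the slides.
set.seed(42)

sales_team <- c("Amir", "Brian", "Claire", "Damian")

# Proportion of "Claire" in one sample of 10 draws with replacement.
one_prop <- function() {
  draws <- sample(sales_team, 10, replace = TRUE)
  mean(draws == "Claire")
}

# Repeat to build a sampling distribution of the proportion.
sample_props <- replicate(1000, one_prop())

head(sample_props)
mean(sample_props)  # each of the 4 names is equally likely, so expect a value near 0.25
```

Comparing draws == "Claire" gives a logical vector, and taking its mean turns that into a proportion, which is why no counting step is needed. hist(sample_props) would draw the distribution.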

Mean of sampling distribution

# Estimate expected value of die
mean(sample_means)
3.48
# Estimate proportion of "Claire"s
mean(sample_props)
0.26
  • Estimate characteristics of unknown underlying distribution

Sampling distribution of sample means with dashed line down the middle  

  • More easily estimate characteristics of large populations
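To illustrate the point about large populations, here is a small sketch using a made-up population. The 100,000 values are randomly generated for this example and are not data from the course; the distribution, the sample size of 20, the seed, and the name pop_sample_means are all assumptions chosen for illustration.

```r
# Illustrative sketch with hypothetical data (not from the course).
set.seed(7)

# A made-up, skewed population of 100,000 values.
population <- rexp(100000, rate = 1 / 50)

# Take 1000 small samples of 20 values each and record each sample's mean.
pop_sample_means <- replicate(1000, mean(sample(population, 20)))

# The mean of the sample means estimates the population mean.
mean(pop_sample_means)
mean(population)
```

The two printed values should be close, even though each sample looks at only 20 of the 100,000 values. In practice the full population would not be available to check against, which is why the estimate is useful.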

Let's practice!
