Introduction to Statistics in R
Maggie Matsui
Content Developer, DataCamp
library(magrittr)  # provides the %>% pipe used below
die <- c(1, 2, 3, 4, 5, 6)
# Roll 5 times
sample_of_5 <- sample(die, 5, replace = TRUE)
sample_of_5
1 3 4 1 1
mean(sample_of_5)
2.0
# Roll 5 times and take mean
sample(die, 5, replace = TRUE) %>% mean()
4.4
sample(die, 5, replace = TRUE) %>% mean()
3.8
Repeat 10 times:
sample_means <- replicate(10, sample(die, 5, replace = TRUE) %>% mean())
sample_means
3.8 4.0 3.8 3.6 3.2 4.8 2.6 3.0 2.6 2.0
Sampling distribution of the sample mean
replicate(100, sample(die, 5, replace = TRUE) %>% mean())
2.8 3.2 1.8 4.6 4.0 2.8 4.4 2.4 3.4 2.8 4.2 3.4 ... 2.2 3.8 3.6 3.8 4.4 4.8 2.4
sample_means <- replicate(1000, sample(die, 5, replace = TRUE) %>% mean())
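One way to look at these 1000 means is a histogram. The sketch below is a minimal base R version, not necessarily how the course draws it: the `set.seed()` call, the bin width of 0.5, and the axis labels are assumptions added for illustration, and `mean()` is called directly so no pipe package is needed.

```r
# Assumption: seed chosen arbitrarily so the run is reproducible
set.seed(42)

die <- c(1, 2, 3, 4, 5, 6)

# 1000 sample means, each from 5 rolls with replacement
sample_means <- replicate(1000, mean(sample(die, 5, replace = TRUE)))

# Bin the means without drawing; every mean lies between 1 and 6
bins <- hist(sample_means, breaks = seq(1, 6, by = 0.5), plot = FALSE)
bins$counts

# Draw the histogram of the sampling distribution
hist(sample_means, breaks = seq(1, 6, by = 0.5),
     main = "Sampling distribution of the sample mean",
     xlab = "Mean of 5 rolls")
```

The counts should pile up in the bins around 3.5 and thin out toward 1 and 6.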
Central limit theorem: the sampling distribution of a statistic, such as the sample mean, gets closer to the normal distribution as the sample size (the number of trials in each sample) increases.
* Samples should be random and independent
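A related fact that is easy to check numerically: the sample means cluster more tightly around the expected value as the sample size n grows, with a standard deviation of about sigma / sqrt(n), where sigma is the standard deviation of a single roll. The sketch below compares simulated and theoretical values. The helper `simulate_means`, the seed, the sample sizes, and the 5000 repetitions are all hypothetical choices made for this illustration.

```r
# Assumption: seed chosen arbitrarily so the run is reproducible
set.seed(1)

die <- c(1, 2, 3, 4, 5, 6)

# Hypothetical helper: 'reps' sample means, each computed from 'n' rolls
simulate_means <- function(n, reps = 5000) {
  replicate(reps, mean(sample(die, n, replace = TRUE)))
}

# Population standard deviation of one fair roll (divide by N, not N - 1)
pop_sd <- sqrt(mean((die - mean(die))^2))

sizes <- c(2, 5, 30)

# Spread of the simulated sample means for each sample size
observed_se <- sapply(sizes, function(n) sd(simulate_means(n)))

# Spread predicted by sigma / sqrt(n)
theoretical_se <- pop_sd / sqrt(sizes)

round(data.frame(n = sizes, observed_se, theoretical_se), 3)
```

The two columns should agree closely, and both shrink as n increases.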
replicate(1000, sample(die, 5, replace = TRUE) %>% sd())
sales_team <- c("Amir", "Brian", "Claire", "Damian")
sample(sales_team, 10, replace = TRUE)
"Claire" "Brian" "Brian" "Brian" "Damian" "Damian" "Brian" "Brian"
"Amir" "Amir"
sample(sales_team, 10, replace = TRUE)
"Amir" "Amir" "Claire" "Amir" "Amir" "Brian" "Amir" "Claire"
"Claire" "Claire"
# Estimate expected value of die
mean(sample_means)
3.48
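For comparison, the exact expected value of a fair die can be computed directly, since each face has probability 1/6. The sketch below puts the exact value next to a simulated estimate. The seed and the variable names `expected_value` and `estimate` are assumptions for illustration, so the estimate will not match the 3.48 above exactly.

```r
# Assumption: seed chosen arbitrarily so the run is reproducible
set.seed(123)

die <- c(1, 2, 3, 4, 5, 6)

# Exact expected value: each face weighted by its probability of 1/6
expected_value <- sum(die * (1 / 6))

# Simulated estimate: mean of 1000 sample means, 5 rolls each
sample_means <- replicate(1000, mean(sample(die, 5, replace = TRUE)))
estimate <- mean(sample_means)

c(exact = expected_value, estimate = estimate)
```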
# Estimate proportion of "Claire"s
mean(sample_props)
0.26
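The object `sample_props` is used above without its definition being shown. One plausible construction, offered as an assumption rather than the course's own code, is to draw 10 names with replacement, record the share that are "Claire", and repeat 1000 times. The seed is also an assumption.

```r
# Assumption: seed chosen arbitrarily so the run is reproducible
set.seed(7)

sales_team <- c("Amir", "Brian", "Claire", "Damian")

# Assumed definition: proportion of "Claire" in each sample of 10 draws,
# repeated 1000 times
sample_props <- replicate(
  1000,
  mean(sample(sales_team, 10, replace = TRUE) == "Claire")
)

# Estimate the proportion of "Claire"s
mean(sample_props)
```

With four equally likely names, the estimate should land near 1/4, in line with the 0.26 reported above.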