Convenience sampling

Sampling in R

Richie Cotton

Data Evangelist at DataCamp

The Literary Digest election prediction

A Literary Digest front page from 1936 showing a headline of election predictions. Landon was expected to get 1.3 million votes and Roosevelt was expected to get just under 1 million votes.

  • Prediction: Landon gets 57%; Roosevelt gets 43%
  • Actual results: Landon got 38%; Roosevelt got 62%
  • The sample was not representative of the population, which caused sample bias.
  • Collecting data by the easiest method is called convenience sampling.
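The Literary Digest example shows that a large sample does not fix a biased sampling method. The R sketch below simulates that effect on an entirely hypothetical population. The 100,000 voters, the 25% who are "easy to reach", and that group's lower support for Roosevelt are invented numbers, chosen only so that overall support is 62%. They are not historical data.

```r
# Entirely hypothetical population of 100,000 voters (invented numbers)
set.seed(1936)

population <- data.frame(
  # The first 25,000 voters are "easy to reach"; the other 75,000 are not
  easy_to_reach = rep(c(TRUE, FALSE), times = c(25000, 75000)),
  # 10,000 of the easy-to-reach voters and 52,000 of the rest back Roosevelt,
  # so true support is 62,000 / 100,000 = 62%
  roosevelt = c(rep(c(TRUE, FALSE), times = c(10000, 15000)),
                rep(c(TRUE, FALSE), times = c(52000, 23000)))
)

true_support <- mean(population$roosevelt)

# Convenience sample: very large, but drawn only from easy-to-reach voters
easy_rows <- which(population$easy_to_reach)
convenience_estimate <- mean(population$roosevelt[sample(easy_rows, 20000)])

# Simple random sample: far smaller, but every voter is equally likely to be picked
random_estimate <- mean(population$roosevelt[sample(nrow(population), 1000)])

round(c(truth = true_support,
        convenience = convenience_estimate,
        random = random_estimate), 3)
```

The 20,000-voter convenience sample should land close to 40%, which is the support rate inside the easy-to-reach group. The 1,000-voter random sample should land close to the true 62%.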

Finding the mean age of French people

A photo of Disneyland Paris.

  • Survey 10 people at Disneyland Paris.
  • Their mean age is 24.6 years.
  • Will this be a good estimate for all of France?
Image by Sean MacEntee

How accurate was the survey?

Year   Average age in France (years)
1975   31.6
1985   33.6
1995   36.2
2005   38.9
2015   41.2
  • The survey mean of 24.6 years is a poor estimate: it is below the national average for every year shown, even 1975.
  • People who visit Disneyland aren't representative of the whole population.
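The gap between the survey and the table can be computed directly. This short R sketch uses only the figures above. The data frame and column names are our own choices.

```r
# Average age in France by year, from the table above
french_age <- data.frame(
  year = c(1975, 1985, 1995, 2005, 2015),
  average_age = c(31.6, 33.6, 36.2, 38.9, 41.2)
)

# Mean age from the convenience sample of 10 Disneyland Paris visitors
survey_mean <- 24.6

# Estimation error for each year: negative values mean the survey is too low
french_age$error <- survey_mean - french_age$average_age
french_age
```

Every error is negative. The errors range from 7.0 years (1975) to 16.6 years (2015), so the survey underestimates the average age whichever year is used as the reference.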

Convenience sampling coffee ratings

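library(dplyr)

# Population mean of the total cup points
coffee_ratings %>% 
  summarize(mean_cup_points = mean(total_cup_points))
#>   mean_cup_points
#> 1           82.09

# Convenience sample: take the first 10 rows
coffee_ratings_first10 <- coffee_ratings %>% 
  slice_head(n = 10)

# Mean of the convenience sample
coffee_ratings_first10 %>% 
  summarize(mean_cup_points = mean(total_cup_points))
#>   mean_cup_points
#> 1            89.1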
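A "first 10 rows" sample can go badly wrong when the rows happen to be ordered by the variable being measured, because `slice_head()` then returns only the extreme values. The base-R sketch below illustrates this with simulated scores, not the real coffee_ratings data. The normal distribution (mean 82, sd 3), the 1,000 rows, and the descending sort order are all assumptions made for illustration. Here `head()` and `sample()` stand in for `slice_head()` and `slice_sample()`.

```r
set.seed(42)

# Hypothetical population of 1,000 ratings, stored sorted from best to worst
ratings <- sort(rnorm(1000, mean = 82, sd = 3), decreasing = TRUE)

population_mean <- mean(ratings)

# Convenience sample: the first 10 rows, like slice_head(n = 10)
first10_mean <- mean(head(ratings, 10))

# Simple random sample of 10 rows, like slice_sample(n = 10)
random10_mean <- mean(sample(ratings, 10))

round(c(population = population_mean,
        first10 = first10_mean,
        random10 = random10_mean), 2)
```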

Visualizing selection bias

library(ggplot2)

# Histogram of total cup points for the whole population
coffee_ratings %>%
  ggplot(aes(x = total_cup_points)) +
  geom_histogram(binwidth = 2)

A histogram of cup points from the population.

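# Histogram of the convenience sample (first 10 rows), on a fixed
# x-axis range of 59 to 91 points so it is comparable with the population plot
coffee_ratings_first10 %>%
  ggplot(aes(x = total_cup_points)) +
  geom_histogram(binwidth = 2) +
  xlim(59, 91)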

A histogram of cup points from the sample.

Visualizing selection bias (continued)

coffee_ratings %>%
  ggplot(aes(x = total_cup_points)) +
  geom_histogram(binwidth = 2) 

A histogram of cup points from the population.

# Histogram of a simple random sample of 10 rows, on the same x-axis range
coffee_ratings %>%
  slice_sample(n = 10) %>% 
  ggplot(aes(x = total_cup_points)) +
  geom_histogram(binwidth = 2) +
  xlim(59, 91)

A histogram of cup points from a random sample.
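A single random sample of 10 can still miss the population mean by chance. The difference from convenience sampling is that random samples are right on average. The sketch below repeats both approaches 1,000 times on the same kind of simulated, sorted ratings. As before, the sorting is an assumption, and this is not the real dataset. The first-10-rows mean never changes, while the random-sample means scatter around the population mean.

```r
set.seed(2021)

# Hypothetical population of 1,000 ratings, sorted from best to worst
ratings <- sort(rnorm(1000, mean = 82, sd = 3), decreasing = TRUE)
population_mean <- mean(ratings)

# The convenience sample is the same every time: the first 10 rows
first10_mean <- mean(head(ratings, 10))

# Draw 1,000 independent random samples of size 10 and keep each sample mean
random_means <- replicate(1000, mean(sample(ratings, 10)))

# Average error of each method across repetitions
c(convenience_bias = first10_mean - population_mean,
  random_bias = mean(random_means) - population_mean)
```

The convenience bias stays large no matter how often the sample is repeated. The average error of the random samples shrinks toward zero as more repetitions are added.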

Let's practice!
