Convenience sampling

Sampling in Python

James Chapman

Curriculum Manager, DataCamp

The Literary Digest election prediction

A Literary Digest front page from 1936 showing a headline of election predictions. The poll tallied about 1.3 million votes for Landon and just under 1 million votes for Roosevelt.

  • Prediction: Landon gets 57%; Roosevelt gets 43%
  • Actual results: Landon got 38%; Roosevelt got 62%
  • The sample was not representative of the population, which caused selection bias
  • Collecting data by the easiest method is called convenience sampling

Finding the mean age of French people

A photo of Disneyland Paris.

  • Survey 10 people at Disneyland Paris
  • Mean age of 24.6 years
  • Will this be a good estimate for all of France?
1 Image by Sean MacEntee

How accurate was the survey?

Year    Average age in France (years)
1975    31.6
1985    33.6
1995    36.2
2005    38.9
2015    41.2
  • 24.6 years is a poor estimate
  • People who visit Disneyland aren't representative of the whole population
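
The size of the miss can be checked with a few lines of arithmetic. The sketch below copies the values from the table and the 24.6-year survey mean; since the slide does not say in which year the survey took place, it compares the estimate against every year listed. The variable names are illustrative only.

```python
# Average French age by year, copied from the table above
average_age_by_year = {1975: 31.6, 1985: 33.6, 1995: 36.2, 2005: 38.9, 2015: 41.2}

# Mean age of the 10 people surveyed at Disneyland Paris
survey_mean_age = 24.6

# Gap between each reference value and the survey estimate, in years
errors = {
    year: round(average - survey_mean_age, 1)
    for year, average in average_age_by_year.items()
}

# Year in which the survey estimate would have been least wrong
smallest_gap_year = min(errors, key=errors.get)

print(errors)
print(smallest_gap_year, errors[smallest_gap_year])
```

Even against the most favorable year in the table, the survey underestimates the average age by several years.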

Convenience sampling coffee ratings

coffee_ratings["total_cup_points"].mean()
82.15120328849028
coffee_ratings_first10 = coffee_ratings.head(10)
coffee_ratings_first10["total_cup_points"].mean()
89.1
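
One plausible reason the first 10 rows score so high is that the rows are stored in order of quality, so `head(10)` picks only the best coffees. The slides do not state how `coffee_ratings` is ordered, so the sketch below is only an illustration under that assumption: `fake_ratings` is a hypothetical stand-in built from synthetic scores, not the real dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for coffee_ratings: 1,000 synthetic scores, NOT the real data
rng = np.random.default_rng(seed=42)
scores = rng.normal(loc=82, scale=3, size=1000)
fake_ratings = pd.DataFrame({"total_cup_points": scores})

# Assumption: the table is stored sorted from the highest to the lowest score
fake_ratings = fake_ratings.sort_values(
    "total_cup_points", ascending=False
).reset_index(drop=True)

population_mean = fake_ratings["total_cup_points"].mean()

# Convenience sample: whatever happens to be at the top of the table
first10_mean = fake_ratings.head(10)["total_cup_points"].mean()

# Simple random sample of the same size, seeded so the result is repeatable
random10_mean = fake_ratings.sample(n=10, random_state=42)["total_cup_points"].mean()

print(f"population: {population_mean:.2f}")
print(f"first 10:   {first10_mean:.2f}")
print(f"random 10:  {random10_mean:.2f}")
```

With sorted rows, the first 10 are the 10 highest scores, so their mean can never fall below the mean of any other 10 rows.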

Visualizing selection bias

import matplotlib.pyplot as plt
import numpy as np
coffee_ratings["total_cup_points"].hist(bins=np.arange(59, 93, 2))
plt.show()

 

coffee_ratings_first10["total_cup_points"].hist(bins=np.arange(59, 93, 2))
plt.show()
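
Both histograms use the same bins, `np.arange(59, 93, 2)`, which is what makes them comparable. The same comparison can be made numerically by counting values per bin. This is a rough sketch on a hypothetical stand-in (`fake_ratings`, synthetic scores sorted from high to low), not the real `coffee_ratings` data.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for coffee_ratings: synthetic scores, clipped to the bin range
rng = np.random.default_rng(seed=0)
scores = np.clip(rng.normal(loc=82, scale=3, size=1000), 59, 91)

# Assumption: rows are stored from the highest to the lowest score
fake_ratings = pd.DataFrame({"total_cup_points": np.sort(scores)[::-1]})
fake_first10 = fake_ratings.head(10)

# Same bin edges as the histograms above: 59, 61, ..., 91
bins = np.arange(59, 93, 2)
pop_counts, edges = np.histogram(fake_ratings["total_cup_points"], bins=bins)
first10_counts, _ = np.histogram(fake_first10["total_cup_points"], bins=bins)

# Share of values per bin, so the two groups can be compared despite different sizes
comparison = pd.DataFrame({
    "bin_start": edges[:-1],
    "population_share": pop_counts / pop_counts.sum(),
    "first10_share": first10_counts / first10_counts.sum(),
})

occupied_pop_bins = int((pop_counts > 0).sum())
occupied_first10_bins = int((first10_counts > 0).sum())

print(comparison)
print(occupied_pop_bins, occupied_first10_bins)
```

A convenience sample that occupies only a few of the bins that the population fills is a sign that it misses part of the distribution.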

Distribution of a population and of a convenience sample

Population: A histogram of cup points from the population.

Convenience sample: A histogram of cup points from the sample.


Visualizing selection bias for a random sample

coffee_sample = coffee_ratings.sample(n=10)
coffee_sample["total_cup_points"].hist(bins=np.arange(59, 93, 2))
plt.show()
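
Because `.sample(n=10)` draws different rows on every run, the histogram changes each time the code is executed. Passing `random_state` to `.sample()` fixes the draw. The sketch below shows this on a hypothetical stand-in (`fake_ratings`, synthetic scores rather than the real data) and also repeats the draw 200 times to show that random-sample means cluster around the population mean.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for coffee_ratings: synthetic scores, NOT the real data
rng = np.random.default_rng(seed=1)
fake_ratings = pd.DataFrame(
    {"total_cup_points": rng.normal(loc=82, scale=3, size=1000)}
)

# The same random_state selects the same rows, so results are reproducible
sample_a = fake_ratings.sample(n=10, random_state=2024)
sample_b = fake_ratings.sample(n=10, random_state=2024)
same_rows = sample_a.index.equals(sample_b.index)

# Repeat the simple random sample with 200 different seeds
sample_means = [
    fake_ratings.sample(n=10, random_state=seed)["total_cup_points"].mean()
    for seed in range(200)
]

population_mean = fake_ratings["total_cup_points"].mean()
mean_of_sample_means = float(np.mean(sample_means))

print(same_rows)
print(f"population mean:      {population_mean:.2f}")
print(f"mean of sample means: {mean_of_sample_means:.2f}")
```

Any single random sample of 10 can still be off, but unlike the convenience sample it is not pushed in one direction.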

Distribution of a population and of a simple random sample

Population: A histogram of cup points from the population.

Simple random sample: A histogram of cup points from a random sample.


Let's practice!

