Stratified Random Sampling

Analyzing Survey Data in Python

EbunOluwa Andrew

Data Scientist

What is stratified random sampling?

  • Stratified sampling produces a sample that better reflects the composition of the population
  • The population is divided into distinct, non-overlapping subgroups (strata) based on shared attributes, such as gender
  • Units are then sampled at random within each stratum so that the sample's proportions match the population's
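Proportional allocation is one common way to decide how many units to draw from each stratum: stratum h receives n_h = n × N_h / N of the n sampled units, where N_h is the stratum size and N is the population size. Below is a minimal pandas sketch. The stratum counts of 556 and 444 are hypothetical, chosen only to mirror the 0.556 / 0.444 split shown later rather than the real survey size. `proportional_allocation` is likewise a hypothetical helper name.

```python
import pandas as pd


def proportional_allocation(strata_sizes: pd.Series, n: int) -> pd.Series:
    """Return how many units to draw from each stratum so the sample mirrors
    the population proportions: n_h = n * N_h / N, rounded to whole units."""
    shares = strata_sizes / strata_sizes.sum()
    return (shares * n).round().astype(int)


# Hypothetical stratum sizes (illustrative only, not the actual survey counts)
population_counts = pd.Series({"Female": 556, "Male": 444})

allocation = proportional_allocation(population_counts, n=100)
print(allocation)
```

Because each stratum is rounded independently, the allocations can sometimes add up to slightly more or less than n. When that happens, one stratum's count needs a manual adjustment.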

Why use stratified random sampling?

  • Minimizes selection bias
  • Improves the representation of specific population groups
  • Examples:
    • Estimating income across different populations
    • Election polling
    • Estimating life expectancy
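The sketch below illustrates why stratifying helps representativeness. It draws several 10% samples from a hypothetical population of 556 "Female" and 444 "Male" rows (illustrative only). It then compares the female share under simple random sampling with the share under per-group sampling via pandas' `DataFrameGroupBy.sample`.

```python
import pandas as pd

# Hypothetical population: 556 "Female" and 444 "Male" rows (illustrative only)
population = pd.DataFrame({"gender": ["Female"] * 556 + ["Male"] * 444})


def female_share(df: pd.DataFrame) -> float:
    """Fraction of rows whose gender is "Female"."""
    return round(float((df["gender"] == "Female").mean()), 3)


seeds = range(5)

# Simple random sampling: 10% of all rows, ignoring gender
srs_shares = [female_share(population.sample(frac=0.1, random_state=s)) for s in seeds]

# Stratified sampling: 10% of the rows within each gender group
strat_shares = [
    female_share(population.groupby("gender").sample(frac=0.1, random_state=s))
    for s in seeds
]

print("simple random:", srs_shares)
print("stratified:   ", strat_shares)
```

The simple random shares can shift from seed to seed. The stratified shares stay at 0.56 because every draw takes 10% of each group.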

When not to use stratified random sampling

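  • When subgroups overlap
    • Subjects that fall into multiple groups are counted in more than one stratum, which misrepresents the population
  • Example of overlap in a survey question
    • How long have you worked at your current job?
      • 1-2 years
      • 2-4 years
    • A respondent with exactly 2 years of tenure fits both options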
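One way to make strata non-overlapping is to use half-open intervals. The sketch below uses `pd.cut` with `right=False`, which makes the bins [1, 2) and [2, 4). A tenure of exactly 2 years then falls only into the second stratum. The tenure values are hypothetical, and the labels are illustrative rewordings of the answer options.

```python
import pandas as pd

# Hypothetical tenures in years; 2.0 is the value both "1-2" and "2-4" would claim
tenure = pd.Series([1.0, 1.5, 2.0, 3.9, 2.5])

# right=False makes every bin closed on the left and open on the right,
# so each value is assigned to exactly one stratum
strata = pd.cut(
    tenure,
    bins=[1, 2, 4],
    right=False,
    labels=["1 to under 2 years", "2 to under 4 years"],
)

print(pd.DataFrame({"tenure": tenure, "stratum": strata}))
```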

Onsite work survey results at firm ABC

| employee_id | gender | onsite_work |
|-------------|--------|-------------|
|    fffe6838 | Male   | Yes         |
|   fffe12184 | Female | Yes         |
|    fffe9404 | Female | Yes         |
|   fffe17578 | Male   | Yes         |
|   fffe22257 | Female | Yes         |
|    fffe6217 | Male   | Yes         |
|    fffe7828 | Female | Yes         |
|   fffe18192 | Male   | Yes         |
|    fffe2839 | Female | Yes         |
|   fffe16173 | Female | Yes         |
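You can follow along without the full dataset by typing the ten rows above into a DataFrame. This is only an excerpt, so its gender split (6 Female, 4 Male) differs from the full survey's proportions. The name `survey_excerpt` is hypothetical.

```python
import pandas as pd

# The ten rows shown in the table above (an excerpt, not the full survey)
survey_excerpt = pd.DataFrame({
    "employee_id": ["fffe6838", "fffe12184", "fffe9404", "fffe17578", "fffe22257",
                    "fffe6217", "fffe7828", "fffe18192", "fffe2839", "fffe16173"],
    "gender": ["Male", "Female", "Female", "Male", "Female",
               "Male", "Female", "Male", "Female", "Female"],
    "onsite_work": ["Yes"] * 10,
})

print(survey_excerpt.gender.value_counts())
```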

Check proportions in the population

survey.gender.value_counts(normalize=True)
Female    0.556
Male      0.444
Name: gender, dtype: float64
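`value_counts(normalize=True)` returns each category's share of the non-missing values, that is, the counts divided by their total. Here is a quick check on a hypothetical `gender` column with the same 0.556 / 0.444 split. The 556 / 444 counts are assumed.

```python
import pandas as pd

# Hypothetical gender column: 556 "Female" and 444 "Male" entries
gender = pd.Series(["Female"] * 556 + ["Male"] * 444, name="gender")

counts = gender.value_counts()                # absolute counts per stratum
shares = gender.value_counts(normalize=True)  # relative frequencies

# normalize=True gives the same numbers as dividing the counts by their total
max_gap = float((shares - counts / counts.sum()).abs().max())
print(shares)
```

In pandas 2.0 and later, the printed result is labeled `Name: proportion` instead of `Name: gender`.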
Plotting proportions in the population

import pandas as pd
import matplotlib.pyplot as plt

# Pie chart of how many survey respondents fall into each gender
survey.gender.value_counts().plot.pie()
plt.show()

Pie chart: survey female-to-male ratio
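The default pie shows wedges without numbers. The sketch below uses the same hypothetical 556 / 444 gender column and prints each wedge's percentage via matplotlib's `autopct` option, which pandas forwards to `Axes.pie`. It renders off-screen with the Agg backend so it runs without a display.

```python
import io

import matplotlib

matplotlib.use("Agg")  # non-interactive backend: draw without opening a window
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical gender column with the survey's 0.556 / 0.444 split
gender = pd.Series(["Female"] * 556 + ["Male"] * 444, name="gender")

# autopct writes each wedge's share as a percentage inside the wedge
ax = gender.value_counts().plot.pie(autopct="%1.1f%%")
ax.set_ylabel("")  # drop the default axis label beside the pie
ax.set_title("Survey female to male ratio")

# Render to an in-memory PNG instead of showing the figure
buffer = io.BytesIO()
ax.figure.savefig(buffer, format="png")
plt.close(ax.figure)
```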

Stratified sampling example

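# Draw 10% of the rows at random from each gender group;
# group_keys=False keeps the original row index instead of prefixing it with the group
strat_sample = survey.groupby(
    'gender', group_keys=False).apply(
    lambda x: x.sample(frac=0.1))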
| employee_id | gender | onsite_work |
|-------------|--------|-------------|
|    fffe4934 | Female | Yes         |
|    fffe3958 | Female | Yes         |
|      fffe18 | Female | Yes         |
|     fffe283 | Female | Yes         |
|   fffe20382 | Female | Yes         |
|    fffe8721 | Male   | Yes         |
|    fffe5955 | Male   | Yes         |
|    fffe7081 | Male   | Yes         |
|     fffe353 | Male   | Yes         |
|    fffe1765 | Male   | Yes         |
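Since pandas 1.1, `DataFrameGroupBy.sample` draws from each group directly. It is equivalent to the `apply`/`lambda` pattern above and avoids the `DeprecationWarning` that pandas 2.2+ raises when `apply` operates on the grouping column. The sketch below runs on a hypothetical stand-in for `survey` (556 Female, 444 Male, made-up employee IDs), with `random_state` fixed for reproducibility.

```python
import pandas as pd

# Hypothetical stand-in for the survey DataFrame (IDs and group sizes are made up)
survey = pd.DataFrame({
    "employee_id": [f"emp{i}" for i in range(1000)],
    "gender": ["Female"] * 556 + ["Male"] * 444,
})

# Stratified 10% draw: sample within each gender group; random_state makes it repeatable
strat_sample = survey.groupby("gender").sample(frac=0.1, random_state=42)

print(strat_sample.gender.value_counts())
```

Each group contributes `round(0.1 × group size)` rows, so this draws 56 women and 44 men.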

Check proportions in the sample

Original population

survey.gender.value_counts(normalize=True)
Female    0.556
Male      0.444
Name: gender, dtype: float64

Stratified sample

strat_sample.gender.value_counts(normalize=True)
Female    0.56
Male      0.44
Name: gender, dtype: float64
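Placing the two distributions side by side makes the comparison explicit. The sketch below uses the same hypothetical 556 / 444 population and calls `pd.concat` on a dict of Series to build one column per source.

```python
import pandas as pd

# Hypothetical population and a stratified 10% sample drawn from it
survey = pd.DataFrame({"gender": ["Female"] * 556 + ["Male"] * 444})
strat_sample = survey.groupby("gender").sample(frac=0.1, random_state=0)

# Dict keys become the column names of the combined table
comparison = pd.concat(
    {
        "population": survey.gender.value_counts(normalize=True),
        "sample": strat_sample.gender.value_counts(normalize=True),
    },
    axis=1,
)
comparison["difference"] = comparison["sample"] - comparison["population"]

print(comparison.round(3))
```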

Let's practice!
