Random sampling

Analyzing Survey Data in Python

EbunOluwa Andrew

Data Scientist

Sampling in survey data analysis

  • Sampling = small set from large population
    • Make inferences about larger population
    • Sampling -> manageable form
    • Sampling -> sampling error
    • Minimize error with a larger sample
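
The link between sample size and sampling error can be illustrated with a small simulation. This is only a sketch on made-up data: the `population` series of scores, the sample sizes, and the 100 repeated draws per size are all assumptions chosen for illustration, not part of the course material.

```python
import numpy as np
import pandas as pd

# Hypothetical population: 50,000 survey scores on a 1-10 scale (made-up data)
rng = np.random.default_rng(0)
population = pd.Series(rng.integers(1, 11, size=50_000), name="score")
population_mean = population.mean()

# For each sample size, average the absolute sampling error over repeated draws
mean_abs_error = {}
for n in (10, 100, 1_000, 10_000):
    errors = [
        abs(population.sample(n=n, random_state=seed).mean() - population_mean)
        for seed in range(100)
    ]
    mean_abs_error[n] = float(np.mean(errors))
    print(f"n={n:>6}: mean absolute error = {mean_abs_error[n]:.3f}")
```

On average, the gap between the sample mean and the population mean should shrink as the sample grows, which is the sense in which a larger sample reduces sampling error.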

Image: kid holding candies (photo by Patrick Fore on Unsplash)

Random sampling

  • Each member has equal chance of being selected
  • Reduces bias
  • High internal validity
  • High external validity
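
One way to see what "equal chance of being selected" means is to repeat a random draw many times and count how often each member is picked. The sketch below assumes a made-up population of 10 members and 3,000 repeated draws of 3 members each; both numbers are arbitrary choices for illustration.

```python
import pandas as pd

# Hypothetical population of 10 members (made-up IDs)
population = pd.DataFrame({"member_id": range(10)})

# Draw 3 members at random, many times over, and collect every draw
n_draws = 3_000
draws = pd.concat(
    [population.sample(n=3, random_state=seed) for seed in range(n_draws)]
)

# Share of draws in which each member was selected
selection_rate = draws["member_id"].value_counts().sort_index() / n_draws
print(selection_rate)
```

With 3 of 10 members drawn each time, every member's selection rate should land close to 0.3, with no member systematically favored.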

Image: lottery tickets

.sample() method

  • DataFrame.sample(n=None, frac=None, random_state=None)
  • n = number of items to sample
  • frac = proportion (out of 1) of items to return
  • random_state = seed number to produce reproducible results
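
A minimal sketch of the three parameters on a toy DataFrame. The `toy_survey` data and its column names are hypothetical and exist only for this illustration.

```python
import pandas as pd

# Hypothetical toy survey with 20 respondents (made-up data)
toy_survey = pd.DataFrame({
    "respondent_id": range(1, 21),
    "onsite_work": ["Yes", "No"] * 10,
})

by_count = toy_survey.sample(n=5)                 # exactly 5 rows
by_fraction = toy_survey.sample(frac=0.25)        # 25% of 20 rows, so 5 rows
seeded = toy_survey.sample(n=5, random_state=42)  # seeded draw of 5 rows

print(len(by_count), len(by_fraction), len(seeded))

# n and frac are alternatives: passing both raises a ValueError
try:
    toy_survey.sample(n=5, frac=0.25)
except ValueError as err:
    print("ValueError:", err)
```

Use `n` when a fixed number of rows is needed and `frac` when the sample should scale with the size of the data.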

Random sampling example

import pandas as pd
survey = pd.read_csv('ABC_survey.csv')

sample = survey.sample(n=100)
print(sample)

|       | employee_id | gender | onsite_work |
|-------|-------------|--------|-------------|
| 3244  | fffe330     | Female | Yes         |
| 21339 | fffe310     | Male   | Yes         |
| 1122  | fffe390     | Male   | Yes         |
| 4363  | fffe313     | Female | Yes         |

Random sampling example

import pandas as pd
survey = pd.read_csv('ABC_survey.csv')

sample = survey.sample(frac=0.1)
print(sample)

|     | employee_id | gender | onsite_work |
|-----|-------------|--------|-------------|
| 142 | fffe800     | Female | Yes         |
| 710 | fffe900     | Female | Yes         |
| 242 | fffe700     | Female | Yes         |
| 114 | fffe600     | Female | Yes         |
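
Since the point of sampling is to make inferences about the larger population, it can help to compare a sample against the full data. The sketch below does not use the course file: it builds a hypothetical stand-in with 2,000 made-up employees and an assumed 60/40 gender split, then checks the size and gender mix of a 10% sample.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the survey file (made-up data, 2,000 employees)
rng = np.random.default_rng(7)
survey = pd.DataFrame({
    "employee_id": [f"emp{i:04d}" for i in range(2_000)],
    "gender": rng.choice(["Female", "Male"], size=2_000, p=[0.6, 0.4]),
})

sample = survey.sample(frac=0.1, random_state=1)

# A 10% sample of 2,000 rows has 200 rows
print(len(sample))

# Compare the gender split in the sample with the full data
comparison = pd.DataFrame({
    "population": survey["gender"].value_counts(normalize=True),
    "sample": sample["gender"].value_counts(normalize=True),
})
print(comparison.round(3))
```

The two columns will rarely match exactly; the difference between them is one visible form of sampling error.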

Random sampling example

import pandas as pd
survey = pd.read_csv('ABC_survey.csv')

sample = survey.sample(n=100, random_state=123)
print(sample)

|       | employee_id | gender | onsite_work |
|-------|-------------|--------|-------------|
| 21383 | fffe3       | Female | Yes         |
| 82    | fffe0       | Male   | Yes         |
| 20739 | fffe2       | Male   | Yes         |
| 7662  | fffe9       | Female | Yes         |

import pandas as pd
survey = pd.read_csv('ABC_survey.csv')

sample = survey.sample(frac=0.1, random_state=123)
print(sample)

|       | employee_id | gender | onsite_work |
|-------|-------------|--------|-------------|
| 21383 | fffe3       | Female | Yes         |
| 82    | fffe0       | Male   | Yes         |
| 20739 | fffe2       | Male   | Yes         |
| 7662  | fffe9       | Female | Yes         |
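
To check that a seed makes a sample reproducible, two seeded draws can be compared directly. This is a small sketch on a hypothetical `toy` DataFrame of 100 made-up respondents, not the survey file used above.

```python
import pandas as pd

# Hypothetical data: 100 respondents (made-up IDs)
toy = pd.DataFrame({"respondent_id": range(1, 101)})

first = toy.sample(n=10, random_state=123)
second = toy.sample(n=10, random_state=123)
unseeded = toy.sample(n=10)  # no seed: rows can differ from run to run

# Same seed, same rows in the same order
print(first.equals(second))  # True
```

Fixing `random_state` is useful whenever results need to be shared or rerun, since everyone who uses the same seed on the same data gets the same rows.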

Let's practice!
