Analyzing Survey Data in Python
EbunOluwa Andrew
Data Scientist
| employee_id | gender | onsite_work |
|-------------|--------|-------------|
| fffe6838 | Male | Yes |
| fffe12184 | Female | Yes |
| fffe9404 | Female | Yes |
| fffe17578 | Male | Yes |
| fffe22257 | Female | Yes |
| fffe6217 | Male | Yes |
| fffe7828 | Female | Yes |
| fffe18192 | Male | Yes |
| fffe2839 | Female | Yes |
| fffe16173 | Female | Yes |
survey.gender.value_counts(normalize=True)
Female 0.556
Male 0.444
Name: gender, dtype: float64
import pandas as pd
import matplotlib.pyplot as plt
survey.gender.value_counts().plot.pie()
# Draw 10% of the rows from each gender group (stratified sample)
strat_sample = survey.groupby(
    'gender', group_keys=False).apply(
    lambda x: x.sample(frac=0.1))
| employee_id | gender | onsite_work |
|-------------|--------|-------------|
| fffe4934 | Female | Yes |
| fffe3958 | Female | Yes |
| fffe18 | Female | Yes |
| fffe283 | Female | Yes |
| fffe20382 | Female | Yes |
| fffe8721 | Male | Yes |
| fffe5955 | Male | Yes |
| fffe7081 | Male | Yes |
| fffe353 | Male | Yes |
| fffe1765 | Male | Yes |
# Compare gender proportions in the full survey and in the stratified sample
print(survey.gender.value_counts(normalize=True))
print(strat_sample.gender.value_counts(
    normalize=True))
Female 0.556
Male 0.444
Name: gender, dtype: float64
Female 0.56
Male 0.44
Name: gender, dtype: float64
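The comparison above can be reproduced end to end with a small self-contained sketch. The data here is synthetic, not the course dataset: `survey_demo`, its made-up `employee_id` values, and the 556/444 gender split are assumptions chosen only to mirror the proportions shown. The sketch also swaps in `DataFrameGroupBy.sample` (available in pandas 1.1 and later) as a shorter alternative to the `groupby(...).apply(lambda ...)` pattern, with `random_state` set so the draw is repeatable.

```python
import pandas as pd

# Synthetic stand-in for the survey data (assumption: 556 Female, 444 Male rows)
survey_demo = pd.DataFrame({
    "employee_id": [f"id{i:04d}" for i in range(1000)],  # hypothetical ids
    "gender": ["Female"] * 556 + ["Male"] * 444,
})

# Draw 10% of the rows from each gender group; random_state fixes the draw
strat_demo = survey_demo.groupby("gender").sample(frac=0.1, random_state=42)

# Proportion of each gender in the full data and in the stratified sample
population_share = survey_demo["gender"].value_counts(normalize=True)
sample_share = strat_demo["gender"].value_counts(normalize=True)

print(population_share.round(3))
print(sample_share.round(3))
```

Because each group is sampled at the same fraction, the sample's gender shares stay close to the population's; small differences come only from rounding each group's sample size to a whole number of rows.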