Analyzing Survey Data in Python
EbunOluwa Andrew
Data Scientist
Input variables relevant to output variable
Understand impact of different variables on population
Check if differences are by chance or statistically significant
pet_type | current_pets | time_spent | reduces_stress |
---|---|---|---|
dog | 1 | 420 | yes |
dog | 1 | 180 | yes |
dog | 4 | 30 | yes |
dog | 1 | 30 | yes |
dog | 1 | 60 | yes |
Variables to compare: pet_type and reduces_stress
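import pandas as pd
import scipy.stats as st

# Load the survey responses
data = pd.read_csv('pet_survey.csv')

# Count responses for each combination of reduces_stress and pet_type
cross_table = pd.crosstab(data.reduces_stress, data.pet_type)

# Chi-square test of independence on the contingency table
chi_analysis = st.chi2_contingency(cross_table)
print(chi_analysis)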
(67.7,
 1.9e-16,
 1,
 array([[1767.0, 1825.0],
        [2251.0, 2325.0]]))
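Expected frequencies >= 5 in every cell (assumption of the chi-square test)
p-value < 0.05 (the relationship is statistically significant)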
pet_type and reduces_stress are related
Type of pet owned is associated with whether pet owners perceive stress reduction
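The two checks (expected frequencies of at least 5 and a p-value below 0.05) can be sketched in code by unpacking the value returned by `st.chi2_contingency`. This is a minimal sketch: the `toy` DataFrame and its counts are made up for illustration and stand in for `pet_survey.csv`, and the names `assumption_ok` and `significant` are hypothetical.

```python
import pandas as pd
import scipy.stats as st

# Hypothetical toy data standing in for pet_survey.csv (not the real survey)
toy = pd.DataFrame({
    "pet_type": ["dog"] * 40 + ["cat"] * 40,
    "reduces_stress": ["yes"] * 30 + ["no"] * 10 + ["yes"] * 15 + ["no"] * 25,
})

# Same contingency table construction as in the slides
cross_table = pd.crosstab(toy.reduces_stress, toy.pet_type)

# The result unpacks into: test statistic, p-value,
# degrees of freedom, and the array of expected frequencies
chi2, p_value, dof, expected = st.chi2_contingency(cross_table)

# Assumption check: every expected frequency is at least 5
assumption_ok = bool((expected >= 5).all())

# Significance check at the 0.05 level
significant = bool(p_value < 0.05)

print("degrees of freedom:", dof)
print("expected frequencies:")
print(expected)
print("assumption met:", assumption_ok)
print("statistically significant:", significant)
```

Unpacking into named variables makes it harder to confuse the p-value with the test statistic than indexing into the printed tuple by position.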