Analyzing Survey Data in Python
EbunOluwa Andrew
Data Scientist
.describe()
data.describe()
|       | year     | satisfaction_rating |
|-------|----------|---------------------|
| count | 42       | 42                  |
| mean  | 2012.381 | 7192.857            |
| std   | 4.196    | 945.178             |
| min   | 2006     | 5500                |
| 25%   | 2009     | 6325                |
| 50%   | 2012.5   | 7400                |
| 75%   | 2016     | 8000                |
| max   | 2019     | 8600                |
data.describe(include=[object])
| | category |
|--------|-------------|
| count | 42 |
| unique | 3 |
| top | Residential |
| freq | 14 |
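A minimal sketch of the two calls above on a small made-up DataFrame. The rows are hypothetical, not the Austin Energy data. It shows that a plain `.describe()` summarizes the numeric columns, while `include=[object]` summarizes the text column:

```python
import pandas as pd

# Hypothetical rows for illustration only; not the survey data from the slides
data = pd.DataFrame({
    "year": [2018, 2018, 2019, 2019],
    "category": pd.Series(
        ["Residential", "Commercial", "Residential", "Industrial"], dtype=object
    ),
    "satisfaction_rating": [7400, 6900, 8000, 6100],
})

# Numeric columns: count, mean, std, min, quartiles, max
numeric_summary = data.describe()

# Object (text) columns: count, unique, top, freq
text_summary = data.describe(include=[object])

print(numeric_summary)
print(text_summary)
```

Here `top` is the most frequent value in the column and `freq` is how many times it appears.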
import pandas as pd
electric_satisfaction = pd.read_csv("austin-energy-customer-satisfaction.csv")
electric_satisfaction.describe()
|       | year     | satisfaction_rating |
|-------|----------|---------------------|
| count | 42       | 42                  |
| mean  | 2012.381 | 7192.857            |
| std   | 4.196    | 945.178             |
| min   | 2006     | 5500                |
| 25%   | 2009     | 6325                |
| 50%   | 2012.5   | 7400                |
| 75%   | 2016     | 8000                |
| max   | 2019     | 8600                |
satisfaction_rating has outliers
| | category |
|--------|-------------|
| count | 42 |
| unique | 3 |
| top | Residential |
| freq | 14 |
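The 25% and 75% rows of `.describe()` are the quartiles used by the common 1.5 × IQR rule of thumb for flagging outliers. A minimal sketch of that rule follows, run on made-up ratings rather than the survey data. The rule is one convention among several and is an assumption here, not something the slides prescribe:

```python
import pandas as pd

# Hypothetical ratings for illustration; the last value is deliberately extreme
ratings = pd.Series([7000, 7200, 7400, 7600, 7800, 12000])

# Quartiles, as reported in the 25% and 75% rows of .describe()
q1 = ratings.quantile(0.25)
q3 = ratings.quantile(0.75)
iqr = q3 - q1

# Values beyond 1.5 * IQR from the quartiles are flagged as outliers
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = ratings[(ratings < lower_fence) | (ratings > upper_fence)]

print(outliers.tolist())
```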
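norm.interval() function
import scipy.stats
scipy.stats.norm.interval(confidence, loc, scale)
confidence = confidence level (this parameter was named alpha before SciPy 1.9)
loc = sample mean
scale = sample std error
import numpy as np
import scipy.stats as st
electric_satisfaction = pd.read_csv("austin-energy-customer-satisfaction.csv")
# Pass the confidence level positionally so the call works whether the keyword is alpha or confidence
conf_interval = st.norm.interval(
    0.99,
    loc=np.mean(electric_satisfaction.satisfaction_rating),
    scale=st.sem(electric_satisfaction.satisfaction_rating))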
print(conf_interval)
(6817.187361704269, 7568.526924010017)
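As a cross-check, the same interval can be rebuilt by hand as mean ± z × standard error. This is a sketch that assumes the rounded summary statistics from the `.describe()` output (count 42, mean 7192.857, std 945.178) instead of the raw CSV, so the result matches the printed interval only to about two decimal places:

```python
import math
from scipy import stats

# Summary statistics copied from the .describe() output (rounded values)
n = 42
sample_mean = 7192.857
sample_std = 945.178

# Standard error of the mean: sample std divided by sqrt(n)
std_error = sample_std / math.sqrt(n)

# Two-sided 99% interval: z is the 99.5th percentile of the standard normal
confidence = 0.99
z = stats.norm.ppf(1 - (1 - confidence) / 2)

lower = sample_mean - z * std_error
upper = sample_mean + z * std_error
print(lower, upper)
```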