Percentiles, outliers, and box plots

Statistical Thinking in Python (Part 1)

Justin Bois

Teaching Professor at the California Institute of Technology

Percentiles on an ECDF

ch2-2_v2.003.png

1 Data retrieved from Data.gov (https://www.data.gov/)

Computing percentiles

import numpy as np

# 25th, 50th, and 75th percentiles of the Democratic vote share
np.percentile(df_swing['dem_share'], [25, 50, 75])
array([ 37.3025,  43.185 ,  49.925 ])
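
The link between percentiles and the ECDF can be sketched on a small, made-up data set. The slides' `df_swing` is not available here, so the array `dem_share` below is a hypothetical stand-in, and `ecdf` is a helper written for this illustration rather than a library function. The sketch assumes NumPy's default linear interpolation between data points.

```python
import numpy as np


def ecdf(data):
    """Return sorted values x and cumulative fractions y for an ECDF."""
    x = np.sort(data)
    y = np.arange(1, len(x) + 1) / len(x)
    return x, y


# Made-up vote shares (percent), for illustration only
dem_share = np.array([30.0, 35.0, 40.0, 45.0, 50.0, 55.0, 60.0, 65.0, 70.0])

# Percentiles to compute: 25th, 50th (the median), and 75th
percentiles = np.array([25, 50, 75])
values = np.percentile(dem_share, percentiles)

# ECDF of the same data
x, y = ecdf(dem_share)

# Fraction of observations at or below each percentile value.
# On a small sample this only approximates percentiles / 100.
frac_at_or_below = np.array([np.mean(dem_share <= v) for v in values])

print(values)
print(frac_at_or_below)
```

Plotting `values` against `percentiles / 100` on top of the ECDF `(x, y)` would place the markers roughly where the curve crosses 0.25, 0.5, and 0.75, which is what the "Percentiles on an ECDF" slides illustrate.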

2008 US election box plot

ch2-2_v2.011.png

1 Data retrieved from Data.gov (https://www.data.gov/)
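
The parts of a box plot can be computed by hand, which may help when reading the figure. The slide image is not reproduced here, so this is a sketch under the usual Tukey convention, which is also the default in Matplotlib and seaborn: the box spans the 25th to 75th percentiles (the interquartile range, IQR), the whiskers reach the most extreme points within 1.5 IQR of the box, and anything beyond that is drawn as an outlier. The data values are made up for illustration.

```python
import numpy as np

# Made-up vote shares (percent), for illustration only
dem_share = np.array([12.0, 35.0, 38.0, 41.0, 43.0,
                      45.0, 48.0, 50.0, 52.0, 90.0])

# Box edges and middle line: 25th, 50th, and 75th percentiles
q1, median, q3 = np.percentile(dem_share, [25, 50, 75])

# Interquartile range: the height of the box
iqr = q3 - q1

# Limits beyond which a point is treated as an outlier
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Whiskers end at the most extreme data points inside the fences
inside = dem_share[(dem_share >= lower_fence) & (dem_share <= upper_fence)]
whisker_low, whisker_high = inside.min(), inside.max()

# Points outside the fences are plotted individually as outliers
outliers = dem_share[(dem_share < lower_fence) | (dem_share > upper_fence)]

print(q1, median, q3, iqr)
print(whisker_low, whisker_high)
print(outliers)
```

Being flagged as an outlier by this rule does not by itself mean a data point is erroneous; it only means the point lies far from the bulk of the data.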

Generating a box plot

import matplotlib.pyplot as plt
import seaborn as sns

# One box per region, showing the distribution of the Democratic vote share
_ = sns.boxplot(x='east_west', y='dem_share', data=df_all_states)
_ = plt.xlabel('region')
_ = plt.ylabel('percent of vote for Obama')
plt.show()

Let's practice!
