Summary statistics by category with seaborn

Importing and Managing Financial Data in Python

Stefan Jansen

Instructor

Categorical plots with seaborn

  • Specialized ways to plot combinations of categorical and numerical variables
  • Visualize estimates of summary statistics per category
  • Understand how categories impact numerical variables
  • Compare categories using key metrics of their distributions
  • Example: Mean Market Cap per Sector or IPO Year with indication of dispersion

The basics: countplot

import matplotlib.pyplot as plt
import seaborn as sns
sns.countplot(x='Sector', data=nasdaq)
plt.xticks(rotation=45)

countplot.png
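
A minimal, self-contained sketch of the call above. It assumes a small made-up DataFrame standing in for the real nasdaq listings (the rows are illustrative only), and it selects a non-interactive matplotlib backend so it runs without a display.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend; an assumption for running headless
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical stand-in for the nasdaq DataFrame used on the slide
nasdaq = pd.DataFrame({
    'Sector': ['Health Care', 'Finance', 'Technology',
               'Finance', 'Health Care', 'Health Care'],
})

# One bar per distinct value of 'Sector'; bar height is the row count
ax = sns.countplot(x='Sector', data=nasdaq)
plt.xticks(rotation=45)
plt.tight_layout()
```

countplot returns the matplotlib Axes it drew on, so the result can be customized further with the usual matplotlib calls.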

countplot, sorted

sector_size = nasdaq.groupby('Sector').size()
order = sector_size.sort_values(ascending=False)
order.head()
Sector
Health Care          645
Finance              627
Technology           433
...
order = order.index.tolist()
['Health Care', 'Finance', ..., 'Energy', 'Transportation']
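
The ordering steps above can be tried on toy data. This sketch assumes a made-up DataFrame in place of nasdaq; it also shows value_counts() as an alternative route to the same ordered list, which is a substitute technique and not what the slide uses.

```python
import pandas as pd

# Hypothetical stand-in for the nasdaq DataFrame
nasdaq = pd.DataFrame({
    'Sector': ['Health Care', 'Finance', 'Technology',
               'Finance', 'Health Care', 'Health Care'],
})

# Same steps as on the slide: count rows per sector, sort descending,
# then keep only the sector names as a plain list
sector_size = nasdaq.groupby('Sector').size()
order = sector_size.sort_values(ascending=False).index.tolist()

# Alternative: value_counts() counts and sorts descending in one step
order_alt = nasdaq['Sector'].value_counts().index.tolist()
```

Either list can be passed to the order argument of countplot. When two sectors have the same count, the two approaches may break the tie differently.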

countplot, sorted

sns.countplot(x='Sector', data=nasdaq, order=order)
plt.xticks(rotation=45)
plt.title('# Observations per Sector')

countplot_sorted.png

countplot, multiple categories

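# .copy() makes recent_ipos an independent DataFrame, so the assignment
# below does not trigger pandas' SettingWithCopyWarning
recent_ipos = nasdaq[nasdaq['IPO Year'] > 2014].copy()
recent_ipos['IPO Year'] = recent_ipos['IPO Year'].astype(int)
sns.countplot(x='Sector', hue='IPO Year', data=recent_ipos)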

countplot_mc.png
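
A small sketch of the data preparation behind this plot, assuming made-up rows in place of the nasdaq listings. It illustrates why the integer conversion is safe after filtering: the comparison is False for missing IPO years, so those rows are dropped before astype(int) runs.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the nasdaq DataFrame, including a missing IPO year
nasdaq = pd.DataFrame({
    'Sector': ['Technology', 'Finance', 'Health Care', 'Technology'],
    'IPO Year': [2015.0, np.nan, 2016.0, 2009.0],
})

# NaN > 2014 evaluates to False, so rows with a missing year are excluded;
# .copy() gives an independent DataFrame that is safe to modify
recent_ipos = nasdaq[nasdaq['IPO Year'] > 2014].copy()

# With no NaN left, the float years convert cleanly to integers,
# which gives tidy legend labels (2015 instead of 2015.0)
recent_ipos['IPO Year'] = recent_ipos['IPO Year'].astype(int)
```

The original nasdaq DataFrame is left unchanged; only the filtered copy gets the integer column.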

Compare stats with pointplot

nasdaq['IPO'] = nasdaq['IPO Year'].apply(lambda x: 'After 2000' if x > 2000 else 'Before 2000')
sns.pointplot(x='Sector', y='market_cap_m', hue='IPO', data=nasdaq)
plt.xticks(rotation=45)
plt.title('Mean Market Cap')

pointplot.png
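
By default, pointplot marks the mean of the y variable for each category and hue level and adds error bars for a bootstrapped confidence interval. The point estimates can be checked with a plain groupby, sketched here on made-up rows (the values and the small DataFrame are assumptions for illustration). Note that the lambda labels a missing IPO year as 'Before 2000', because the comparison with NaN is False.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the nasdaq DataFrame
nasdaq = pd.DataFrame({
    'Sector': ['Technology', 'Technology', 'Technology', 'Finance', 'Finance'],
    'IPO Year': [1998.0, 2005.0, np.nan, 2010.0, 1995.0],
    'market_cap_m': [100.0, 300.0, 200.0, 50.0, 150.0],
})

# Same labelling rule as on the slide; NaN > 2000 is False,
# so a missing year falls into 'Before 2000'
nasdaq['IPO'] = nasdaq['IPO Year'].apply(
    lambda x: 'After 2000' if x > 2000 else 'Before 2000')

# Mean market cap per (Sector, IPO) group: the values pointplot
# would draw as points under its default mean estimator
means = nasdaq.groupby(['Sector', 'IPO'])['market_cap_m'].mean()
```

If rows with a missing IPO year should not count as 'Before 2000', one option is to drop them first, for example with nasdaq.dropna(subset=['IPO Year']).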

Let's practice!
