Distributions by category with seaborn

Importing and Managing Financial Data in Python

Stefan Jansen

Instructor

Distributions by category

  • Last segment: Summary statistics
  • Number of observations, mean per category
  • Now: Visualize distribution of a variable by levels of a categorical variable to facilitate comparison
  • Example: Distribution of Market Cap by Sector or IPO Year
  • More detail than summary stats
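
To see why a distribution carries more detail than summary statistics, here is a minimal sketch on a hypothetical toy table (the values and the two sector names are invented for illustration, not taken from the listings file). Both sectors have the same mean, yet their quartiles differ sharply.

```python
import pandas as pd

# Hypothetical toy data standing in for the listings table
df = pd.DataFrame({
    'Sector': ['Tech', 'Tech', 'Tech', 'Energy', 'Energy', 'Energy'],
    'market_cap_m': [10.0, 20.0, 300.0, 100.0, 110.0, 120.0],
})

by_sector = df.groupby('Sector')['market_cap_m']

# Summary statistics: number of observations and mean per category
summary = by_sector.agg(['count', 'mean'])

# More detail: quartiles per category reveal the skew that the mean hides
quartiles = by_sector.quantile([.25, .5, .75]).unstack()

print(summary)
print(quartiles)
```

The means are identical, but the median of the skewed sector sits far below its mean, which is the kind of difference a boxplot makes visible at a glance.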

Clean data: removing outliers

import pandas as pd

nasdaq = pd.read_excel('listings.xlsx', sheet_name='nasdaq',
                       na_values='n/a')
nasdaq['market_cap_m'] = nasdaq['Market Capitalization'].div(1e6)  # In millions

nasdaq = nasdaq[nasdaq.market_cap_m > 0]  # Active companies only
outliers = nasdaq.market_cap_m.quantile(.9)  # Outlier threshold: 90th percentile
nasdaq = nasdaq[nasdaq.market_cap_m < outliers]  # Remove outliers
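
The same two filtering steps can be tried without the Excel file. This is a sketch on a hypothetical Series of market caps (the numbers are made up); it only illustrates how the 90th-percentile cutoff trims the top of the distribution.

```python
import pandas as pd

# Hypothetical market caps in millions; 0.0 mimics an inactive listing
caps = pd.Series(
    [0.0, 5.0, 10.0, 20.0, 40.0, 80.0, 160.0, 320.0, 640.0, 1280.0, 50000.0],
    name='market_cap_m',
)

active = caps[caps > 0]               # Keep active companies only
threshold = active.quantile(.9)       # 90th percentile of the active values
trimmed = active[active < threshold]  # Drop everything at or above the cutoff

print(f'threshold: {threshold:.1f}')
print(f'rows before: {len(active)}, rows after: {len(trimmed)}')
print(f'mean before: {active.mean():.1f}, mean after: {trimmed.mean():.1f}')
```

Note that a quantile cutoff always removes roughly the top 10% of rows, whether or not those rows are extreme, so it is a pragmatic choice for plotting rather than a statistical test for outliers.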

Boxplot: quartiles and outliers

import matplotlib.pyplot as plt
import seaborn as sns
sns.boxplot(x='Sector', y='market_cap_m', data=nasdaq)
plt.xticks(rotation=75)  # Rotate sector labels so they do not overlap
plt.show()

quartiles.png
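
What the box and the individual points represent can be reproduced by hand. The sketch below uses hypothetical values and assumes seaborn's default whisker setting (`whis=1.5`): the box spans the first to third quartile, and points beyond 1.5 times the interquartile range from the box are drawn individually as outliers.

```python
import pandas as pd

# Hypothetical values for a single category
values = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 100.0])

# The box: first quartile, median, third quartile
q1, median, q3 = values.quantile([.25, .5, .75])
iqr = q3 - q1  # Interquartile range, the height of the box

# Fences at 1.5 * IQR beyond the box (assumed default, whis=1.5)
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Points outside the fences are the ones plotted individually
fliers = values[(values < lower_fence) | (values > upper_fence)]

print(q1, median, q3)
print(lower_fence, upper_fence)
print(fliers.tolist())
```

The whiskers themselves end at the most extreme data points that still lie inside the fences, not at the fence values.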

A variation: SwarmPlot

sns.swarmplot(x='Sector', y='market_cap_m', data=nasdaq)
plt.xticks(rotation=75)
plt.show()

swarmplot.png

Let's practice!
