Describe the distribution of your data with quantiles

Importing and Managing Financial Data in Python

Stefan Jansen

Instructor

Describe data distributions

  • First glance: Central tendency and standard deviation
  • How to get a more granular view of the distribution?
  • Calculate and plot quantiles

More on dispersion: quantiles

  • Quantiles: Cut points that split the data into groups with an equal share of observations
    • Quartiles: 4 groups, 25% of data each
    • Deciles: 10 groups, 10% of data each
    • Interquartile range: 3rd quartile - 1st quartile
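
To make the definitions above concrete, here is a minimal sketch on a small made-up Series (the name `values` and its contents are assumptions for illustration, not the NASDAQ data used in this course). It computes the three quartiles and the interquartile range with `Series.quantile()`.

```python
import pandas as pd

# Hypothetical toy data, not the NASDAQ listings from the slides
values = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9])

q1 = values.quantile(.25)     # 1st quartile: 25% of observations lie below
median = values.quantile(.5)  # 2nd quartile, identical to the median
q3 = values.quantile(.75)     # 3rd quartile: 75% of observations lie below
iqr = q3 - q1                 # interquartile range: spread of the middle 50%

print(q1, median, q3, iqr)    # 3.0 5.0 7.0 4.0
```

The interquartile range ignores the lowest and highest 25% of observations, so it is far less sensitive to extreme values than the standard deviation.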

quantile.png


Quantiles with pandas

market_cap = nasdaq['Market Capitalization'].div(10**6)  # USD -> million USD
median = market_cap.quantile(.5)

median == market_cap.median()
True
quantiles = market_cap.quantile([.25, .75])
0.25     43.375930
0.75    969.905207
quantiles[.75] - quantiles[.25] # Interquartile Range
926.5292771575

Quantiles with pandas & numpy

import numpy as np
deciles = np.arange(start=.1, stop=.91, step=.1)  # stop=.91 so that .9 is included
deciles
array([ 0.1,  0.2,  0.3,  0.4,  ..., 0.7,  0.8,  0.9])
market_cap.quantile(deciles)
0.1       4.884565
0.2      26.993382
0.3      65.714547
0.4     124.320644
0.5     225.968428
0.6     402.469678
...
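
As a side note, the same decile grid can also be built from its endpoints and the number of points rather than from a step size. The sketch below uses `np.linspace` for this and applies it to a made-up Series (the integers 0 to 100, an assumption for illustration only); it is an alternative, not the approach shown on the slide.

```python
import numpy as np
import pandas as pd

# Alternative to np.arange(start=.1, stop=.91, step=.1):
# 9 evenly spaced points from 0.1 to 0.9, both endpoints included
deciles = np.linspace(.1, .9, 9)

# Hypothetical toy data: the integers 0..100
values = pd.Series(np.arange(101))

# Passing an array of quantiles returns a Series indexed by those quantiles
result = values.quantile(deciles)
print(result)
```

With `np.linspace` there is no need to pick a stop value slightly above the last decile, which is why `stop=.91` appears in the `np.arange` call.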

Visualize quantiles with bar chart

title = 'NASDAQ Market Capitalization (million USD)'
market_cap.quantile(deciles).plot(kind='bar', title=title)
plt.tight_layout()
plt.show()

quantile_bar.png


All statistics in one go

market_cap.describe()
count      3167.000000
mean       3180.712621
std       25471.038707
min           0.000000
25%          43.375930  # 1st quartile
50%         225.968428  # Median
75%         969.905207  # 3rd quartile
max      740024.467000
Name: Market Capitalization

All statistics in one go

market_cap.describe(percentiles=np.arange(.1, .91, .1))
count      3167.000000
mean       3180.712621
std       25471.038707
min           0.000000
10%           4.884565
20%          26.993382
30%          65.714547
40%         124.320644
50%         225.968428
60%         402.469678
70%         723.163197
80%        1441.071134
...
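
The following sketch illustrates how the `percentiles` argument shapes the output of `.describe()`. It runs on a made-up Series (the name `toy` and the values 0 to 100 are assumptions for illustration) and checks that the percentile rows agree with `.quantile()`.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: the integers 0..100
values = pd.Series(np.arange(101), name='toy')

# Request the 10th and 90th percentiles; the median (50%) is always included
summary = values.describe(percentiles=[.1, .9])
print(summary)

# Each percentile row matches the corresponding .quantile() result
same = np.isclose(summary['10%'], values.quantile(.1))
print(same)
```

The percentile rows are labeled as percentages ('10%', '50%', '90%') and sit between the `min` and `max` rows, in the same layout as the slide output above.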

Let's practice!
