Aggregate your data by category

Importing and Managing Financial Data in Python

Stefan Jansen

Instructor

Summarize numeric data by category

  • So far: Summarize individual variables
  • Compute descriptive statistics such as the mean and quantiles
  • Split data into groups, then summarize groups
  • Examples:
    • Largest company by exchange
    • Median market capitalization per IPO year
    • Average market capitalization per sector
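
The three examples above can be sketched on a small toy table. This is a minimal illustration, not the course data: the `listings` DataFrame, its `Exchange` column, and all values are made up, and the column names only mirror those used later in the slides.

```python
import pandas as pd

# Hypothetical toy data; names and values are invented for illustration.
listings = pd.DataFrame({
    'Company Name': ['A', 'B', 'C', 'D', 'E'],
    'Exchange': ['nasdaq', 'nasdaq', 'nyse', 'nyse', 'nyse'],
    'Sector': ['Technology', 'Finance', 'Technology', 'Finance', 'Finance'],
    'IPO Year': [2000, 2010, 2000, 2010, 2010],
    'market_cap_m': [100.0, 50.0, 300.0, 20.0, 40.0],
})

# Largest company by exchange: find the row label of the maximum
# market cap within each exchange, then look those rows up.
largest_idx = listings.groupby('Exchange')['market_cap_m'].idxmax()
largest_by_exchange = listings.loc[
    largest_idx, ['Exchange', 'Company Name', 'market_cap_m']
]

# Median market capitalization per IPO year
median_by_ipo_year = listings.groupby('IPO Year')['market_cap_m'].median()

# Average market capitalization per sector
mean_by_sector = listings.groupby('Sector')['market_cap_m'].mean()

print(largest_by_exchange)
print(median_by_ipo_year)
print(mean_by_sector)
```

Each line follows the same pattern: split the rows by a category column, pick the numeric column, and apply one summary function.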

Group your data by sector

nasdaq.info()
RangeIndex: 3167 entries, 0 to 3166
Data columns (total 7 columns):
 #   Column                 Non-Null Count  Dtype  
---  ------                 --------------  -----  
 0   Stock Symbol           3167 non-null   object 
 1   Company Name           3167 non-null   object 
 2   Last Sale              3165 non-null   float64
 3   Market Capitalization  3167 non-null   float64
 4   IPO Year               1386 non-null   float64
 5   Sector                 2767 non-null   object 
 6   Industry               2767 non-null   object 
dtypes: float64(3), object(4)
memory usage: 173.3+ KB
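
The output above shows only 2767 non-null `Sector` values for 3167 rows. By default, `groupby` leaves out rows whose grouping key is missing, so those companies would not appear in any sector summary. The sketch below illustrates this on made-up data; the `toy` DataFrame is hypothetical, and the `dropna=False` option assumes pandas 1.1 or later.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with one missing sector
toy = pd.DataFrame({
    'Sector': ['Energy', np.nan, 'Finance'],
    'market_cap_m': [10.0, 20.0, 30.0],
})

# Default behavior: the row with a missing Sector is excluded
default_counts = toy.groupby('Sector').size()

# Keep missing keys as their own group (assumes pandas >= 1.1)
all_counts = toy.groupby('Sector', dropna=False).size()

print(default_counts.sum())  # rows covered by the default grouping
print(all_counts.sum())      # rows covered when NaN keys are kept
```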

Group your data by sector

nasdaq['market_cap_m'] = nasdaq['Market Capitalization'].div(1e6)

nasdaq = nasdaq.drop('Market Capitalization', axis=1) # Drop column
nasdaq_by_sector = nasdaq.groupby('Sector') # Create groupby object
for sector, data in nasdaq_by_sector: # Iterate over (name, group) pairs
    print(sector, data.market_cap_m.mean())
Basic Industries 724.899933858
Capital Goods 1511.23737278
Consumer Durables 839.802606627
Consumer Non-Durables 3104.05120552
...
Public Utilities 2357.86531507
Technology 10883.4342135
Transportation 2869.66000673

Keep it simple and skip the loop

mcap_by_sector = nasdaq_by_sector.market_cap_m.mean()
mcap_by_sector
Sector
Basic Industries           724.899934
Capital Goods             1511.237373
Consumer Durables          839.802607
Consumer Non-Durables     3104.051206
Consumer Services         5582.344175
Energy                     826.607608
Finance                   1044.090205
Health Care               1758.709197
...
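
A minimal sketch, on made-up data, of why the loop can be skipped: selecting the column on the groupby object and calling `.mean()` gives the same numbers as iterating over the groups. The `toy` DataFrame and the final sorting step are illustrative assumptions, not part of the course code.

```python
import pandas as pd

# Hypothetical toy data
toy = pd.DataFrame({
    'Sector': ['Energy', 'Energy', 'Finance', 'Finance'],
    'market_cap_m': [10.0, 30.0, 5.0, 15.0],
})
toy_by_sector = toy.groupby('Sector')

# Loop version: one mean per (sector name, sub-DataFrame) pair
looped = {
    sector: data['market_cap_m'].mean() for sector, data in toy_by_sector
}

# Direct version: select the column on the groupby object, then aggregate
direct = toy_by_sector['market_cap_m'].mean()

# The result is a Series indexed by sector, so it can be sorted for ranking
ranked = direct.sort_values(ascending=False)

print(looped)
print(ranked)
```

The direct version also returns a pandas Series, which is convenient for further steps such as sorting or plotting.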

Visualize category summaries

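import matplotlib.pyplot as plt
title = 'NASDAQ = Avg. Market Cap by Sector'
mcap_by_sector.plot(kind='barh', title=title) # Horizontal bar chart
plt.xlabel('USD mn') # Market cap was converted to millions above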

[Figure: horizontal bar chart of average market capitalization by sector, x-axis in USD mn]

Aggregate summary for all numeric columns

nasdaq_by_sector.mean(numeric_only=True) # numeric_only=True is needed in pandas 2.0+ to skip text columns
                       Last Sale     IPO Year  market_cap_m
Sector                                                     
Basic Industries       21.597679  2000.766667    724.899934
Capital Goods          26.188681  2001.324675   1511.237373
Consumer Durables      24.363391  2003.222222    839.802607
Consumer Non-Durables  25.749565  2000.609756   3104.051206
Consumer Services      34.917318  2004.104575   5582.344175
Energy                 15.496834  2008.034483    826.607608
Finance                29.644242  2010.321101   1044.090205
Health Care            19.462531  2009.240409   1758.709197
Miscellaneous          46.094369  2004.333333   3445.655935
Public Utilities       18.643705  2006.040000   2357.865315
...
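
A minimal sketch of the same aggregation on made-up data. It assumes a recent pandas version, where averaging a groupby object that still contains text columns needs `numeric_only=True`; the `toy` DataFrame and its values are hypothetical.

```python
import pandas as pd

# Hypothetical toy data with one text column and two numeric columns
toy = pd.DataFrame({
    'Company Name': ['A', 'B', 'C'],
    'Sector': ['Energy', 'Energy', 'Finance'],
    'Last Sale': [10.0, 20.0, 30.0],
    'market_cap_m': [100.0, 300.0, 50.0],
})

# Average every numeric column per sector; text columns are skipped
sector_means = toy.groupby('Sector').mean(numeric_only=True)

print(sector_means)
```

The result has one row per sector and one column per numeric variable, matching the layout of the table above.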

Let's practice!
