More ways to aggregate your data

Importing and Managing Financial Data in Python

Stefan Jansen

Instructor

Many ways to aggregate

  • Last segment: Group by one variable and aggregate
  • More detailed ways to summarize your data:
    • Group by two or more variables
    • Apply multiple aggregations
  • Examples
    • Median market cap by sector and IPO year
    • Mean & standard deviation of stock price by year
Importing and Managing Financial Data in Python

Several aggregations by category

nasdaq['market_cap_m'] = nasdaq['Market Capitalization'].div(1e6)
by_sector = nasdaq.groupby('Sector')

by_sector.market_cap_m.agg(['size', 'mean']).sort_values('size')
Sector                 size          mean         
Transportation           52   2869.660007
Energy                   66    826.607608
Public Utilities         66   2357.865315
Basic Industries         78    724.899934
...
Consumer Services       348   5582.344175
Technology              433  10883.434214
Finance                 627   1044.090205
Health Care             645   1758.709197
Importing and Managing Financial Data in Python

Several aggregations plus new labels

by_sector.market_cap_m.agg(['size', 'mean'])
      .rename(columns={'size': '#Obs', 'mean': 'Average'})
Sector                #Obs       Average           
Basic Industries        78    724.899934
Capital Goods          172   1511.237373
Consumer Durables       88    839.802607
Consumer Non-Durables  103   3104.051206
Consumer Services      348   5582.344175
...
Health Care            645   1758.709197
Miscellaneous           89   3445.655935
Public Utilities        66   2357.865315
Technology             433  10883.434214
Transportation          52   2869.660007
Importing and Managing Financial Data in Python

Different statistics by column

by_sector.agg({'market_cap_m': 'size', 'IPO Year':'median'})
Sector                 market_cap_m  IPO Year    
Basic Industries                 78    1972.0
Capital Goods                   172    1972.0
Consumer Durables                88    1983.0
Consumer Non-Durables           103    1972.0
Consumer Services               348    1981.0
...
Health Care                     645    1981.0
Miscellaneous                    89    1987.0
Public Utilities                 66    1981.0
Technology                      433    1972.0
Transportation                   52    1986.0
Importing and Managing Financial Data in Python

Aggregate by two categories

by_sector_year = nasdaq.groupby(['Sector', 'IPO Year'])

by_sector_year.market_cap_m.mean()
Sector            IPO Year
Basic Industries  1972.0      877.240005
                  1973.0     1445.697371
                  1986.0     1396.817381
                  ...
Transportation    1986.0     1176.179710
                  1991.0     6646.778622
                  1992.0       56.074572
                  ...
                  2009.0      552.445919
                  2011.0     3711.638317
                  2013.0      125.740421
Importing and Managing Financial Data in Python

Select from MultiIndex

mcap_sector_year = by_sector_year.market_cap_m.mean()
mcap_sect_year.loc['Basic Industries']
IPO Year
1972.0     877.240005
1973.0    1445.697371
1986.0    1396.817381
1988.0      24.847526
...
2012.0     381.796074
2013.0      22.661533
2015.0     260.075564
2016.0      81.288336
Name: market_cap_m, dtype: float64
Importing and Managing Financial Data in Python

Select from MultiIndex

mcap_sect_year.loc[['Basic Industries', 'Transportation']]
Sector            IPO Year
Basic Industries  1972.0      877.240005
                  1973.0     1445.697371
                  1986.0     1396.817381
                  ...
Transportation    1986.0     1176.179710
                  1991.0     6646.778622
                  1992.0       56.074572
                  ...
Importing and Managing Financial Data in Python

Let's practice!

Importing and Managing Financial Data in Python

Preparing Video For Download...