Importing and Managing Financial Data in Python
Stefan Jansen
Instructor
nasdaq.info()
RangeIndex: 3167 entries, 0 to 3166
Data columns (total 7 columns):
 #   Column                 Non-Null Count  Dtype
---  ------                 --------------  -----
 0   Stock Symbol           3167 non-null   object
 1   Company Name           3167 non-null   object
 2   Last Sale              3165 non-null   float64
 3   Market Capitalization  3167 non-null   float64
 4   IPO Year               1386 non-null   float64
 5   Sector                 2767 non-null   object
 6   Industry               2767 non-null   object
dtypes: float64(3), object(4)
memory usage: 173.3+ KB
nasdaq['market_cap_m'] = nasdaq['Market Capitalization'].div(1e6)
nasdaq = nasdaq.drop('Market Capitalization', axis=1) # Drop column
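The unit conversion and column drop above can be tried on a small stand-in table. This is a minimal sketch: the DataFrame `toy` and its values are invented for illustration and are not part of the NASDAQ listings data.

```python
import pandas as pd

# Hypothetical two-row stand-in for the listings table (values are made up)
toy = pd.DataFrame({
    'Stock Symbol': ['AAA', 'BBB'],
    'Market Capitalization': [2.5e9, 7.5e8],  # in USD
})

# Convert USD to USD millions; .div(1e6) is equivalent to "/ 1e6"
toy['market_cap_m'] = toy['Market Capitalization'].div(1e6)

# drop(columns=...) is an alternative spelling of drop(..., axis=1)
toy = toy.drop(columns='Market Capitalization')

print(toy)
```

Both `drop('Market Capitalization', axis=1)` and `drop(columns='Market Capitalization')` return a new DataFrame, so the result has to be assigned back.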
nasdaq_by_sector = nasdaq.groupby('Sector') # Create groupby object
for sector, data in nasdaq_by_sector:
    print(sector, data.market_cap_m.mean())
Basic Industries 724.899933858
Capital Goods 1511.23737278
Consumer Durables 839.802606627
Consumer Non-Durables 3104.05120552
...
Public Utilities 2357.86531507
Technology 10883.4342135
Transportation 2869.66000673
mcap_by_sector = nasdaq_by_sector.market_cap_m.mean()
mcap_by_sector
Sector
Basic Industries 724.899934
Capital Goods 1511.237373
Consumer Durables 839.802607
Consumer Non-Durables 3104.051206
Consumer Services 5582.344175
Energy 826.607608
Finance 1044.090205
Health Care 1758.709197
...
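The loop over the groupby object and the direct call to .mean() give the same numbers. The sketch below checks this on a small invented table; `toy` and its values are assumptions for illustration only. It also shows that rows with a missing Sector are left out of the groups by default, which matters here because only 2767 of the 3167 listings have a Sector.

```python
import pandas as pd

# Hypothetical toy data (values are made up); the last row has no Sector
toy = pd.DataFrame({
    'Sector': ['Technology', 'Technology', 'Energy', 'Energy', None],
    'market_cap_m': [100.0, 300.0, 50.0, 150.0, 999.0],
})

# Rows whose group key is missing are excluded by default (dropna=True)
by_sector = toy.groupby('Sector')

# Explicit loop: each iteration yields (group key, sub-DataFrame)
loop_means = {sector: data['market_cap_m'].mean() for sector, data in by_sector}

# Same aggregation in a single call; the result is a Series indexed by Sector
direct_means = by_sector['market_cap_m'].mean()

print(loop_means)
print(direct_means)
```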
import matplotlib.pyplot as plt
title = 'NASDAQ - Avg. Market Cap by Sector'
mcap_by_sector.plot(kind='barh', title=title)  # Horizontal bar chart
plt.xlabel('USD mn')
nasdaq_by_sector.mean(numeric_only=True) # Skip text columns (required in pandas 2.0+)
                       Last Sale     IPO Year  market_cap_m
Sector
Basic Industries       21.597679  2000.766667    724.899934
Capital Goods          26.188681  2001.324675   1511.237373
Consumer Durables      24.363391  2003.222222    839.802607
Consumer Non-Durables  25.749565  2000.609756   3104.051206
Consumer Services      34.917318  2004.104575   5582.344175
Energy                 15.496834  2008.034483    826.607608
Finance                29.644242  2010.321101   1044.090205
Health Care            19.462531  2009.240409   1758.709197
Miscellaneous          46.094369  2004.333333   3445.655935
Public Utilities       18.643705  2006.040000   2357.865315
...
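Calling .mean() on the whole groupby object averages every numeric column per group. A minimal sketch on invented data follows; `toy` and its values are assumptions, not taken from the listings. Passing `numeric_only=True` keeps text columns such as Company Name out of the calculation, which recent pandas versions no longer do silently.

```python
import pandas as pd

# Hypothetical toy data with one text column and two numeric columns
toy = pd.DataFrame({
    'Sector': ['Technology', 'Technology', 'Energy'],
    'Company Name': ['A Corp', 'B Inc', 'C Ltd'],
    'Last Sale': [10.0, 30.0, 5.0],
    'market_cap_m': [100.0, 300.0, 50.0],
})

# One row per sector, one column per numeric variable
sector_means = toy.groupby('Sector').mean(numeric_only=True)

print(sector_means)
```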