Manipulating Time Series Data in Python
Stefan Jansen
Founder & Lead Data Scientist at Applied Artificial Intelligence
Share Price
x Number of Shares
=> Market Value
S&P 500
, NASDAQ
, Wilshire 5000
, Hang Seng
nyse = pd.read_excel('listings.xlsx', sheet_name='nyse', na_values='n/a')
nyse.info()
RangeIndex: 3147 entries, 0 to 3146
Data columns (total 7 columns):
Stock Symbol 3147 non-null object # Stock Ticker
Company Name 3147 non-null object
Last Sale 3079 non-null float64 # Latest Stock Price
Market Capitalization 3147 non-null float64
IPO Year 1361 non-null float64 # Year of listing
Sector 2177 non-null object
Industry 2177 non-null object
dtypes: float64(3), object(4)
nyse.set_index('Stock Symbol', inplace=True)
nyse.dropna(subset=['Sector'], inplace=True)
nyse['Market Capitalization'] /= 1e6 # in Million USD
Index: 2177 entries, DDD to ZTO
Data columns (total 6 columns):
Company Name 2177 non-null object
Last Sale 2175 non-null float64
Market Capitalization 2177 non-null float64
IPO Year 967 non-null float64
Sector 2177 non-null object
Industry 2177 non-null object
dtypes: float64(3), object(3)
components = nyse.groupby(['Sector'])['Market Capitalization'].nlargest(1)
components.sort_values(ascending=False)
Sector Stock Symbol
Health Care JNJ 338834.390080
Energy XOM 338728.713874
Finance JPM 300283.250479
Miscellaneous BABA 275525.000000
Public Utilities T 247339.517272
Basic Industries PG 230159.644117
Consumer Services WMT 221864.614129
Consumer Non-Durables KO 183655.305119
Technology ORCL 181046.096000
Capital Goods TM 155660.252483
Transportation UPS 90180.886756
Consumer Durables ABB 48398.935676
Name: Market Capitalization, dtype: float64
tickers = components.index.get_level_values('Stock Symbol')
tickers
Index(['PG', 'TM', 'ABB', 'KO', 'WMT', 'XOM', 'JPM', 'JNJ', 'BABA', 'T',
'ORCL', ‘UPS'], dtype='object', name='Stock Symbol’)
tickers.tolist()
['PG',
'TM',
'ABB',
'KO',
'WMT',
...
'T',
'ORCL',
'UPS']
columns = ['Company Name', 'Market Capitalization', 'Last Sale']
component_info = nyse.loc[tickers, columns]
pd.options.display.float_format = '{:,.2f}'.format
Company Name Market Capitalization Last Sale
Stock Symbol
PG Procter & Gamble Company (The) 230,159.64 90.03
TM Toyota Motor Corp Ltd Ord 155,660.25 104.18
ABB ABB Ltd 48,398.94 22.63
KO Coca-Cola Company (The) 183,655.31 42.79
WMT Wal-Mart Stores, Inc. 221,864.61 73.15
XOM Exxon Mobil Corporation 338,728.71 81.69
JPM J P Morgan Chase & Co 300,283.25 84.40
JNJ Johnson & Johnson 338,834.39 124.99
BABA Alibaba Group Holding Limited 275,525.00 110.21
T AT&T Inc. 247,339.52 40.28
ORCL Oracle Corporation 181,046.10 44.00
UPS United Parcel Service, Inc. 90,180.89 103.74
data = pd.read_csv('stocks.csv', parse_dates=['Date'], index_col='Date').loc[:, tickers.tolist()]
data.info()
DatetimeIndex: 252 entries, 2016-01-04 to 2016-12-30
Data columns (total 12 columns):
ABB 252 non-null float64
BABA 252 non-null float64
JNJ 252 non-null float64
JPM 252 non-null float64
KO 252 non-null float64
ORCL 252 non-null float64
PG 252 non-null float64
T 252 non-null float64
TM 252 non-null float64
UPS 252 non-null float64
WMT 252 non-null float64
XOM 252 non-null float64
dtypes: float64(12)
Manipulating Time Series Data in Python