Manipulating Time Series Data in Python
Stefan Jansen
Founder & Lead Data Scientist at Applied Artificial Intelligence
Share Price x Number of Shares => Market ValueS&P 500, NASDAQ, Wilshire 5000, Hang Sengnyse = pd.read_excel('listings.xlsx', sheet_name='nyse', na_values='n/a')nyse.info()
RangeIndex: 3147 entries, 0 to 3146
Data columns (total 7 columns):
Stock Symbol             3147 non-null object   # Stock Ticker
Company Name             3147 non-null object
Last Sale                3079 non-null float64  # Latest Stock Price
Market Capitalization    3147 non-null float64
IPO Year                 1361 non-null float64  # Year of listing
Sector                   2177 non-null object
Industry                 2177 non-null object
dtypes: float64(3), object(4)
  nyse.set_index('Stock Symbol', inplace=True)nyse.dropna(subset=['Sector'], inplace=True)nyse['Market Capitalization'] /= 1e6 # in Million USD
Index: 2177 entries, DDD to ZTO
Data columns (total 6 columns):
Company Name             2177 non-null object
Last Sale                2175 non-null float64
Market Capitalization    2177 non-null float64
IPO Year                 967 non-null float64
Sector                   2177 non-null object
Industry                 2177 non-null object
dtypes: float64(3), object(3)
  components = nyse.groupby(['Sector'])['Market Capitalization'].nlargest(1)components.sort_values(ascending=False)
Sector                 Stock Symbol
Health Care            JNJ             338834.390080
Energy                 XOM             338728.713874
Finance                JPM             300283.250479
Miscellaneous          BABA            275525.000000
Public Utilities       T               247339.517272
Basic Industries       PG              230159.644117
Consumer Services      WMT             221864.614129
Consumer Non-Durables  KO              183655.305119
Technology             ORCL            181046.096000
Capital Goods          TM              155660.252483
Transportation         UPS              90180.886756
Consumer Durables      ABB              48398.935676
Name: Market Capitalization, dtype: float64
  tickers = components.index.get_level_values('Stock Symbol')tickers
Index(['PG', 'TM', 'ABB', 'KO', 'WMT', 'XOM', 'JPM', 'JNJ', 'BABA', 'T',
       'ORCL', ‘UPS'], dtype='object', name='Stock Symbol’)
tickers.tolist()
['PG',
 'TM',
 'ABB',
 'KO',
 'WMT',
...
 'T',
 'ORCL',
 'UPS']
  columns = ['Company Name', 'Market Capitalization', 'Last Sale']component_info = nyse.loc[tickers, columns]pd.options.display.float_format = '{:,.2f}'.format
                                Company Name  Market Capitalization  Last Sale
Stock Symbol
PG            Procter & Gamble Company (The)             230,159.64      90.03
TM                 Toyota Motor Corp Ltd Ord             155,660.25     104.18
ABB                                  ABB Ltd              48,398.94      22.63
KO                   Coca-Cola Company (The)             183,655.31      42.79
WMT                    Wal-Mart Stores, Inc.             221,864.61      73.15
XOM                  Exxon Mobil Corporation             338,728.71      81.69
JPM                    J P Morgan Chase & Co             300,283.25      84.40
JNJ                        Johnson & Johnson             338,834.39     124.99
BABA           Alibaba Group Holding Limited             275,525.00     110.21
T                                  AT&T Inc.             247,339.52      40.28
ORCL                      Oracle Corporation             181,046.10      44.00
UPS              United Parcel Service, Inc.              90,180.89     103.74
  data = pd.read_csv('stocks.csv', parse_dates=['Date'], index_col='Date').loc[:, tickers.tolist()]data.info()
DatetimeIndex: 252 entries, 2016-01-04 to 2016-12-30
Data columns (total 12 columns):
ABB     252 non-null float64
BABA    252 non-null float64
JNJ     252 non-null float64
JPM     252 non-null float64
KO      252 non-null float64
ORCL    252 non-null float64
PG      252 non-null float64
T       252 non-null float64
TM      252 non-null float64
UPS     252 non-null float64
WMT     252 non-null float64
XOM     252 non-null float64
dtypes: float64(12)
  Manipulating Time Series Data in Python