Select index components & import data

Manipulating Time Series Data in Python

Stefan Jansen

Founder & Lead Data Scientist at Applied Artificial Intelligence

Market value-weighted index

  • Composite performance of various stocks
  • Components weighted by market capitalization
    • Share Price x Number of Shares => Market Value
  • Larger components get higher percentage weightings
  • Key market indexes are value-weighted:
    • S&P 500, NASDAQ, Wilshire 5000, Hang Seng
Manipulating Time Series Data in Python

Build a cap-weighted Index

  • Apply new skills to construct value-weighted index
    • Select components from exchange listing data
    • Get component number of shares and stock prices
    • Calculate component weights
    • Calculate index
    • Evaluate performance of components and index
Manipulating Time Series Data in Python

Load stock listing data

nyse = pd.read_excel('listings.xlsx', sheet_name='nyse',
                     na_values='n/a')

nyse.info()
RangeIndex: 3147 entries, 0 to 3146
Data columns (total 7 columns):
Stock Symbol             3147 non-null object   # Stock Ticker
Company Name             3147 non-null object
Last Sale                3079 non-null float64  # Latest Stock Price
Market Capitalization    3147 non-null float64
IPO Year                 1361 non-null float64  # Year of listing
Sector                   2177 non-null object
Industry                 2177 non-null object
dtypes: float64(3), object(4)
Manipulating Time Series Data in Python

Load & prepare listing data

nyse.set_index('Stock Symbol', inplace=True)

nyse.dropna(subset=['Sector'], inplace=True)
nyse['Market Capitalization'] /= 1e6 # in Million USD
Index: 2177 entries, DDD to ZTO
Data columns (total 6 columns):
Company Name             2177 non-null object
Last Sale                2175 non-null float64
Market Capitalization    2177 non-null float64
IPO Year                 967 non-null float64
Sector                   2177 non-null object
Industry                 2177 non-null object
dtypes: float64(3), object(3)
Manipulating Time Series Data in Python

Select index components

components = nyse.groupby(['Sector'])['Market Capitalization'].nlargest(1)

components.sort_values(ascending=False)
Sector                 Stock Symbol
Health Care            JNJ             338834.390080
Energy                 XOM             338728.713874
Finance                JPM             300283.250479
Miscellaneous          BABA            275525.000000
Public Utilities       T               247339.517272
Basic Industries       PG              230159.644117
Consumer Services      WMT             221864.614129
Consumer Non-Durables  KO              183655.305119
Technology             ORCL            181046.096000
Capital Goods          TM              155660.252483
Transportation         UPS              90180.886756
Consumer Durables      ABB              48398.935676
Name: Market Capitalization, dtype: float64
Manipulating Time Series Data in Python

Import & prepare listing data

tickers = components.index.get_level_values('Stock Symbol')

tickers
Index(['PG', 'TM', 'ABB', 'KO', 'WMT', 'XOM', 'JPM', 'JNJ', 'BABA', 'T',
       'ORCL', ‘UPS'], dtype='object', name='Stock Symbol’)
tickers.tolist()
['PG',
 'TM',
 'ABB',
 'KO',
 'WMT',
...
 'T',
 'ORCL',
 'UPS']
Manipulating Time Series Data in Python

Stock index components

columns = ['Company Name', 'Market Capitalization', 'Last Sale']

component_info = nyse.loc[tickers, columns]
pd.options.display.float_format = '{:,.2f}'.format
                                Company Name  Market Capitalization  Last Sale
Stock Symbol
PG            Procter & Gamble Company (The)             230,159.64      90.03
TM                 Toyota Motor Corp Ltd Ord             155,660.25     104.18
ABB                                  ABB Ltd              48,398.94      22.63
KO                   Coca-Cola Company (The)             183,655.31      42.79
WMT                    Wal-Mart Stores, Inc.             221,864.61      73.15
XOM                  Exxon Mobil Corporation             338,728.71      81.69
JPM                    J P Morgan Chase & Co             300,283.25      84.40
JNJ                        Johnson & Johnson             338,834.39     124.99
BABA           Alibaba Group Holding Limited             275,525.00     110.21
T                                  AT&T Inc.             247,339.52      40.28
ORCL                      Oracle Corporation             181,046.10      44.00
UPS              United Parcel Service, Inc.              90,180.89     103.74
Manipulating Time Series Data in Python

Import & prepare listing data

data = pd.read_csv('stocks.csv', parse_dates=['Date'],
                             index_col='Date').loc[:, tickers.tolist()]

data.info()
DatetimeIndex: 252 entries, 2016-01-04 to 2016-12-30
Data columns (total 12 columns):
ABB     252 non-null float64
BABA    252 non-null float64
JNJ     252 non-null float64
JPM     252 non-null float64
KO      252 non-null float64
ORCL    252 non-null float64
PG      252 non-null float64
T       252 non-null float64
TM      252 non-null float64
UPS     252 non-null float64
WMT     252 non-null float64
XOM     252 non-null float64
dtypes: float64(12)
Manipulating Time Series Data in Python

Let's practice!

Manipulating Time Series Data in Python

Preparing Video For Download...