Manipulating Time Series Data in Python
Stefan Jansen
Founder & Lead Data Scientist at Applied Artificial Intelligence
Stock price series: hard to compare at different levels
Simple solution: normalize price series to start at 100
Divide all prices by first in series, multiply by 100
Same starting point
All prices relative to starting point
Difference to starting point in percentage points
google = pd.read_csv('google.csv', parse_dates=['date'], index_col='date')
google.head(3)
price
date
2010-01-04 313.06
2010-01-05 311.68
2010-01-06 303.83
first_price = google.price.iloc[0] # int-based selection
first_price
313.06
first_price == google.loc['2010-01-04', 'price']
True
normalized = google.price.div(first_price).mul(100)
normalized.plot(title='Google Normalized Series')
prices = pd.read_csv('stock_prices.csv', parse_dates=['date'], index_col='date')
prices.info()
DatetimeIndex: 1761 entries, 2010-01-04 to 2016-12-30
Data columns (total 3 columns):
AAPL 1761 non-null float64
GOOG 1761 non-null float64
YHOO 1761 non-null float64
dtypes: float64(3)
prices.head(2)
AAPL GOOG YHOO
Date
2010-01-04 30.57 313.06 17.10
2010-01-05 30.63 311.68 17.23
prices.iloc[0]
AAPL 30.57
GOOG 313.06
YHOO 17.10
Name: 2010-01-04 00:00:00, dtype: float64
normalized = prices.div(prices.iloc[0])
normalized.head(3)
AAPL GOOG YHOO
Date
2010-01-04 1.000000 1.000000 1.000000
2010-01-05 1.001963 0.995592 1.007602
2010-01-06 0.985934 0.970517 1.004094
.div()
: automatic alignment of Series index & DataFrame columnsindex = pd.read_csv('benchmark.csv', parse_dates=['date'], index_col='date')
index.info()
DatetimeIndex: 1826 entries, 2010-01-01 to 2016-12-30
Data columns (total 1 columns):
SP500 1762 non-null float64
dtypes: float64(1)
prices = pd.concat([prices, index], axis=1).dropna()
prices.info()
DatetimeIndex: 1761 entries, 2010-01-04 to 2016-12-30
Data columns (total 4 columns):
AAPL 1761 non-null float64
GOOG 1761 non-null float64
YHOO 1761 non-null float64
SP500 1761 non-null float64
dtypes: float64(4)
prices.head(1)
AAPL GOOG YHOO SP500
2010-01-04 30.57 313.06 17.10 1132.99
normalized = prices.div(prices.iloc[0]).mul(100)
normalized.plot()
diff = normalized[tickers].sub(normalized['SP500'], axis=0)
GOOG YHOO AAPL
2010-01-04 0.000000 0.000000 0.000000
2010-01-05 -0.752375 0.448669 -0.115294
2010-01-06 -3.314604 0.043069 -1.772895
.sub(..., axis=0)
: Subtract a Series from each DataFrame column by aligning indexesdiff.plot()
Manipulating Time Series Data in Python