Manipulating Time Series Data in Python
Stefan Jansen
Founder & Lead Data Scientist at Applied Artificial Intelligence
Typical Time Series manipulations include:
Shift or lag values back or forward back in time
Get the difference in value for a given time period
Compute the percent change over any number of periods
pandas
built-in methods rely on pd.DateTimeIndex
pd.read_csv()
do the parsing for you!google = pd.read_csv('google.csv', parse_dates=['date'], index_col='date')
google.info()
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 504 entries, 2015-01-02 to 2016-12-30
Data columns (total 1 columns):
price 504 non-null float64
dtypes: float64(1)
google.head()
price
date
2015-01-02 524.81
2015-01-05 513.87
2015-01-06 501.96
2015-01-07 501.10
2015-01-08 502.68
.shift()
:periods=1
google['shifted'] = google.price.shift() # default: periods=1
google.head(3)
price shifted
date
2015-01-02 542.81 NaN
2015-01-05 513.87 542.81
2015-01-06 501.96 513.87
.shift(periods=-1)
:google['lagged'] = google.price.shift(periods=-1)
google[['price', 'lagged', 'shifted']].tail(3)
price lagged shifted
date
2016-12-28 785.05 782.79 791.55
2016-12-29 782.79 771.82 785.05
2016-12-30 771.82 NaN 782.79
google['change'] = google.price.div(google.shifted)
google[['price', 'shifted', 'change']].head(3)
price shifted change
Date
2017-01-03 786.14 NaN NaN
2017-01-04 786.90 786.14 1.000967
2017-01-05 794.02 786.90 1.009048
google['return'] = google.change.sub(1).mul(100)
google[['price', 'shifted', 'change', 'return']].head(3)
price shifted change return
date
2015-01-02 524.81 NaN NaN NaN
2015-01-05 513.87 524.81 0.98 -2.08
2015-01-06 501.96 513.87 0.98 -2.32
Difference in value for two adjacent periods
$x_t - x_{t-1}$
google['diff'] = google.price.diff()
google[['price', 'diff']].head(3)
price diff
date
2015-01-02 524.81 NaN
2015-01-05 513.87 -10.94
2015-01-06 501.96 -11.91
Percent change for two adjacent periods
$\frac{x_t}{x_{t-1}}$
google['pct_change'] = google.price.pct_change().mul(100)
google[['price', 'return', 'pct_change']].head(3)
price return pct_change
date
2015-01-02 524.81 NaN NaN
2015-01-05 513.87 -2.08 -2.08
2015-01-06 501.96 -2.32 -2.32
google['return_3d'] = google.price.pct_change(periods=3).mul(100)
google[['price', 'return_3d']].head()
price return_3d
date
2015-01-02 524.81 NaN
2015-01-05 513.87 NaN
2015-01-06 501.96 NaN
2015-01-07 501.10 -4.517825
2015-01-08 502.68 -2.177594
Manipulating Time Series Data in Python