Manipulating Time Series Data in Python
Stefan Jansen
Founder & Lead Data Scientist at Applied Artificial Intelligence
.expanding() - just like .rolling()
.cumsum(), .cumprod(), .cummin()/.cummax()
import pandas as pd
df = pd.DataFrame({'data': range(5)})
df['expanding sum'] = df.data.expanding().sum()
df['cumulative sum'] = df.data.cumsum()
df
   data  expanding sum  cumulative sum
0     0            0.0               0
1     1            1.0               1
2     2            3.0               3
3     3            6.0               6
4     4           10.0              10
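The same equivalence holds for running minima and maxima. The following is a minimal sketch on made-up numbers (the values are illustrative, not from the course data). It also shows an aggregation, the running mean, that .expanding() offers and that has no cum* shortcut.

```python
import pandas as pd

# Illustrative values only (not from the slides)
df = pd.DataFrame({'data': [3, 1, 4, 1, 5]})

# Running minimum and maximum, computed two ways
expanding_min = df.data.expanding().min()
cumulative_min = df.data.cummin()
expanding_max = df.data.expanding().max()
cumulative_max = df.data.cummax()

# .expanding() also supports aggregations without a cum* method, e.g. the mean
expanding_mean = df.data.expanding().mean()
```

As in the table above, the .expanding() results come back as floats while the cum* methods keep the integer dtype.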
data = pd.read_csv('sp500.csv', parse_dates=['date'], index_col='date')
data.info()
DatetimeIndex: 2519 entries, 2007-05-24 to 2017-05-24
Data columns (total 1 columns):
SP500 2519 non-null float64
Single period return $r_t$: current price over last price minus 1:
$$r_t = \frac{P_t}{P_{t-1}} - 1$$
Multi-period return $R_T$: product of $(1 + r_t)$ over all periods, minus 1:
$$R_T = (1 + r_1)(1 + r_2) \cdots (1 + r_T) - 1$$
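A small worked example may help connect the two formulas. The prices below are made up for illustration. With prices 100, 110, 99 the period returns are +10% and -10%, and compounding them gives -1%, which matches the last price over the first price minus 1.

```python
# Hypothetical prices, chosen only to make the arithmetic easy to follow
prices = [100.0, 110.0, 99.0]

# Single period return: current price over last price, minus 1
period_returns = [prices[t] / prices[t - 1] - 1 for t in range(1, len(prices))]

# Multi-period return: product of (1 + r_t) over all periods, minus 1
growth = 1.0
for r in period_returns:
    growth *= 1 + r
multi_period_return = growth - 1

# Same quantity computed directly from the first and last price
direct_return = prices[-1] / prices[0] - 1
```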
Period return: .pct_change()
Basic math: .add(), .sub(), .mul(), .div()
Cumulative product: .cumprod()
pr = data.SP500.pct_change() # period return
pr_plus_one = pr.add(1)
cumulative_return = pr_plus_one.cumprod().sub(1)
cumulative_return.mul(100).plot()
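As a sanity check on this chain of calls, the sketch below repeats it on a short synthetic price series (the series and its dates are assumptions, not the S&P 500 data). The first period return is NaN because there is no prior price. .cumprod() skips that NaN by default, so the remaining values should equal each price divided by the first price, minus 1.

```python
import numpy as np
import pandas as pd

# Synthetic prices on an assumed daily index, for illustration only
prices = pd.Series(
    [100.0, 102.0, 99.0, 105.0],
    index=pd.date_range('2020-01-01', periods=4, freq='D'),
    name='price',
)

pr = prices.pct_change()                        # period return; first value is NaN
cumulative_return = pr.add(1).cumprod().sub(1)  # compound the period returns

# Direct computation: each price relative to the first price
direct = prices.div(prices.iloc[0]).sub(1)

# Compare from the second row on, since the first cumulative return is NaN
matches = np.allclose(cumulative_return.iloc[1:], direct.iloc[1:])
```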
data['running_min'] = data.SP500.expanding().min()
data['running_max'] = data.SP500.expanding().max()
data.plot()
import numpy as np
def multi_period_return(period_returns):
    return np.prod(period_returns + 1) - 1
pr = data.SP500.pct_change() # period return
r = pr.rolling('360D').apply(multi_period_return)
data['Rolling 1yr Return'] = r.mul(100)
data.plot(subplots=True)
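To see what the rolling apply produces, here is a minimal sketch on synthetic data: prices that grow exactly 1% per day, with a '3D' window standing in for '360D'. The data, the window length, and the use of dropna() and raw=True are choices made for this illustration, not part of the slides. For offset-based windows pandas defaults min_periods to 1, so the earliest values compound fewer than three returns.

```python
import numpy as np
import pandas as pd

def multi_period_return(period_returns):
    # Compound the period returns in the window
    return np.prod(period_returns + 1) - 1

# Synthetic prices growing 1% per day on an assumed daily index
index = pd.date_range('2020-01-01', periods=10, freq='D')
prices = pd.Series(100 * 1.01 ** np.arange(10), index=index)

# Drop the leading NaN so every window holds only valid returns
pr = prices.pct_change().dropna()

# '3D' covers the current day and the two days before it
r = pr.rolling('3D').apply(multi_period_return, raw=True)

# Once the window is full, each value is 1.01 ** 3 - 1, about 3.03%
full_window_return = r.iloc[-1]
```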