Machine Learning for Time Series Data in Python
Chris Holdgraf
Fellow, Berkeley Institute for Data Science
print(df)
df
0 0.0
1 1.0
2 2.0
3 3.0
4 4.0
# Shift a DataFrame/Series by 3 index positions: each row now holds the value from 3 rows earlier (the past)
print(df.shift(3))
df
0 NaN
1 NaN
2 NaN
3 0.0
4 1.0
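To make the direction of the shift concrete, here is a minimal sketch comparing a positive and a negative shift side by side. The construction of the toy Series is an assumption chosen to match the printed values above; the slides do not show how `df` was built.

```python
import numpy as np
import pandas as pd

# Assumed toy data: five consecutive floats, matching the printed output above
s = pd.Series(np.arange(5, dtype=float), name="df")

comparison = pd.DataFrame({
    "original": s,
    "shift(3)": s.shift(3),     # row i holds the value from row i - 3 (a lag)
    "shift(-3)": s.shift(-3),   # row i holds the value from row i + 3 (a lead)
})
print(comparison)
```

A positive shift leaves the first rows empty because there is no earlier value to pull from; a negative shift leaves the last rows empty for the mirror-image reason.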
import pandas as pd
# data is a pandas Series containing time series data
data = pd.Series(...)
# Shifts
shifts = [0, 1, 2, 3, 4, 5, 6, 7]
# Create a dictionary of time-shifted data
many_shifts = {'lag_{}'.format(ii): data.shift(ii) for ii in shifts}
# Convert them into a dataframe
many_shifts = pd.DataFrame(many_shifts)
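As a rough illustration of what the shifted DataFrame looks like, the sketch below builds lag columns from a small placeholder Series and counts the missing values each shift introduces. The placeholder data and the shorter list of shifts are assumptions for readability, not part of the original slides.

```python
import numpy as np
import pandas as pd

# Placeholder data (assumption): ten consecutive floats standing in for a real series
data = pd.Series(np.arange(10, dtype=float))
shifts = [0, 1, 2, 3]

# One column per lag, named the same way as in the slide
many_shifts = pd.DataFrame({f"lag_{ii}": data.shift(ii) for ii in shifts})

# A shift of k leaves k missing values at the start of that column
n_missing_per_column = many_shifts.isna().sum()

# Rows where every lag is available
complete_rows = many_shifts.dropna()

print(many_shifts.head())
print(n_missing_per_column)
```

Only rows from index `max(shifts)` onward have a value for every lag, which matters for any estimator that does not accept missing values.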
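# Fit the model using these input features
# Ridge does not accept NaN, so keep only rows where every lag is available
from sklearn.linear_model import Ridge
valid = many_shifts.notna().all(axis=1)
model = Ridge()
model.fit(many_shifts[valid], data[valid])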
# Visualize the fitted model's coefficients
import matplotlib.pyplot as plt
fig, ax = plt.subplots()
ax.bar(many_shifts.columns, model.coef_)
ax.set(xlabel='Coefficient name', ylabel='Coefficient value')
# Set formatting so it looks nice
plt.setp(ax.get_xticklabels(), rotation=45, horizontalalignment='right')
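The following is a self-contained sketch of the same workflow on synthetic data, so the fitted coefficients can be checked against a known answer. Everything about the data is an assumption: the series is a simulated first-order autoregressive process with coefficient 0.8, and the random seed is arbitrary. Unlike the slide, this sketch leaves out `lag_0`, since that column is a copy of the target and a model given it would likely place almost all of its weight there.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import Ridge

# Simulated AR(1) series (assumption): x[t] = 0.8 * x[t-1] + noise
rng = np.random.default_rng(0)
n = 500
values = np.zeros(n)
for t in range(1, n):
    values[t] = 0.8 * values[t - 1] + rng.normal()
data = pd.Series(values)

# Lagged features, skipping lag 0 to avoid feeding the target to the model
shifts = [1, 2, 3]
X = pd.DataFrame({f"lag_{ii}": data.shift(ii) for ii in shifts}).dropna()
y = data.loc[X.index]  # align the target with the rows that survived dropna

model = Ridge()
model.fit(X, y)

# Label each coefficient with the lag it belongs to
coefs = pd.Series(model.coef_, index=X.columns)
print(coefs)
```

Because each value depends directly only on the one before it, the `lag_1` coefficient should dominate and the longer lags should sit near zero. On real data the pattern is rarely this clean, but the same reading applies: larger coefficients mark the lags the model relies on most.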