Machine Learning for Time Series Data in Python
Chris Holdgraf
Fellow, Berkeley Institute for Data Science
array.shape
(10, 5)
array[:3]
array([[ 0.735528  ,  1.00122818, -0.28315978],
       [-0.94478393,  0.18658748, -0.00241224],
       [-0.74822942, -1.46636618,  0.69835096]]) 
df.head()
       col1      col2      col3
0  0.735528  1.001228 -0.283160
1 -0.944784  0.186587 -0.002412
2 -0.748229 -1.466366  0.698351
3  1.038589 -0.171248  0.831457
4 -0.161904  0.003972 -0.321933
Make sure it looks the way you'd expect.
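A minimal sketch of this kind of sanity check, assuming a hypothetical 10-by-3 array of random values (the seed, shape, and column names are illustrative and not taken from the slides):

```python
import numpy as np
import pandas as pd

# Hypothetical data: 10 samples, 3 features, from a seeded generator
rng = np.random.default_rng(0)
array = rng.standard_normal((10, 3))

print(array.shape)  # (rows, columns)
print(array[:3])    # first three rows

# Wrap the same values in a DataFrame with illustrative column names
df = pd.DataFrame(array, columns=["col1", "col2", "col3"])
print(df.head())    # first five rows
```

Checking the shape and the first few rows before modeling is a cheap way to catch swapped axes or missing columns early.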
# Using matplotlib
fig, ax = plt.subplots()
ax.plot(...)
# Using pandas
fig, ax = plt.subplots()
df.plot(..., ax=ax)
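The two plotting patterns above might be filled in as follows. This is a sketch under assumptions: the DataFrame is hypothetical random data, and the "Agg" backend is chosen only so the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; an assumption for headless runs
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical data with illustrative column names
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((10, 3)),
                  columns=["col1", "col2", "col3"])

# Using matplotlib: plot one column against the index
fig, ax = plt.subplots()
ax.plot(df.index, df["col1"])
ax.set(xlabel="sample", ylabel="col1")

# Using pandas: draw every column onto an Axes we created ourselves
fig2, ax2 = plt.subplots()
df.plot(ax=ax2)
```

Passing `ax=` to `df.plot` keeps control of the figure in matplotlib, which helps when several plots need to share one layout.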
Scikit-learn is one of the most widely used machine learning libraries in Python
from sklearn.svm import LinearSVC
scikit-learn expects a particular structure of data:
(samples, features)
Make sure that your data is at least two-dimensional
Make sure the first dimension is samples
If the axes are swapped, transpose with .T:
array.T.shape
(10, 3)
If an axis is missing, use .reshape():
array.shape
(10,)
array.reshape(-1, 1).shape
(10, 1)
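Both fixes can be tried on small placeholder arrays. The names `swapped` and `one_d` are hypothetical, and the zeros and integers stand in for real data:

```python
import numpy as np

# Axes swapped: 3 features by 10 samples instead of 10 samples by 3 features
swapped = np.zeros((3, 10))
fixed = swapped.T                # transpose so samples come first
print(fixed.shape)

# Missing axis: a 1-D array of 10 values
one_d = np.arange(10)
column = one_d.reshape(-1, 1)    # 10 samples, 1 feature
print(column.shape)
```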
-1 tells NumPy to infer the length of that axis from the array's total size
# Import a support vector classifier
from sklearn.svm import LinearSVC
# Instantiate this model
model = LinearSVC()
# Fit the model on some data
model.fit(X, y)
y is usually of shape (samples,); a column vector of shape (samples, 1) is common too, but scikit-learn classifiers typically warn and flatten it
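If the labels arrive as a column vector, they can be flattened before fitting. A small sketch with hypothetical labels (`y_column` is an illustrative name); scikit-learn estimators generally document y as having shape (n_samples,):

```python
import numpy as np

# Hypothetical labels stored as a column vector, shape (4, 1)
y_column = np.array([[0], [1], [0], [1]])

# Flatten to one dimension, shape (4,)
y_flat = y_column.ravel()
print(y_column.shape, y_flat.shape)
```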
# There is one coefficient per input feature
model.coef_
array([[ 0.69417875, -0.5289162 ]])
# Generate predictions
predictions = model.predict(X_test)
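Putting the steps together, here is a self-contained sketch on synthetic data. The two well-separated clusters, the seed, and the test points are assumptions made for illustration; they are not the data behind the coefficients shown above.

```python
import numpy as np
from sklearn.svm import LinearSVC

# Hypothetical training data: 40 samples, 2 features, two separated classes
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2, 0.5, size=(20, 2)),
               rng.normal(2, 0.5, size=(20, 2))])
y = np.array([0] * 20 + [1] * 20)   # shape (samples,)

# Hypothetical test points, one near each cluster centre
X_test = np.array([[-2.0, -2.0], [2.0, 2.0]])

# Instantiate and fit the model
model = LinearSVC()
model.fit(X, y)

# One coefficient per input feature (a single row for a binary problem)
print(model.coef_.shape)

# Generate predictions for the unseen points
predictions = model.predict(X_test)
print(predictions)
```

Note that both `X` and `X_test` keep the (samples, features) layout, so the number of columns seen at prediction time matches the number seen during fitting.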