Machine Learning for Time Series Data in Python
Chris Holdgraf
Fellow, Berkeley Institute for Data Science
array.shape
(10, 3)
array[:3]
array([[ 0.735528 , 1.00122818, -0.28315978],
[-0.94478393, 0.18658748, -0.00241224],
[-0.74822942, -1.46636618, 0.69835096]])
df.head()
col1 col2 col3
0 0.735528 1.001228 -0.283160
1 -0.944784 0.186587 -0.002412
2 -0.748229 -1.466366 0.698351
3 1.038589 -0.171248 0.831457
4 -0.161904 0.003972 -0.321933
Make sure it looks the way you'd expect.
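A minimal sketch of this first look at the data, assuming randomly generated values. The seed, the shape (10, 3), and the column names are illustrative and not taken from the course data.

```python
import numpy as np
import pandas as pd

# Hypothetical data: 10 samples of 3 features, drawn at random
rng = np.random.default_rng(0)
array = rng.standard_normal((10, 3))

print(array.shape)   # (rows, columns)
print(array[:3])     # first three rows

# The same values as a DataFrame with illustrative column names
df = pd.DataFrame(array, columns=["col1", "col2", "col3"])
print(df.head())     # first five rows
```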
# Using matplotlib
import matplotlib.pyplot as plt
fig, ax = plt.subplots()
ax.plot(...)
# Using pandas
fig, ax = plt.subplots()
df.plot(..., ax=ax)
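One possible way to fill in the two plotting patterns above, assuming a small random DataFrame. The data, column names, and axis labels are made up for illustration, and the Agg backend line is only there so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; drop this line to plot interactively
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical data: 10 time points, 3 columns
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((10, 3)), columns=["col1", "col2", "col3"])

# Using matplotlib: plot one column against the index
fig, ax = plt.subplots()
ax.plot(df.index, df["col1"])
ax.set(xlabel="sample", ylabel="col1")

# Using pandas: draw every column onto an axis we created ourselves
fig2, ax2 = plt.subplots()
df.plot(ax=ax2)
```

Passing ax= to df.plot keeps control of the figure in your hands, so the same axis can be labeled or combined with other matplotlib calls afterwards.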
Scikit-learn is the most popular machine learning library in Python
from sklearn.svm import LinearSVC
scikit-learn expects a particular structure of data: (samples, features)
Make sure that your data is at least two-dimensional
Make sure the first dimension is samples
If the axes are swapped, e.g. an array of shape (3, 10), transpose it with .T:
array.T.shape
(10, 3)
If an axis is missing, use .reshape():
array.shape
(10,)
array.reshape(-1, 1).shape
(10, 1)
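Both fixes can be sketched together. The arrays here are hypothetical placeholders (zeros and a simple range), chosen only so the shapes are easy to follow.

```python
import numpy as np

# Hypothetical array stored as (features, samples)
swapped = np.zeros((3, 10))
X = swapped.T                  # transpose so the first dimension is samples
print(X.shape)                 # (10, 3)

# Hypothetical single feature stored as a 1-D array
one_d = np.arange(10)
X_col = one_d.reshape(-1, 1)   # -1 lets NumPy infer the length of that axis
print(X_col.shape)             # (10, 1)
```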
-1 will automatically fill that axis with the remaining values
# Import a support vector classifier
from sklearn.svm import LinearSVC
# Instantiate this model
model = LinearSVC()
# Fit the model on some data
model.fit(X, y)
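y is usually a 1-D array of shape (samples,). A column vector of shape (samples, 1) is also common, but scikit-learn classifiers typically warn and flatten it.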
# There is one coefficient per input feature
model.coef_
array([[ 0.69417875, -0.5289162 ]])
# Generate predictions
predictions = model.predict(X_test)
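The fit and predict steps above can be run end to end on synthetic data. This is a sketch under assumptions: the two-feature dataset, the labeling rule, and the 80/20 split are invented for illustration and are not the course's data.

```python
import numpy as np
from sklearn.svm import LinearSVC

# Hypothetical data: 100 samples, 2 features, labels from a simple linear rule
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 2))
y = (X[:, 0] - X[:, 1] > 0).astype(int)   # 1-D labels, shape (100,)

# Illustrative split: first 80 samples for fitting, last 20 held out
X_train, X_test = X[:80], X[80:]
y_train = y[:80]

model = LinearSVC()
model.fit(X_train, y_train)

print(model.coef_.shape)    # one row, one coefficient per input feature
predictions = model.predict(X_test)
print(predictions.shape)    # one prediction per held-out sample
```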