Machine learning basics

Machine Learning for Time Series Data in Python

Chris Holdgraf

Fellow, Berkeley Institute for Data Science

Always begin by looking at your data

array.shape
(10, 3)
array[:3]
array([[ 0.735528  ,  1.00122818, -0.28315978],
       [-0.94478393,  0.18658748, -0.00241224],
       [-0.74822942, -1.46636618,  0.69835096]]) 

Always begin by looking at your data

df.head()
       col1      col2      col3
0  0.735528  1.001228 -0.283160
1 -0.944784  0.186587 -0.002412
2 -0.748229 -1.466366  0.698351
3  1.038589 -0.171248  0.831457
4 -0.161904  0.003972 -0.321933
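
As a rough, self-contained sketch of the inspection steps above: the data here is randomly generated stand-in data (not the values shown on the slides), and the column names are only illustrative.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in data (assumption): 10 samples, 3 features
rng = np.random.default_rng(0)
array = rng.standard_normal((10, 3))

# Check the dimensions first, then look at the first few rows
print(array.shape)  # (10, 3)
print(array[:3])

# The same data as a DataFrame, with illustrative column names
df = pd.DataFrame(array, columns=["col1", "col2", "col3"])
print(df.head())
```

Checking the shape before printing rows makes it easier to tell whether the number of columns you see matches what you expect.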

Always visualize your data

Make sure it looks the way you'd expect.

# Using matplotlib
import matplotlib.pyplot as plt
fig, ax = plt.subplots()
ax.plot(...)

# Using pandas
fig, ax = plt.subplots()
df.plot(..., ax=ax)
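
One possible filled-in version of the two plotting patterns above, assuming a small DataFrame of random values. The data, column names, and axis labels are all made up for illustration, and the `Agg` backend is selected only so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend (assumption: no display available)
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic stand-in data: 10 samples, 3 columns
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((10, 3)), columns=["col1", "col2", "col3"])

# Using matplotlib: plot a single column against the row index
fig, ax = plt.subplots()
ax.plot(df.index, df["col1"])
ax.set_xlabel("sample")
ax.set_ylabel("col1")

# Using pandas: draw every column as its own line on an existing axis
fig2, ax2 = plt.subplots()
df.plot(ax=ax2)
```

Passing `ax=` to `df.plot` keeps control of the figure in your hands, so you can add labels or further lines to the same axis afterwards.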

Scikit-learn

Scikit-learn is one of the most widely used machine learning libraries in Python

from sklearn.svm import LinearSVC

Preparing data for scikit-learn

  • scikit-learn expects a particular structure of data:

    (samples, features)

  • Make sure that your data is at least two-dimensional

  • Make sure the first dimension is samples
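
The checklist above could be written as a small check like the one below. The helper name `looks_like_sklearn_input` is hypothetical (it is not part of scikit-learn), and it assumes you already know how many samples you should have.

```python
import numpy as np

def looks_like_sklearn_input(X, n_samples):
    """Return True if X is 2-D with samples on the first axis (hypothetical helper)."""
    X = np.asarray(X)
    return X.ndim == 2 and X.shape[0] == n_samples

X = np.zeros((10, 3))  # 10 samples, 3 features

print(looks_like_sklearn_input(X, n_samples=10))             # True
print(looks_like_sklearn_input(X.T, n_samples=10))           # False: axes are swapped
print(looks_like_sklearn_input(np.zeros(10), n_samples=10))  # False: only one dimension
```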


If your data is not shaped properly

  • If the axes are swapped, transpose the array with .T:
array.T.shape
(10, 3)
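
A minimal sketch of the swapped-axes case, assuming an array that was stored as (features, samples). The values are just consecutive integers so the effect of the transpose is easy to see.

```python
import numpy as np

# Assumed layout: 3 features along the first axis, 10 samples along the second
array = np.arange(30).reshape(3, 10)
print(array.shape)  # (3, 10)

# Transposing swaps the axes so that samples come first
X = array.T
print(X.shape)  # (10, 3)

# The first row of X is the first column of the original array
print(X[0])
```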

If your data is not shaped properly

  • If we're missing an axis, use .reshape():
array.shape
(10,)
array.reshape(-1, 1).shape
(10, 1)
  • -1 tells NumPy to infer the length of that axis from the number of elements in the array
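
To make the role of -1 concrete, here is a short sketch with a one-dimensional array of 10 values. The second reshape, into two columns, is an extra illustration that is not on the slides.

```python
import numpy as np

array = np.arange(10)
print(array.shape)  # (10,)

# One column requested; NumPy infers 10 rows from the 10 elements
column = array.reshape(-1, 1)
print(column.shape)  # (10, 1)

# Two columns requested; NumPy infers 10 / 2 = 5 rows
pairs = array.reshape(-1, 2)
print(pairs.shape)  # (5, 2)
```

Only one axis can be given as -1 in a single call, since NumPy needs every other axis length to work out the missing one.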

Fitting a model with scikit-learn

# Import a support vector classifier
from sklearn.svm import LinearSVC

# Instantiate this model
model = LinearSVC()

# Fit the model on some data
model.fit(X, y)

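scikit-learn expects y to have shape (samples,). If y is a column of shape (samples, 1), scikit-learn typically warns and flattens it, so flatten it yourself with y.ravel()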
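
An end-to-end sketch of the fitting step on synthetic data. The feature matrix, the labelling rule, and the variable `y_column` are all made up for illustration; the point is only the shapes of `X` and `y`.

```python
import numpy as np
from sklearn.svm import LinearSVC

# Synthetic data (assumption): 100 samples, 2 features
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 2))

# Made-up labelling rule: class 1 when the first feature exceeds the second
y_column = (X[:, 0] - X[:, 1] > 0).astype(int).reshape(-1, 1)  # shape (100, 1)

# Flatten the column of labels to shape (100,) before fitting
y = y_column.ravel()

model = LinearSVC()
model.fit(X, y)
print(model.classes_)  # [0 1]
```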


Investigating the model

# There is one coefficient per input feature
model.coef_
array([[ 0.69417875, -0.5289162 ]])
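
The output above has two coefficients because that model was fit on two features. The sketch below reproduces the same shapes on synthetic data (assumed, not the course data); the exact coefficient values will differ.

```python
import numpy as np
from sklearn.svm import LinearSVC

# Synthetic binary problem (assumption): 100 samples, 2 features
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 2))
y = (X[:, 0] - X[:, 1] > 0).astype(int)

model = LinearSVC().fit(X, y)

# For a binary problem: one row, one coefficient per input feature
print(model.coef_.shape)       # (1, 2)
print(model.intercept_.shape)  # (1,)
print(model.coef_)
```

With this labelling rule you would expect a positive weight on the first feature and a negative weight on the second, which is a quick sanity check that the model learned something sensible.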

Predicting with a fitted model

# Generate predictions
predictions = model.predict(X_test)
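
A sketch of prediction on held-out data, again with synthetic inputs. Holding out the last 25 rows as `X_test` is an assumption made here for illustration; the slides do not say how `X_test` was produced.

```python
import numpy as np
from sklearn.svm import LinearSVC

# Synthetic binary problem (assumption): 100 samples, 2 features
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 2))
y = (X[:, 0] - X[:, 1] > 0).astype(int)

# Keep the last 25 samples aside; fit on the first 75 only
X_train, X_test = X[:75], X[75:]
y_train, y_test = y[:75], y[75:]

model = LinearSVC().fit(X_train, y_train)

# One predicted label per test sample
predictions = model.predict(X_test)
print(predictions.shape)  # (25,)

# Fraction of held-out samples labelled correctly
print((predictions == y_test).mean())
```

`X_test` must have the same number of features as the data used in `fit`, in the same column order.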

Let's practice
