Machine learning basics

Machine Learning for Time Series Data in Python

Chris Holdgraf

Fellow, Berkeley Institute for Data Science

Always begin by looking at your data

array.shape

(10, 3)

array[:3]

array([[ 0.735528  ,  1.00122818, -0.28315978],
       [-0.94478393,  0.18658748, -0.00241224],
       [-0.74822942, -1.46636618,  0.69835096]])
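A minimal sketch of this first look, assuming a hypothetical 10 x 3 array of random values (the numbers will differ from those shown above):

```python
import numpy as np

# Hypothetical data: 10 samples, 3 features drawn from a standard normal
rng = np.random.default_rng(0)
array = rng.standard_normal((10, 3))

# Check the dimensions before anything else
print(array.shape)  # (10, 3)

# Peek at the first three rows (all columns)
print(array[:3])
```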

Always begin by looking at your data

df.head()

       col1      col2      col3
0  0.735528  1.001228 -0.283160
1 -0.944784  0.186587 -0.002412
2 -0.748229 -1.466366  0.698351
3  1.038589 -0.171248  0.831457
4 -0.161904  0.003972 -0.321933
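The same check with pandas, sketched under the assumption that the hypothetical array is wrapped in a DataFrame with columns named `col1` to `col3`; `describe()` is added here as one more quick summary:

```python
import numpy as np
import pandas as pd

# Hypothetical data wrapped in a DataFrame with named columns
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((10, 3)), columns=["col1", "col2", "col3"])

# head() returns the first 5 rows by default
print(df.head())

# describe() gives per-column summary statistics (count, mean, std, ...)
print(df.describe())
```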

Always visualize your data

Make sure it looks the way you'd expect.

import matplotlib.pyplot as plt

# Using matplotlib
fig, ax = plt.subplots()
ax.plot(...)

# Using pandas
fig, ax = plt.subplots()
df.plot(..., ax=ax)
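A runnable version of both approaches, assuming the hypothetical 10 x 3 DataFrame used earlier. The `Agg` backend is an assumption so the sketch runs without a display; in a notebook you would normally omit that line:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical data, as in the earlier examples
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((10, 3)), columns=["col1", "col2", "col3"])

# Using matplotlib: plot one column against the row index
fig, ax = plt.subplots()
ax.plot(df.index, df["col1"])
ax.set(xlabel="index", ylabel="col1")

# Using pandas: one line per column, drawn on a shared Axes
fig2, ax2 = plt.subplots()
df.plot(ax=ax2)
```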

Scikit-learn

Scikit-learn is one of the most widely used machine learning libraries in Python

from sklearn.svm import LinearSVC

Preparing data for scikit-learn

scikit-learn expects input data with a particular structure:

(samples, features)
Make sure that your data is at least two-dimensional
Make sure the first dimension indexes samples
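The two rules above can be checked mechanically. Below is a sketch using a hypothetical helper, `check_sklearn_shape`, which is not part of scikit-learn; it simply raises an error when either rule is violated:

```python
import numpy as np

def check_sklearn_shape(X, n_samples):
    """Hypothetical helper: verify X is laid out as (samples, features)."""
    X = np.asarray(X)
    # Rule 1: at least two dimensions
    if X.ndim < 2:
        raise ValueError(f"expected at least 2 dimensions, got shape {X.shape}")
    # Rule 2: the first axis must index samples
    if X.shape[0] != n_samples:
        raise ValueError(
            f"first axis has {X.shape[0]} entries, expected {n_samples} samples"
        )
    return X

X = np.zeros((10, 3))
check_sklearn_shape(X, n_samples=10)  # passes silently
```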

If your data is not shaped properly

If the axes are swapped (features first, samples second), transpose with .T:

array.T.shape

(10, 3)
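A small sketch of the transpose, assuming hypothetical data stored features-first as a 3 x 10 array:

```python
import numpy as np

# Hypothetical data stored features-first: 3 features x 10 samples
array = np.arange(30).reshape(3, 10)
print(array.shape)  # (3, 10)

# .T swaps the two axes so samples come first
X = array.T
print(X.shape)      # (10, 3)
```

Transposing only changes how the axes are indexed: element `array[i, j]` becomes `X[j, i]`.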

If your data is not shaped properly

If we're missing an axis, use .reshape():

array.shape

(10,)

array.reshape(-1, 1).shape

(10, 1)

Passing -1 tells NumPy to infer the size of that axis from the total number of elements and the other dimensions
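A sketch of adding the missing feature axis, assuming a hypothetical 1-D series of 10 observations. Indexing with `np.newaxis` is shown as an equivalent alternative:

```python
import numpy as np

# Hypothetical 1-D series: 10 observations of a single feature
array = np.arange(10.0)
print(array.shape)   # (10,)

# -1 lets NumPy infer this axis: 10 elements / 1 column = 10 rows
X = array.reshape(-1, 1)
print(X.shape)       # (10, 1)

# Equivalent alternative: insert a new axis by indexing
X_alt = array[:, np.newaxis]
print(X_alt.shape)   # (10, 1)
```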

Fitting a model with scikit-learn

# Import a linear support vector classifier
from sklearn.svm import LinearSVC

# Instantiate this model
model = LinearSVC()

# Fit the model on some data
model.fit(X, y)

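y is usually one-dimensional, of shape (samples,). A column vector of shape (samples, 1) also works, but scikit-learn raises a DataConversionWarning and flattens it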
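An end-to-end sketch of the fit step. Because `X` and `y` are not defined on the slide, this assumes hypothetical toy data: 100 random samples with 2 features, labelled 1 when the first feature exceeds the second:

```python
import numpy as np
from sklearn.svm import LinearSVC

# Hypothetical toy data: 100 samples, 2 features
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 2))
# 1-D labels of shape (100,): 1 where feature 0 > feature 1
y = (X[:, 0] - X[:, 1] > 0).astype(int)

model = LinearSVC()
model.fit(X, y)

# Passing y.reshape(-1, 1) instead would trigger a DataConversionWarning
print(model.score(X, y))  # accuracy on the training data
```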

Investigating the model

# There is one coefficient per input feature
model.coef_

array([[ 0.69417875, -0.5289162 ]])
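A sketch of inspecting the fitted parameters, reusing the same hypothetical toy data. For a binary problem `coef_` has shape (1, n_features), and `intercept_` holds the bias term. The feature names are hypothetical labels for display:

```python
import numpy as np
from sklearn.svm import LinearSVC

# Same hypothetical toy data as the fitting sketch
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 2))
y = (X[:, 0] - X[:, 1] > 0).astype(int)
model = LinearSVC().fit(X, y)

print(model.coef_.shape)       # (1, 2)
print(model.intercept_.shape)  # (1,)

# Pair each coefficient with a hypothetical feature name
for name, weight in zip(["feature_0", "feature_1"], model.coef_[0]):
    print(f"{name}: {weight:+.3f}")
```

With this data the first weight should come out positive and the second negative, matching how the labels were constructed.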

Predicting with a fitted model

# Generate predictions
predictions = model.predict(X_test)
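A sketch of predicting on held-out data, assuming hypothetical toy data of 120 samples. The last 20 samples are held out as `X_test`, so the test set comes after the training set, as it would in a time series:

```python
import numpy as np
from sklearn.svm import LinearSVC

# Hypothetical toy data: 120 samples, 2 features
rng = np.random.default_rng(0)
X = rng.standard_normal((120, 2))
y = (X[:, 0] - X[:, 1] > 0).astype(int)

# Hold out the last 20 samples as a test set
X_train, X_test = X[:100], X[100:]
y_train, y_test = y[:100], y[100:]

model = LinearSVC().fit(X_train, y_train)

# predict returns one label per row of X_test
predictions = model.predict(X_test)
print(predictions.shape)               # (20,)
print((predictions == y_test).mean())  # fraction predicted correctly
```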

Let's practice
