Classification and feature engineering

Machine Learning for Time Series Data in Python

Chris Holdgraf

Fellow, Berkeley Institute for Data Science

Always visualize raw data before fitting models

Machine Learning for Time Series Data in Python

Visualize your timeseries data!

ixs = np.arange(audio.shape[-1])
time = ixs / sfreq
fig, ax = plt.subplots()
ax.plot(time, audio)

Machine Learning for Time Series Data in Python

What features to use?

  • Using raw timeseries data is too noisy for classification
  • We need to calculate features!
  • An easy start: summarize your audio data
Machine Learning for Time Series Data in Python

Machine Learning for Time Series Data in Python

Calculating multiple features

print(audio.shape)
# (n_files, time)
(20, 7000) 
means = np.mean(audio, axis=-1)
maxs = np.max(audio, axis=-1)
stds = np.std(audio, axis=-1)

print(means.shape)
# (n_files,)
(20,) 
Machine Learning for Time Series Data in Python

Fitting a classifier with scikit-learn

  • We've just collapsed a 2-D dataset (samples x time) into several features of a 1-D dataset (samples)
  • We can combine each feature, and use it as an input to a model
  • If we have a label for each sample, we can use scikit-learn to create and fit a classifier
Machine Learning for Time Series Data in Python

Preparing your features for scikit-learn

# Import a linear classifier
from sklearn.svm import LinearSVC

# Note that means are reshaped to work with scikit-learn
X = np.column_stack([means, maxs, stds])
y = labels.reshape(-1, 1)
model = LinearSVC()
model.fit(X, y)
Machine Learning for Time Series Data in Python

Scoring your scikit-learn model

from sklearn.metrics import accuracy_score

# Different input data
predictions = model.predict(X_test)  

# Score our model with % correct
# Manually
percent_score = sum(predictions == labels_test) / len(labels_test)  
# Using a sklearn scorer
percent_score = accuracy_score(labels_test, predictions)  
Machine Learning for Time Series Data in Python

Let's practice!

Machine Learning for Time Series Data in Python

Preparing Video For Download...