Improving the features we use for classification

Machine Learning for Time Series Data in Python

Chris Holdgraf

Fellow, Berkeley Institute for Data Science

The auditory envelope

  • Smooth the data to calculate the auditory envelope
  • Related to the total amount of audio energy present at each moment of time


Smoothing over time

  • Instead of averaging over all time, we can do a local average
  • This is called smoothing your timeseries
  • It removes short-term noise, while retaining the general pattern

Smoothing your data


Calculating a rolling window statistic

# Audio is a Pandas DataFrame
print(audio.shape)  
# (n_times, n_audio_files)
(5000, 20)  
# Smooth our data by taking the rolling mean in a window of 50 samples
window_size = 50
windowed = audio.rolling(window=window_size)
audio_smooth = windowed.mean()
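
To see what the rolling mean does in practice, here is a minimal, self-contained sketch. The random `audio` DataFrame is only a stand-in for real recordings (an assumption, not the course data); the window size matches the slide.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the audio DataFrame: (n_times, n_audio_files)
rng = np.random.default_rng(0)
n_times, n_files = 5000, 20
audio = pd.DataFrame(rng.standard_normal((n_times, n_files)))

# Rolling mean over a window of 50 samples, as on the slide
window_size = 50
audio_smooth = audio.rolling(window=window_size).mean()

# The first window_size - 1 rows have no complete window, so they are NaN
n_missing = int(audio_smooth.iloc[:, 0].isna().sum())
print(audio_smooth.shape, n_missing)

# Smoothing reduces sample-to-sample fluctuation
raw_std = audio.iloc[:, 0].std()
smooth_std = audio_smooth.iloc[:, 0].std()
print(smooth_std < raw_std)
```

The leading NaN rows matter later: reductions that do not skip missing values will return NaN unless those rows are dropped first.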

Calculating the auditory envelope

  • First rectify your audio, then smooth it

      audio_rectified = audio.apply(np.abs)
      audio_envelope = audio_rectified.rolling(50).mean()
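
A small sketch of why rectification comes first, using a synthetic tone whose loudness ramps up over time. The sampling rate, the 50 Hz carrier, and the column name `sound_0` are illustrative assumptions.

```python
import numpy as np
import pandas as pd

sfreq = 1000                      # hypothetical sampling rate in Hz
t = np.arange(2000) / sfreq       # two seconds of samples
amplitude = t / t.max()           # loudness ramps from 0 to 1
audio = pd.DataFrame({"sound_0": amplitude * np.sin(2 * np.pi * 50 * t)})

# The raw waveform oscillates around zero, so its plain average hides the loudness
raw_mean = audio["sound_0"].mean()

# Rectify (absolute value), then smooth with a rolling mean
audio_rectified = audio.apply(np.abs)
audio_envelope = audio_rectified.rolling(50).mean()

# The envelope is non-negative and grows as the sound gets louder
first_half = audio_envelope["sound_0"].iloc[:1000].mean()
second_half = audio_envelope["sound_0"].iloc[1000:].mean()
print(raw_mean, first_half, second_half)
```

Without the rectification step, positive and negative swings cancel inside each window and the smoothed signal stays close to zero regardless of loudness.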
    

Feature engineering the envelope

# Calculate several features of the envelope, one per sound
envelope_mean = np.mean(audio_envelope, axis=0)
envelope_std = np.std(audio_envelope, axis=0)
envelope_max = np.max(audio_envelope, axis=0)

# Create our training data for a classifier
X = np.column_stack([envelope_mean, envelope_std, envelope_max])

Preparing our features for scikit-learn

X = np.column_stack([envelope_mean, envelope_std, envelope_max])
# scikit-learn classifiers expect a 1-D label array of shape (n_samples,)
y = labels.reshape(-1)
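
The feature-building steps can be checked end to end on synthetic data. In this sketch the random `audio` DataFrame and the alternating `labels` array are hypothetical stand-ins, and pandas reductions are used in place of the NumPy calls because they skip the leading NaN rows left by the rolling window.

```python
import numpy as np
import pandas as pd

# Synthetic stand-ins (assumptions): 20 sounds with 5000 samples each
rng = np.random.default_rng(0)
n_times, n_files = 5000, 20
audio = pd.DataFrame(rng.standard_normal((n_times, n_files)))
labels = np.array(["normal", "abnormal"] * (n_files // 2))

# Envelope: rectify, then smooth
audio_envelope = audio.apply(np.abs).rolling(50).mean()

# One summary value per sound (column); NaN rows are skipped by default
envelope_mean = audio_envelope.mean(axis=0)
envelope_std = audio_envelope.std(axis=0)   # pandas default is ddof=1
envelope_max = audio_envelope.max(axis=0)

# Rows are sounds, columns are features
X = np.column_stack([envelope_mean, envelope_std, envelope_max])
# Flatten labels to shape (n_samples,), the form scikit-learn classifiers expect
y = np.ravel(labels)
print(X.shape, y.shape)
```

Each row of `X` describes one sound, so the number of rows must equal the number of labels in `y`.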

Cross validation for classification

  • cross_val_score automates the process of:
    • Splitting data into training / validation sets
    • Fitting the model on training data
    • Scoring it on validation data
    • Repeating this process
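
The four steps above can be written out by hand and compared with `cross_val_score`. This is a rough sketch on synthetic data: the two-class features, the stratified 3-fold splitter, and `random_state=0` are assumptions chosen so that both routes see identical splits and give reproducible fits.

```python
import numpy as np
from sklearn.base import clone
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.svm import LinearSVC

# Synthetic data (assumption): 60 sounds, 3 features, two shifted classes
rng = np.random.default_rng(0)
y = np.repeat([0, 1], 30)
X = rng.standard_normal((60, 3)) + 2.0 * y[:, np.newaxis]

model = LinearSVC(random_state=0)
cv = StratifiedKFold(n_splits=3)

manual_scores = []
for train_idx, val_idx in cv.split(X, y):               # split into training / validation
    fold_model = clone(model)                            # fresh, unfitted copy per fold
    fold_model.fit(X[train_idx], y[train_idx])           # fit on training data
    score = fold_model.score(X[val_idx], y[val_idx])     # score on validation data
    manual_scores.append(score)                          # repeat for every fold

# The same procedure in one call
auto_scores = cross_val_score(model, X, y, cv=cv)
print(manual_scores, auto_scores)
```

For a classifier, passing an integer such as `cv=3` also uses stratified folds, which keeps the class balance similar across splits.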

Using cross_val_score

from sklearn.model_selection import cross_val_score
from sklearn.svm import LinearSVC

model = LinearSVC()
scores = cross_val_score(model, X, y, cv=3) 
print(scores)
[0.60911642 0.59975305 0.61404035]

Auditory features: The Tempogram

  • We can summarize more complex temporal information with timeseries-specific functions
  • librosa is a great library for auditory and timeseries feature engineering
  • Here we'll calculate the tempogram, which estimates the tempo of a sound over time
  • We can calculate summary statistics of tempo in the same way that we can for the envelope

Computing the tempogram

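# Import librosa and calculate the tempo of a 1-D sound array
import librosa as lr
# aggregate=None keeps one tempo estimate per frame instead of a single average
# (librosa 0.10 and later expose this function as lr.feature.tempo)
audio_tempo = lr.beat.tempo(y=audio, sr=sfreq,
                            hop_length=2**6, aggregate=None)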
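
Once tempo is available as one estimate per frame, it can be summarized the same way as the envelope. The sketch below does not call librosa; the `audio_tempo` array is a hypothetical stand-in for per-frame tempo values in beats per minute.

```python
import numpy as np

# Hypothetical stand-in for per-frame tempo estimates of one sound (BPM)
rng = np.random.default_rng(0)
audio_tempo = 120 + 5 * rng.standard_normal(200)

# Summary statistics of tempo, mirroring the envelope features
tempo_mean = np.mean(audio_tempo)
tempo_std = np.std(audio_tempo)
tempo_max = np.max(audio_tempo)

# One feature row for this sound; stack one row per sound to extend X
tempo_features = np.array([tempo_mean, tempo_std, tempo_max])
print(tempo_features.shape)
```

These columns could be appended to the envelope features with `np.column_stack`, giving the classifier both loudness and rhythm information.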

Let's practice!
