Stationarity and stability

Machine Learning for Time Series Data in Python

Chris Holdgraf

Fellow, Berkeley Institute for Data Science

Stationarity

  • Stationary time series do not change their statistical properties over time
  • E.g., mean, standard deviation, trends
  • Most time series are non-stationary to some extent
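
One rough way to see non-stationarity is to compare summary statistics across windows of time. The sketch below is illustrative only: the helper `windowed_stats` and the synthetic series are assumptions, not part of the course material. It shows that the window means of a trending series drift far more than those of white noise.

```python
import numpy as np

def windowed_stats(x, window):
    """Return the mean and standard deviation of each non-overlapping window."""
    x = np.asarray(x)
    n_windows = len(x) // window
    chunks = x[: n_windows * window].reshape(n_windows, window)
    return chunks.mean(axis=1), chunks.std(axis=1)

rng = np.random.default_rng(0)
n = 1000
stationary = rng.normal(size=n)                # white noise: fixed mean and variance
trending = stationary + np.linspace(0, 10, n)  # the same noise plus a linear trend

stat_means, stat_stds = windowed_stats(stationary, window=100)
trend_means, trend_stds = windowed_stats(trending, window=100)

# Spread of the window means: small for white noise, large for the trend
print(stat_means.std(), trend_means.std())
```

A large spread in the window means (or standard deviations) relative to a stationary baseline suggests that the series changes its statistical properties over time.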

Model stability

  • Non-stationary data results in variability in our model
  • The statistical properties the model finds may change with the data
  • In addition, we will be less certain about the correct values of model parameters
  • How can we quantify this?

Cross validation to quantify parameter stability

  • One approach: use cross-validation
  • Calculate model parameters on each iteration
  • Assess parameter stability across all CV splits
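
The steps above can be sketched as follows. This is a minimal example on synthetic data; the choice of `Ridge`, the data, and the variable names are assumptions made for illustration. It fits a fresh model on each `TimeSeriesSplit` training fold and stores the coefficients in an array of shape (n_cv_folds, n_coefficients).

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import TimeSeriesSplit

# Synthetic regression data (assumed for illustration)
rng = np.random.default_rng(0)
n_samples, n_coefficients = 500, 3
true_coefs = np.array([1.0, -2.0, 0.5])
X = rng.normal(size=(n_samples, n_coefficients))
y = X @ true_coefs + rng.normal(scale=0.5, size=n_samples)

n_splits = 10
cv = TimeSeriesSplit(n_splits=n_splits)

# One row of coefficients per CV iteration
cv_coefficients = np.zeros((n_splits, n_coefficients))
for ii, (tr, tt) in enumerate(cv.split(X, y)):
    model = Ridge()
    model.fit(X[tr], y[tr])
    cv_coefficients[ii] = model.coef_

# Spread of each coefficient across splits: smaller means more stable
print(cv_coefficients.std(axis=0))
```

Because this synthetic data is stationary, the coefficients should vary little between splits; on non-stationary data the spread would typically be larger.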

Bootstrapping the mean

  • Bootstrapping is a common way to assess variability
  • The bootstrap:
    1. Take a random sample of data with replacement
    2. Calculate the mean of the sample
    3. Repeat this process many times (1000s)
    4. Calculate the percentiles of the result (usually 2.5, 97.5)

The result is a 95% confidence interval of the mean of each coefficient.
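
For a single 1-D sample, the four steps can be sketched with NumPy alone. The function name `bootstrap_mean_interval`, its defaults, and the synthetic data are assumptions for illustration.

```python
import numpy as np

def bootstrap_mean_interval(data, n_boots=1000, percentiles=(2.5, 97.5), seed=0):
    """Bootstrap a percentile interval for the mean of a 1-D sample."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data)
    boot_means = np.empty(n_boots)
    for ii in range(n_boots):
        # Steps 1 and 2: resample with replacement, then take the mean
        sample = rng.choice(data, size=len(data), replace=True)
        boot_means[ii] = sample.mean()
    # Step 4: percentiles of the bootstrapped means
    return np.percentile(boot_means, percentiles)

# Synthetic sample (assumed for illustration)
data = np.random.default_rng(1).normal(loc=5.0, scale=2.0, size=200)
lower, upper = bootstrap_mean_interval(data)
print(lower, upper)
```

The same idea extends to model coefficients by resampling rows of a (n_cv_folds, n_coefficients) array and averaging along the first axis.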

Bootstrapping the mean

import numpy as np
from sklearn.utils import resample

# cv_coefficients has shape (n_cv_folds, n_coefficients)
n_boots = 100
n_coefficients = cv_coefficients.shape[1]
bootstrap_means = np.zeros((n_boots, n_coefficients))
for ii in range(n_boots):
    # Resample the CV folds (rows) with replacement,
    # then take the sample mean of each coefficient
    random_sample = resample(cv_coefficients)
    bootstrap_means[ii] = random_sample.mean(axis=0)

# Compute the percentiles of choice for the bootstrapped means
percentiles = np.percentile(bootstrap_means, (2.5, 97.5), axis=0)

Plotting the bootstrapped coefficients

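import matplotlib.pyplot as plt

# Mark the lower and upper bound of each coefficient's 95% interval
fig, ax = plt.subplots()
ax.scatter(many_shifts.columns, percentiles[0], marker='_', s=200)
ax.scatter(many_shifts.columns, percentiles[1], marker='_', s=200)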

Assessing model performance stability

  • If using TimeSeriesSplit, we can plot the model's score over time
  • This is useful in finding certain regions of time that hurt the score
  • Also useful to find non-stationary signals

Model performance over time

import numpy as np
import pandas as pd
from sklearn.model_selection import cross_val_score

def my_corrcoef(est, X, y):
    """Return the correlation coefficient
    between model predictions and a validation set."""
    return np.corrcoef(y, est.predict(X))[1, 0]

# Grab the date of the first index of each validation set
first_indices = [data.index[tt[0]] for tr, tt in cv.split(X, y)]

# Calculate the CV scores and convert to a Pandas Series
cv_scores = cross_val_score(model, X, y, cv=cv, scoring=my_corrcoef)
cv_scores = pd.Series(cv_scores, index=first_indices)

Visualizing model scores as a timeseries

import matplotlib.pyplot as plt

fig, axs = plt.subplots(2, 1, figsize=(10, 5), sharex=True)

# Plot the raw scores and their rolling mean over time
cv_scores_mean = cv_scores.rolling(10, min_periods=1).mean()
cv_scores.plot(ax=axs[0])
cv_scores_mean.plot(ax=axs[0])
axs[0].set(title='Validation scores (correlation)', ylim=[0, 1])

# Plot the raw data
data.plot(ax=axs[1])
axs[1].set(title='Validation data')

Visualizing model scores

Fixed windows with time series cross-validation

from sklearn.model_selection import TimeSeriesSplit

# Only keep the last 100 datapoints in the training data
window = 100

# Initialize the CV with this window size
cv = TimeSeriesSplit(n_splits=10, max_train_size=window)
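
To see what `max_train_size` changes, the sketch below compares training-set sizes with and without a fixed window. The array `X` is a placeholder assumed for illustration.

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

# Placeholder data: 1100 time points (assumed for illustration)
X = np.arange(1100).reshape(-1, 1)
window = 100

expanding = TimeSeriesSplit(n_splits=10)
fixed = TimeSeriesSplit(n_splits=10, max_train_size=window)

# Training-set size on each CV iteration
expanding_sizes = [len(tr) for tr, tt in expanding.split(X)]
fixed_sizes = [len(tr) for tr, tt in fixed.split(X)]

print(expanding_sizes)  # grows with each split
print(fixed_sizes)      # capped at the window size
```

With a fixed window, each model is fit only on recent data, which can help when older data follow different statistics than the validation period.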

Non-stationary signals

Let's practice!
