Model Drift

Designing Forecasting Pipelines for Production

Rami Krispin

Senior Manager, Data Science and Engineering

Model drift

Model performance degrades over time 📉

Often driven by concept drift

Models trained on historical data become misaligned with current data ❌

Model drift

Concept drift

Other causes of model drift

Model life cycle

Pipeline life cycle

Detect model drift

Detecting model drift

Track forecast accuracy
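Tracking accuracy means scoring each forecast against the actuals once they arrive. A minimal sketch of one such score follows, assuming MAPE is stored as a fraction (as the values in the forecast log suggest). The `mape` helper and the sample numbers are hypothetical, not taken from the course code.

```python
import numpy as np

def mape(actual, forecast):
    """Mean absolute percentage error, returned as a fraction (0.05 = 5%)."""
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    # Assumes `actual` contains no zeros; MAPE is undefined there
    return float(np.mean(np.abs((actual - forecast) / actual)))

# Hypothetical actuals and forecasts for a single forecast window
actual = [100.0, 200.0, 400.0]
forecast = [110.0, 190.0, 400.0]
score = mape(actual, forecast)
```

Appending one such score per forecast run to a log is what makes the rolling comparison over time possible.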

Identify drift

import pandas as pd
import plotly.graph_objects as go

# Load the forecast log
fc_log = pd.read_csv("./data/us48_forecast_log.csv")

# Smooth the MAPE with 7- and 14-day rolling averages
fc_log["mape_ma_7"] = fc_log["mape"].rolling(window=7).mean()
fc_log["mape_ma_14"] = fc_log["mape"].rolling(window=14).mean()
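A point worth keeping in mind with `rolling(...).mean()`: by default the first `window - 1` rows are `NaN`, so the moving averages only start once a full window is available. The sketch below illustrates this on a small hypothetical series. The `min_periods=1` variant is an optional alternative, not something the original code uses.

```python
import pandas as pd

# Hypothetical MAPE scores, one per forecast run
scores = pd.Series([0.02, 0.04, 0.06, 0.08])

# Default behavior: NaN until 3 observations are available
ma_3 = scores.rolling(window=3).mean()

# Alternative: average over whatever observations exist so far
ma_3_partial = scores.rolling(window=3, min_periods=1).mean()
```

The default is the more conservative choice for drift detection, since early averages built on one or two points are noisy.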

Identify drift

print(fc_log[["mape_ma_7","mape_ma_14"]].tail(10))

     mape_ma_7  mape_ma_14
120   0.054181    0.043237
121   0.062931    0.047272
122   0.063285    0.045938
123   0.058811    0.046456
124   0.060165    0.046120
125   0.055948    0.045952
126   0.043080    0.045459
127   0.035748    0.044965
128   0.028009    0.045470
129   0.025801    0.044543

Identify drift

threshold = 3  # Drift threshold for MAPE, in percent

# Set up the plotly figure
p = go.Figure()
# Add the model performance (MAPE, in percent) over time
p.add_trace(go.Scatter(x = fc_log["forecast_start"], y = 100 * fc_log["mape"],
                        mode="lines",
                        name="MAPE",
                        line=dict(color='royalblue', width=2)))
# Set the plot's layout
p.update_layout(title="Forecast Error Rate Over Time",
                xaxis_title="Forecast Start",
                yaxis_title="MAPE (%)")

Identify drift

# Add the threshold line and the 7- and 14-day moving averages
p.add_shape(type="line",
              x0=fc_log["forecast_start"].min(), x1=fc_log["forecast_start"].max(),
              y0=threshold, y1=threshold,
              line=dict(color="red", width=2, dash = "dash"))

p.add_trace(go.Scatter(x = fc_log["forecast_start"], y = 100 * fc_log["mape_ma_7"],
                        mode="lines",name="7 Days MA",
                        line=dict(color="green", width=2)))

p.add_trace(go.Scatter(x = fc_log["forecast_start"], y = 100 * fc_log["mape_ma_14"],
                        mode="lines",name="14 Days MA",
                        line=dict(color="orange", width=2)))


p.show()
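The same comparison the plot shows visually can also be done programmatically. The sketch below flags the rows where a rolling MAPE, in percent, exceeds the threshold. The `log` data frame is a hypothetical stand-in for `fc_log`, the 3-row window is chosen only to keep the example short, and the `drift_flag` column name is an assumption.

```python
import pandas as pd

threshold = 3  # percent, same scale as the plot's y-axis

# Hypothetical log standing in for fc_log
log = pd.DataFrame({
    "forecast_start": pd.date_range("2024-01-01", periods=6, freq="D"),
    "mape": [0.01, 0.02, 0.03, 0.05, 0.06, 0.02],
})

# Rolling average of the MAPE (short window for illustration only)
log["mape_ma_3"] = log["mape"].rolling(window=3).mean()

# Flag rows where the smoothed error, in percent, is above the threshold;
# NaN rows at the start compare as False
log["drift_flag"] = 100 * log["mape_ma_3"] > threshold

# Forecast start dates that breached the threshold
flagged = log.loc[log["drift_flag"], "forecast_start"]
```

Flagging on the smoothed series rather than the raw MAPE reduces the chance that a single bad forecast is reported as drift.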

Let's practice!
