Pipeline architecture

Designing Forecasting Pipelines for Production

Rami Krispin

Senior Manager, Data Science and Engineering

Model deployment

Experimentation Process

Model deployment

ETL and ML Pipelines

Pipeline requirements

Data ingestion

  • Refresh frequency - daily

Forecast refresh

  • Refresh frequency - daily
  • Forecast horizon - 72 hours
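
The refresh settings above can be turned into a concrete forecast window. The following is a minimal sketch, assuming hourly data; the helper name forecast_window and the example timestamp are hypothetical and not part of the course code.

```python
from datetime import datetime, timedelta

HORIZON_HOURS = 72  # forecast horizon taken from the requirements above


def forecast_window(last_observed, horizon=HORIZON_HOURS):
    """Return the hourly timestamps to forecast after the last observed hour."""
    return [last_observed + timedelta(hours=h) for h in range(1, horizon + 1)]


# Example: the last ingested observation is for 2025-05-19 05:00 (illustrative value)
window = forecast_window(datetime(2025, 5, 19, 5))
print(window[0], window[-1], len(window))
```

With a daily refresh and a 72-hour horizon, consecutive forecasts overlap by 48 hours, so each refresh revises the later part of the previous forecast.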

Robust

  • Unit testing and validation steps
  • Logs
  • Easy to maintain
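
The robustness requirements above (validation steps and logs) could look roughly like the sketch below. It assumes the ingested data arrives as a list of (timestamp, value) rows at an hourly frequency; the function name validate_hourly and the logger name are hypothetical.

```python
import logging
from datetime import datetime, timedelta

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("pipeline")  # hypothetical logger name


def validate_hourly(rows):
    """Check (timestamp, value) rows and return a list of problems found."""
    problems = []
    if not rows:
        problems.append("no rows returned")
    # Consecutive timestamps should be exactly one hour apart
    for (prev_ts, _), (ts, _) in zip(rows, rows[1:]):
        if ts - prev_ts != timedelta(hours=1):
            problems.append(f"gap or duplicate between {prev_ts} and {ts}")
    # Every row should carry a value
    for ts, value in rows:
        if value is None:
            problems.append(f"missing value at {ts}")
    for problem in problems:
        logger.warning(problem)
    logger.info("validation finished: %d problem(s)", len(problems))
    return problems


start = datetime(2025, 5, 19)
good = [(start + timedelta(hours=h), 100.0 + h) for h in range(24)]
bad = good[:5] + good[6:]  # drop one hour to simulate a gap
```

A pipeline step can then refuse to refresh the forecast when validate_hourly returns a non-empty list, while the log records what went wrong.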

Pipeline design

Pipeline requirements, including API requests, data transformation, forecast refresh, and logging

Pipeline design

Data ingestion process

Pipeline design

Forecasting automation

Pipeline design

Data storage

Pipeline design

Logging

Pipeline design

Tools used in the pipeline: Airflow, MLflow, Nixtla

Model registry

Approaches

  • Register all models
  • Register only the top model
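
The two approaches differ only in how many candidates are passed on to the registry. A minimal sketch, assuming each candidate has a single backtesting error score where lower is better; the model names and scores below are made-up placeholders, and select_models is a hypothetical helper.

```python
# Placeholder backtesting errors (lower is better), for illustration only
scores = {"model_a": 0.061, "model_b": 0.052, "model_c": 0.089}


def select_models(scores, top_only=True):
    """Return the model names to register, ranked from best to worst score."""
    ranked = sorted(scores, key=scores.get)
    return ranked[:1] if top_only else ranked


print(select_models(scores))                  # register only the top model
print(select_models(scores, top_only=False))  # register all models, best first
```

Registering every model keeps the full experiment history available for comparison, while registering only the top model keeps the registry small and unambiguous about what runs in production.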

Requirements

  • MLflow flavor
  • Customized function
  • Fitted object
  • Predict method

Model registry

from lightgbm import LGBMRegressor
from mlforecast import MLForecast
import mlflow
import mlforecast.flavor


experiment_name = "ml_forecast"
mlflow_path = "file:///mlruns"
meta = mlflow.get_experiment_by_name(experiment_name)
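
Note that mlflow.get_experiment_by_name returns None when no experiment with that name exists, so meta.experiment_id would fail on a first run. One way to guard against this is sketched below. The helper get_or_create_experiment_id is hypothetical; it takes the tracking object (the mlflow module or an MlflowClient instance, both of which provide get_experiment_by_name and create_experiment) as an argument so it can be exercised without a tracking server.

```python
def get_or_create_experiment_id(tracking, name):
    """Return the ID of the named experiment, creating it if it does not exist.

    `tracking` is expected to be the mlflow module or an MlflowClient instance.
    """
    meta = tracking.get_experiment_by_name(name)
    if meta is None:
        # create_experiment returns the new experiment's ID
        return tracking.create_experiment(name)
    return meta.experiment_id


# Usage with MLflow (requires mlflow to be installed):
#   import mlflow
#   experiment_id = get_or_create_experiment_id(mlflow, "ml_forecast")
```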

Model registry

model = LGBMRegressor(n_estimators=500, learning_rate=0.05)

params = {
    "freq": "h",
    "lags": list(range(1, 24)),
    "date_features": ["month", "day", "dayofweek", "week", "hour"],
}
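
To make the feature settings concrete: list(range(1, 24)) yields lags 1 through 23 (the upper bound is exclusive), and the date features are calendar attributes derived from each timestamp. The sketch below reproduces those attributes with the standard library only. The helper date_features is hypothetical, and treating "week" as the ISO week number is an assumption about how the library computes it.

```python
from datetime import datetime


def date_features(ts):
    """Derive the calendar features named in params from a single timestamp."""
    return {
        "month": ts.month,
        "day": ts.day,
        "dayofweek": ts.weekday(),     # Monday = 0, the same convention as pandas
        "week": ts.isocalendar()[1],   # ISO week number (assumed definition)
        "hour": ts.hour,
    }


lags = list(range(1, 24))
feats = date_features(datetime(2025, 5, 19, 5))
print(lags[0], lags[-1], len(lags))
print(feats)
```

If a full day of history is intended as input, the range would need to end at 25 rather than 24.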

Model registry

mlf = MLForecast(
    models=model,
    freq=params["freq"],
    lags=params["lags"],
    date_features=params["date_features"],
)

# ts: the training DataFrame (by default MLForecast expects
# the columns unique_id, ds, and y)
mlf.fit(ts)

Model registry

import datetime

# Timestamp the run name so each refresh is uniquely identifiable
run_time = datetime.datetime.now().strftime("%Y-%m-%d %H-%M-%S")
run_name = f"lightGBM6_{run_time}"

print(run_name)
# lightGBM6_2025-05-19 05-12-16

# Log the fitted MLForecast object under the experiment
with mlflow.start_run(experiment_id=meta.experiment_id,
                      run_name=run_name) as run:
    mlforecast.flavor.log_model(model=mlf, artifact_path="prod_model")

Model registry

Model registry with run name highlighted

Model registry

Model registry with model name highlighted

Model registry

Model registry metadata

Let's practice!
