Model reliability

Developing Machine Learning Models for Production

Sinan Ozdemir

Data Scientist, Entrepreneur, and Author

Aligning ML models with business impact metrics

Business impact metrics measure how ML affects the business
- e.g. revenue, cost savings, customer satisfaction score (CSAT)
Model metrics should be aligned with business impact metrics
- e.g. churn predictor --> revenue
- e.g. manufacturing maintenance predictor --> cost savings
- e.g. chatbot intent-detection accuracy --> CSAT
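
To make the churn predictor --> revenue link concrete, here is a minimal sketch that turns a churn classifier's confusion-matrix counts into an estimated net revenue figure. The function name, the business model (every flagged customer gets a retention offer, a fixed share of true churners is saved), and all numbers are hypothetical assumptions, not figures from the course.

```python
def estimated_revenue_impact(tp: int, fp: int, customer_value: float,
                             offer_cost: float, save_rate: float) -> float:
    """Estimate net revenue from acting on a churn model's predictions.

    Hypothetical business model (an assumption, not a standard formula):
    - every customer flagged as churning (tp + fp) receives a retention offer
    - a fraction `save_rate` of the true churners (tp) is retained by the offer
    - each retained customer preserves `customer_value` in revenue
    """
    revenue_saved = tp * save_rate * customer_value
    offer_spend = (tp + fp) * offer_cost
    return revenue_saved - offer_spend


# Two hypothetical models scored on the same customers (made-up counts)
model_a = estimated_revenue_impact(tp=80, fp=40, customer_value=500.0,
                                   offer_cost=20.0, save_rate=0.3)
model_b = estimated_revenue_impact(tp=60, fp=10, customer_value=500.0,
                                   offer_cost=20.0, save_rate=0.3)
print(round(model_a), round(model_b))
```

Under these made-up numbers, model A has lower precision (80/120 vs 60/70) yet yields roughly 9,600 vs 7,600 in net revenue, which is why the model metric you optimize should be chosen with the business metric in mind.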

Testing routines in ML pipelines

Unit tests check individual components in isolation
- e.g. test that a PCA instance returns the expected number of features
Integration tests exercise the entire pipeline end to end
- e.g. test that the input data is correctly preprocessed, the model makes accurate predictions, and the output is correctly post-processed
Smoke tests are quick checks that give you confidence the system works at a basic level
- e.g. test that the model can correctly classify a small set of sample images
Test early and test often
- e.g. test the model on new data as soon as it becomes available
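
The unit and smoke tests described above can be sketched as pytest-style functions. This is one possible shape, assuming scikit-learn is installed; the test names are illustrative. The smoke test deliberately only checks that training and prediction run and return valid labels on a few of scikit-learn's bundled 8x8 digit images, rather than asserting that every image is classified correctly, since that would make a quick check fragile.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression


def test_pca_returns_expected_number_of_features():
    # Unit test: one component (PCA) in isolation, on random mock data
    rng = np.random.default_rng(seed=0)
    X = rng.normal(size=(50, 10))
    X_reduced = PCA(n_components=3).fit_transform(X)
    assert X_reduced.shape == (50, 3)


def test_smoke_classifier_on_sample_images():
    # Smoke test: fit on a slice of the bundled digits dataset, then predict
    # a handful of held-out images and check the output is well-formed
    X, y = load_digits(return_X_y=True)
    model = LogisticRegression(max_iter=1000).fit(X[:500], y[:500])
    preds = model.predict(X[500:510])
    assert preds.shape == (10,)
    assert set(preds).issubset(set(range(10)))
```

Running `pytest` on a file containing these functions collects and runs both automatically.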

Example pipeline test

import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.pipeline import Pipeline


def test_pipeline():
    # Generate mock training data
    X_train = pd.DataFrame({'age': [25, 30, 35, 40], 'income': [50000, 60000, 70000, 80000]})
    y_train = pd.Series([0, 0, 1, 1])

    # Set up pipeline; DataPreprocessor is assumed to be a custom,
    # scikit-learn-compatible transformer defined elsewhere
    pipeline = Pipeline([('preprocessing', DataPreprocessor()),
                         ('model', LogisticRegression())])
    pipeline.fit(X_train, y_train)  # Fit pipeline on training data

    # Generate mock test data
    X_test = pd.DataFrame({'age': [30, 35, 40, 45], 'income': [55000, 65000, 75000, 85000]})
    y_test = pd.Series([0, 0, 1, 1])
    y_pred = pipeline.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)  # Evaluate pipeline on test data

    assert accuracy > 0.8, "Error: pipeline accuracy is too low."

Monitoring model staleness

Model staleness: the model's performance degrades over time
- caused by changes in the data or the environment
Continuous Monitoring!
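
One simple way to monitor continuously is to track accuracy over a sliding window of recent predictions whose true labels have arrived, and raise a flag when it falls below a threshold. The class below is a hypothetical helper, not a library API; the window size and threshold are assumptions you would tune for your own use case.

```python
from collections import deque


class RollingAccuracyMonitor:
    """Track accuracy over the most recent `window` labeled predictions."""

    def __init__(self, window: int = 100, threshold: float = 0.8):
        self.window = window
        self.threshold = threshold
        self.outcomes = deque(maxlen=window)  # 1 = correct, 0 = wrong

    def update(self, y_true, y_pred) -> None:
        # Record whether one prediction matched its (late-arriving) label
        self.outcomes.append(int(y_true == y_pred))

    def accuracy(self) -> float:
        if not self.outcomes:
            return float("nan")
        return sum(self.outcomes) / len(self.outcomes)

    def is_stale(self) -> bool:
        # Only alert once the window is full, to avoid noisy early alarms
        return len(self.outcomes) == self.window and self.accuracy() < self.threshold


monitor = RollingAccuracyMonitor(window=10, threshold=0.8)
for _ in range(10):
    monitor.update(1, 1)
print(monitor.accuracy(), monitor.is_stale())  # 1.0 False
for _ in range(5):
    monitor.update(1, 0)  # the model starts getting recent cases wrong
print(monitor.accuracy(), monitor.is_stale())  # 0.5 True
```

The bounded `deque` drops the oldest outcome automatically, so the metric reflects recent behavior rather than the model's lifetime average.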

Identifying and addressing model staleness

Identifying

Monitor model performance
Monitor changes in the data and environment
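
The course does not prescribe a specific drift check; one common choice for a single numeric feature is a two-sample Kolmogorov-Smirnov test comparing training-time values with a recent production sample. The sketch below uses `scipy.stats.ks_2samp`; the function name, the 0.05 significance level, and the simulated income data are assumptions for illustration.

```python
import numpy as np
from scipy.stats import ks_2samp


def feature_has_drifted(reference, current, alpha: float = 0.05) -> bool:
    """Flag drift in one numeric feature with a two-sample KS test.

    `reference` holds the feature's values at training time and `current`
    a recent production sample; alpha = 0.05 is an assumed default.
    """
    result = ks_2samp(reference, current)
    return result.pvalue < alpha


rng = np.random.default_rng(seed=42)
train_income = rng.normal(loc=65_000, scale=10_000, size=1_000)
prod_income = rng.normal(loc=75_000, scale=10_000, size=1_000)  # simulated shift

print(feature_has_drifted(train_income, train_income))  # False: identical samples
print(feature_has_drifted(train_income, prod_income))   # True: mean shifted by one std
```

With large samples the KS test can flag shifts too small to matter for the model, so in practice a drift flag is usually a prompt to check model performance rather than an automatic retraining trigger.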

Addressing

Retrain the model on new data
- e.g. a new feature needs to be included in the model
Update the data pipeline to account for environment changes
- e.g. changes in analytics platforms confuse your pipeline
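
Both remedies above can be sketched in a few lines with scikit-learn and pandas. `retrain_if_better` refits a fresh copy of the model on new data and promotes it only if it does at least as well on a held-out validation set; `align_columns` maps renamed fields from an upstream analytics platform back to the names the pipeline expects. Both function names and the `COLUMN_RENAMES` mapping are hypothetical, and the promotion rule (validation accuracy) is an assumption; your own pipeline may compare other metrics.

```python
import numpy as np
import pandas as pd
from sklearn.base import clone
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score


def retrain_if_better(current_model, X_new, y_new, X_val, y_val):
    """Refit a fresh copy on new data; keep it only if it scores at least
    as well as the current model on held-out validation data."""
    # clone() copies hyperparameters but not fitted state
    candidate = clone(current_model).fit(X_new, y_new)
    current_acc = accuracy_score(y_val, current_model.predict(X_val))
    candidate_acc = accuracy_score(y_val, candidate.predict(X_val))
    return candidate if candidate_acc >= current_acc else current_model


# Hypothetical mapping for when an upstream analytics platform renames fields
COLUMN_RENAMES = {"user_age": "age", "annual_income": "income"}


def align_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Rename columns from the new source to the names the pipeline expects."""
    return df.rename(columns=COLUMN_RENAMES)


# Toy example: the relationship between x and y has flipped since training
X = np.array([[0.0], [1.0], [2.0], [3.0]])
old_model = LogisticRegression().fit(X, [1, 1, 0, 0])
X_val, y_val = np.array([[0.5], [2.5]]), [0, 1]
model = retrain_if_better(old_model, X, [0, 0, 1, 1], X_val, y_val)
print(model is old_model)  # False: the retrained candidate was promoted
```

Comparing against the current model before swapping guards against replacing a working model with one trained on a noisy or unrepresentative batch.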
