Working with calculated and estimated results

Monitoring Machine Learning in Python

Maciej Balawejder

Data Scientist

How to chunk the data?

The image shows three types of chunking: time-based, size-based, and number-based.

Specifying different chunks

The image shows the time-based chunking aliases that can be passed as the argument for the chunk_period parameter.

import nannyml

# Initialize the algorithm
cbpe = nannyml.CBPE(
    problem_type='classification_binary',
    y_pred_proba='predicted_probability',
    y_pred='prediction',
    y_true='employed',
    metrics=['roc_auc'],
    chunk_period='M',   # time-based: one chunk per month
    # chunk_size=5000,  # alternative: size-based chunks
    # chunk_number=10   # alternative: fixed number of chunks
)
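To make the three chunking options concrete, here is a minimal sketch in plain Python. It is not NannyML's implementation: the helper names and the toy data are hypothetical, and NannyML may treat leftover rows differently.

```python
from collections import defaultdict
from datetime import date


def chunk_by_size(rows, chunk_size):
    """Split rows into consecutive chunks of at most chunk_size rows."""
    return [rows[i:i + chunk_size] for i in range(0, len(rows), chunk_size)]


def chunk_by_number(rows, chunk_number):
    """Split rows into chunk_number consecutive chunks of near-equal size."""
    base, extra = divmod(len(rows), chunk_number)
    chunks, start = [], 0
    for i in range(chunk_number):
        # The first `extra` chunks take one additional row each.
        end = start + base + (1 if i < extra else 0)
        chunks.append(rows[start:end])
        start = end
    return chunks


def chunk_by_month(rows):
    """Group rows by the calendar month of their 'timestamp' field."""
    groups = defaultdict(list)
    for row in rows:
        ts = row["timestamp"]
        groups[(ts.year, ts.month)].append(row)
    return [groups[key] for key in sorted(groups)]


# Hypothetical toy data: one row per day for January to March 2024 (91 days).
first_day = date(2024, 1, 1).toordinal()
rows = [{"timestamp": date.fromordinal(first_day + i)} for i in range(91)]

size_chunks = chunk_by_size(rows, 30)      # size-based
number_chunks = chunk_by_number(rows, 10)  # number-based
month_chunks = chunk_by_month(rows)        # time-based, monthly
```

Time-based chunks follow the calendar, so their sizes vary; size-based and number-based chunks ignore timestamps and depend only on row order.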

Initializing custom thresholds

Standard deviation thresholds

Manually set the lower and upper standard deviation multipliers

# Standard deviation thresholds
stdt = StandardDeviationThreshold(
    std_lower_multiplier=3,
    std_upper_multiplier=3
)
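The multipliers can be read as "how many standard deviations away from the reference mean". A minimal sketch of that arithmetic, assuming the thresholds are the mean of the reference-period metric values plus or minus a multiple of their population standard deviation; the function name and the values are hypothetical, and this is not NannyML's own code.

```python
from statistics import fmean, pstdev


def std_thresholds(reference_values, std_lower_multiplier=3, std_upper_multiplier=3):
    """Return (lower, upper) bounds around the mean of the reference values."""
    mean = fmean(reference_values)
    std = pstdev(reference_values)  # assumption: population standard deviation
    return mean - std_lower_multiplier * std, mean + std_upper_multiplier * std


# Hypothetical per-chunk accuracy values from the reference period.
reference_accuracy = [0.90, 0.92, 0.91, 0.89, 0.93]

lower, upper = std_thresholds(reference_accuracy)
```

Larger multipliers widen the band and raise fewer alerts; smaller multipliers make the monitoring more sensitive.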

Constant thresholds

Manually set the lower and upper threshold values

# Constant thresholds
ct = ConstantThreshold(
    lower=0.85,
    upper=0.95
)
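Constant thresholds are fixed bounds that do not depend on the reference data. A minimal sketch of how such bounds could flag chunks, assuming an alert is raised when a value falls strictly outside the band; the helper name and the values are hypothetical, and the boundary behavior in NannyML may differ.

```python
def flag_alerts(values, lower=None, upper=None):
    """Return one boolean per value: True when it lies outside [lower, upper]."""
    return [
        (lower is not None and value < lower)
        or (upper is not None and value > upper)
        for value in values
    ]


# Hypothetical estimated ROC AUC per chunk.
estimated_roc_auc = [0.90, 0.84, 0.96, 0.85]

alerts = flag_alerts(estimated_roc_auc, lower=0.85, upper=0.95)
```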

Specifying custom thresholds

# Import the threshold classes (see previous slide)
from nannyml.thresholds import ConstantThreshold, StandardDeviationThreshold

# Pass the thresholds to the CBPE algorithm
estimator = nannyml.CBPE(...
    metrics=['roc_auc', 'accuracy'],
    thresholds={'roc_auc': ct, 'accuracy': stdt}
)

The image shows estimated performance plots for the ROC AUC and accuracy metrics with custom thresholds.

Filtering results

By period

filtered_results = results.filter(period='analysis')

By metrics

filtered_results = results.filter(metrics=['mae'])

Both

filtered_results = results.filter(period='analysis', metrics=['mae'])
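Conceptually, filtering keeps only the rows of the results that match the requested period and metrics. A minimal sketch of that idea on a plain list of dictionaries; the row layout, the helper name, and the values are hypothetical and do not mirror NannyML's internal result format.

```python
def filter_rows(rows, period="all", metrics=None):
    """Keep rows matching the period ('all' keeps every period) and the metric names."""
    return [
        row for row in rows
        if (period == "all" or row["period"] == period)
        and (metrics is None or row["metric"] in metrics)
    ]


# Hypothetical results: one row per (period, metric) pair.
rows = [
    {"period": "reference", "metric": "mae", "value": 10.2},
    {"period": "reference", "metric": "rmse", "value": 13.1},
    {"period": "analysis", "metric": "mae", "value": 11.5},
    {"period": "analysis", "metric": "rmse", "value": 14.8},
]

analysis_only = filter_rows(rows, period="analysis")
mae_only = filter_rows(rows, metrics=["mae"])
both = filter_rows(rows, period="analysis", metrics=["mae"])
```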

Export results to dataframe

# Export results to dataframe format
results.filter(period='analysis').to_df()

The image shows the results in dataframe format.

Let's practice!
