Univariate drift detection

Monitoring Machine Learning in Python

Hakim Elakhrass

CEO and co-founder

What is univariate drift detection?

The image shows the monitoring workflow and where univariate drift detection fits into it.

Univariate methods

Jensen-Shannon distance - both categorical and continuous
Hellinger - both categorical and continuous
Wasserstein - only continuous
Kolmogorov-Smirnov - only continuous
L-infinity - only categorical
Chi2 - only categorical

¹ https://nannyml.readthedocs.io/en/stable/how_it_works/univariate_drift_comparison.html
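
To make the methods above more concrete, here is a minimal pure-Python sketch of three of them: L-infinity and Hellinger on categorical frequencies, and the two-sample Kolmogorov-Smirnov statistic on continuous values. The sample data and helper names are hypothetical, and this is only an illustration of the underlying formulas, not how NannyML computes them internally.

```python
import bisect
import math
from collections import Counter


def category_frequencies(values, categories):
    """Relative frequency of each category, in a fixed order."""
    counts = Counter(values)
    return [counts[c] / len(values) for c in categories]


def l_infinity(p, q):
    """Largest absolute difference between two frequency vectors."""
    return max(abs(a - b) for a, b in zip(p, q))


def hellinger(p, q):
    """Hellinger distance between two discrete distributions (0 to 1)."""
    squared = sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))
    return math.sqrt(squared / 2)


def ks_statistic(x, y):
    """Two-sample KS statistic: largest gap between the empirical CDFs."""
    xs, ys = sorted(x), sorted(y)

    def ecdf(sorted_sample, t):
        return bisect.bisect_right(sorted_sample, t) / len(sorted_sample)

    return max(abs(ecdf(xs, t) - ecdf(ys, t)) for t in set(xs) | set(ys))


# Hypothetical categorical feature: payment type in reference vs. analysis
categories = ['card', 'cash']
p = category_frequencies(['card'] * 6 + ['cash'] * 4, categories)
q = category_frequencies(['card'] * 3 + ['cash'] * 7, categories)
categorical_drift = {'l_infinity': l_infinity(p, q), 'hellinger': hellinger(p, q)}

# Hypothetical continuous feature: trip distance in reference vs. analysis
continuous_drift = ks_statistic([1.0, 2.0, 3.0, 4.0], [3.0, 4.0, 5.0, 6.0])
```

Each function returns 0 when the two samples have identical distributions and grows as they diverge, which is why the results can be compared against a threshold to raise alerts.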

Code implementation

import nannyml

# Initialize the univariate drift calculator
# (feature_column_names, reference and analysis are assumed to be defined earlier)
uv_calc = nannyml.UnivariateDriftCalculator(
    continuous_methods=['wasserstein', 'hellinger'],
    categorical_methods=['jensen_shannon', 'l_infinity', 'chi2'],
    column_names=feature_column_names,
    timestamp_column_name='timestamp',
    chunk_period='d'
    )

# Fit, calculate and plot the results
uv_calc.fit(reference)
uv_results = uv_calc.calculate(analysis)
uv_results.plot().show()

Filtering

Based on the column names
Based on the univariate methods

# Filter the univariate results by column and method
# (the method must be one the calculator computed for these columns)
filtered_results = uv_results.filter(column_names=['trip_distance', 'fare_amount'],
            methods=['jensen_shannon'])

# Plot the filtered results
filtered_results.plot().show()

Alert count ranker

Rank features based on the number of alerts

# Initialize the alert count ranker
alert_count_ranker = nannyml.AlertCountRanker()
alert_count_ranked_results = alert_count_ranker.rank(
    uv_results,
    only_drifting=False)
# Display the results
display(alert_count_ranked_results)

The image shows a DataFrame with the number of alerts for each feature.
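
The idea behind ranking by alert count can be sketched in a few lines of plain Python. The alert flags below are made up, `pickup_hour` is a hypothetical feature name, and `rank_by_alert_count` is an illustrative helper rather than part of NannyML.

```python
# Hypothetical per-chunk alert flags for each feature (True = drift alert)
alerts_per_feature = {
    'trip_distance': [False, True, True, False],
    'fare_amount': [False, False, True, False],
    'pickup_hour': [False, False, False, False],
}


def rank_by_alert_count(alerts, only_drifting=False):
    """Return (rank, feature, alert count) tuples, most alerts first."""
    counts = {name: sum(flags) for name, flags in alerts.items()}
    if only_drifting:
        # Keep only features that raised at least one alert
        counts = {name: n for name, n in counts.items() if n > 0}
    ordered = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    return [(rank, name, n) for rank, (name, n) in enumerate(ordered, start=1)]


all_features_ranked = rank_by_alert_count(alerts_per_feature)
drifting_only_ranked = rank_by_alert_count(alerts_per_feature, only_drifting=True)
```

With `only_drifting=False`, features that never alerted still appear at the bottom of the ranking. Setting it to `True` drops them.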

Correlation ranker

Rank features based on how strongly their drift correlates with absolute changes in performance

# Initialize the correlation ranker
correlation_ranker = nannyml.CorrelationRanker()
correlation_ranker.fit(perf_results.filter(period='reference'))
correlation_ranked_results = correlation_ranker.rank(uv_results, perf_results)

# Display the results
display(correlation_ranked_results)

The image shows a DataFrame with the Pearson correlation and p-value for each feature.
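
As a rough illustration of the idea, the sketch below correlates each feature's per-chunk drift values with the absolute change in performance relative to a reference baseline. All numbers are invented, the p-value is left out, and the baseline is assumed to be the mean reference performance. NannyML's own implementation may differ in its details.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


# Hypothetical per-chunk performance (for example, MAE)
reference_performance = [2.0, 2.1, 1.9]
analysis_performance = [2.0, 2.5, 3.0, 3.5]

# Assumed baseline: the mean performance over the reference period
baseline = sum(reference_performance) / len(reference_performance)
abs_performance_change = [abs(p - baseline) for p in analysis_performance]

# Hypothetical per-chunk drift values for two features
drift_values = {
    'trip_distance': [0.1, 0.2, 0.3, 0.4],
    'fare_amount': [0.3, 0.1, 0.3, 0.1],
}

correlations = {
    name: pearson(values, abs_performance_change)
    for name, values in drift_values.items()
}

# Rank features by the strength of the correlation, strongest first
ranked = sorted(correlations.items(), key=lambda item: abs(item[1]), reverse=True)
```

A feature whose drift rises and falls together with the performance change ends up at the top, which makes it a natural first candidate when investigating a performance drop.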

Monitoring feature distributions

Gives better insights and improves explainability

# Create distribution plots
distribution_results = uv_results.plot(kind='distribution')

# Show the plots
distribution_results.show()

Feature distribution plot

The image shows distribution plots for continuous and categorical features.

Let's practice!
