Model evaluation: imbalanced classification models

Practicing Machine Learning Interview Questions in Python

Lisa Stuart

Data Scientist

Class imbalance

  • Categorical target variable
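    • Balanced: approximately equal number of observations per class
    • Imbalanced: large difference in class counts --> misleading results (e.g. high accuracy from always predicting the majority class)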
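
The "misleading results" point can be made concrete with a minimal sketch. It assumes a made-up 95/5 split of labels (not course data) and a trivial "model" that always predicts the majority class: accuracy looks strong while no minority case is ever detected.

```python
# Hypothetical labels: 95 majority-class (0) and 5 minority-class (1) observations.
y_true = [0] * 95 + [1] * 5

# A "model" that ignores its input and always predicts the majority class.
y_pred = [0] * len(y_true)

# Accuracy: share of predictions that match the true label.
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Recall on the minority class: share of true 1s that were predicted as 1.
true_positives = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
minority_recall = true_positives / sum(t == 1 for t in y_true)

print(f"accuracy = {accuracy:.2f}")
print(f"minority recall = {minority_recall:.2f}")
```

This is why accuracy alone is a weak metric for imbalanced targets.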

Imbalanced vs. balanced classes

Confusion matrix

[Figure: confusion matrix. Source: https://scaryscientist.blogspot.com/2016/03/confusion-matrix.html]
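
As a minimal sketch of how such a matrix is produced, the snippet below calls `sklearn.metrics.confusion_matrix` on small, made-up label lists (the values of `y_test` and `y_pred` are illustrative assumptions, not course data). For binary labels 0/1, scikit-learn puts actual classes on rows and predicted classes on columns, so `ravel()` yields TN, FP, FN, TP in that order.

```python
from sklearn.metrics import confusion_matrix

# Hypothetical true labels and predictions (1 = positive/minority class).
y_test = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 0, 0, 1, 1, 1, 1, 1, 0]

# Rows are actual classes, columns are predicted classes (labels sorted: 0, 1).
cm = confusion_matrix(y_test, y_pred)
print(cm)

# For a 2x2 matrix, ravel() flattens row by row: TN, FP, FN, TP.
tn, fp, fn, tp = cm.ravel()
print(f"TN={tn} FP={fp} FN={fn} TP={tp}")
```

Note that this layout may differ from the orientation drawn in a given diagram, so it is worth checking which axis is "actual" before reading off the counts.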

Performance metrics

[Figure: performance metrics. Source: https://scaryscientist.blogspot.com/2016/03/confusion-matrix.html]

Metrics from the matrix

[Figure: performance metrics. Source: https://scaryscientist.blogspot.com/2016/03/confusion-matrix.html]
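
The standard definitions of these metrics in terms of confusion-matrix counts can be sketched as follows. The four counts are hypothetical, chosen only for illustration; the formulas themselves are the usual textbook ones.

```python
# Hypothetical confusion-matrix counts for a binary classifier.
tp, fp, fn, tn = 3, 2, 1, 4

# Accuracy: correct predictions over all predictions.
accuracy = (tp + tn) / (tp + fp + fn + tn)

# Precision: of the predicted positives, how many are truly positive.
precision = tp / (tp + fp)

# Recall (sensitivity): of the actual positives, how many were found.
recall = tp / (tp + fn)

# Specificity: of the actual negatives, how many were correctly rejected.
specificity = tn / (tn + fp)

# F1 score: harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)

print(f"accuracy={accuracy:.3f} precision={precision:.3f} "
      f"recall={recall:.3f} specificity={specificity:.3f} f1={f1:.3f}")
```

With imbalanced classes, precision, recall, and F1 on the minority class are usually more informative than accuracy.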

Resampling techniques

  • Oversample minority class
  • Undersample majority class
  • NOTE: Split into train and test sets BEFORE resampling, and resample only the training set, so the test set keeps the original class distribution!

[Figure: resampling techniques. Source: https://www.svds.com/learning-imbalanced-classes/]
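
The split-then-resample order can be sketched as below. This is a minimal illustration under stated assumptions: the data is randomly generated, the 90/10 class ratio is made up, and the names `approve` (majority) and `deny` (minority) are borrowed from the `resample` call listed in the Functions table rather than from a real dataset.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.utils import resample

# Hypothetical imbalanced data: 90 "approve" (0) rows and 10 "deny" (1) rows.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = np.array([0] * 90 + [1] * 10)

# 1) Split FIRST, stratifying so both sets keep the original class ratio.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# 2) Separate the training rows by class.
approve = X_train[y_train == 0]  # majority class
deny = X_train[y_train == 1]     # minority class

# 3) Oversample the minority class (with replacement) up to the majority size.
deny_upsampled = resample(
    deny, replace=True, n_samples=len(approve), random_state=0
)

# 4) Rebuild a balanced training set; the test set is left untouched.
X_train_bal = np.vstack([approve, deny_upsampled])
y_train_bal = np.array([0] * len(approve) + [1] * len(deny_upsampled))

print("train class counts before:", np.bincount(y_train))
print("train class counts after: ", np.bincount(y_train_bal))
print("test class counts:        ", np.bincount(y_test))
```

Undersampling the majority class would follow the same pattern with `resample(approve, replace=False, n_samples=len(deny))`. Resampling before the split would let copies of the same minority row appear in both train and test sets, which inflates test scores.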

Functions

Function                                              Returns
sklearn.linear_model.LogisticRegression               logistic regression
sklearn.metrics.confusion_matrix(y_test, y_pred)      confusion matrix
sklearn.metrics.precision_score(y_test, y_pred)       precision
sklearn.metrics.recall_score(y_test, y_pred)          recall
sklearn.metrics.f1_score(y_test, y_pred)              f1 score
sklearn.utils.resample(deny, n_samples=len(approve))  resamples
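
A rough sketch of how the model and metric functions above fit together is shown below. The dataset is synthetic (generated with `make_classification`, which is an assumption of this example and not part of the slides), and the roughly 90/10 class weights are chosen only for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (
    confusion_matrix,
    f1_score,
    precision_score,
    recall_score,
)
from sklearn.model_selection import train_test_split

# Hypothetical imbalanced dataset: roughly 90% class 0 and 10% class 1.
X, y = make_classification(
    n_samples=1000, n_features=5, weights=[0.9, 0.1], random_state=0
)

# Hold out a stratified test set so it reflects the original class ratio.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Fit a logistic regression classifier and predict on the test set.
clf = LogisticRegression(max_iter=1000)
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)

# Evaluate with the confusion matrix and the minority-class metrics.
cm = confusion_matrix(y_test, y_pred)
precision = precision_score(y_test, y_pred, zero_division=0)
recall = recall_score(y_test, y_pred, zero_division=0)
f1 = f1_score(y_test, y_pred, zero_division=0)

print(cm)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

The `zero_division=0` argument only controls what is returned when a metric's denominator is zero (for example, when the model predicts no positives at all); it does not change the metric otherwise.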

Let's practice!
