Evaluating Classification Models

Machine Learning in the Tidyverse

Dmitriy (Dima) Gorenshteyn

Lead Data Scientist, Memorial Sloan Kettering Cancer Center

Ingredients for Performance Measurement

1) Actual attrition classes
2) Predicted attrition classes
3) A metric to compare 1) & 2)

1) Prepare Actual Classes

Attrition   Class
Yes         TRUE
No          FALSE

validate$Attrition
## No  No  No  No  No  Yes No  Yes  ...  No  No  No
# Recode the factor: "Yes" becomes TRUE, "No" becomes FALSE
validate_actual <- validate$Attrition == "Yes"
validate_actual
## FALSE FALSE FALSE FALSE FALSE  TRUE FALSE  TRUE ... FALSE FALSE FALSE
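The recoding step relies on R comparing a factor element-wise with a character string. A minimal sketch of that behaviour on a small, made-up factor; `attrition` is a hypothetical stand-in for `validate$Attrition`, not the course data:

```r
# Hypothetical stand-in for validate$Attrition (made-up values)
attrition <- factor(c("No", "No", "Yes", "No", "Yes"), levels = c("No", "Yes"))

# Element-wise comparison: TRUE where the level is "Yes", FALSE otherwise
attrition_actual <- attrition == "Yes"
attrition_actual

# Logical vectors can be summed directly to count the leavers
sum(attrition_actual)
```

Keeping the classes as `TRUE`/`FALSE` means counts and proportions can be taken with `sum()` and `mean()` without further conversion.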

2) Prepare Predicted Classes

P(attrition)   Class
$> 0.5$        TRUE
$\le 0.5$      FALSE

# For a glm, type = "response" returns probabilities rather than log-odds
validate_prob <- predict(model, validate, type = "response")
validate_prob
## 0.324 0.012 0.077 0.001 0.104 0.940 0.116 0.811 0.261 0.027 0.065 0.060
# Apply the 0.5 cutoff: TRUE means the employee is predicted to leave
validate_predicted <- validate_prob > 0.5
validate_predicted
## FALSE FALSE FALSE FALSE FALSE  TRUE FALSE  TRUE FALSE FALSE FALSE FALSE
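The 0.5 cutoff is a choice, not a property of the model. A short sketch that makes the cutoff an explicit argument; `classify()` is a hypothetical helper introduced only for illustration, and `prob` holds the first 12 probabilities printed above:

```r
# The first 12 predicted probabilities shown above
prob <- c(0.324, 0.012, 0.077, 0.001, 0.104, 0.940,
          0.116, 0.811, 0.261, 0.027, 0.065, 0.060)

# Hypothetical helper: label an observation TRUE (attrition) when its
# probability exceeds the cutoff; 0.5 matches the rule used above
classify <- function(prob, threshold = 0.5) {
  prob > threshold
}

classify(prob)                    # same labels as validate_predicted above
classify(prob, threshold = 0.25)  # a lower cutoff flags more employees
```

Lowering the cutoff labels more employees as likely to leave, which can only keep or raise recall and usually lowers precision; the metrics that follow make that trade-off measurable.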

3) A metric to compare 1) & 2)

# Cross-tabulate actual (rows) against predicted (columns): the confusion matrix
table(validate_actual, validate_predicted)
##                validate_predicted
## validate_actual FALSE TRUE
##           FALSE   181    5
##           TRUE     17   18
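Each cell of this table counts one actual/predicted combination. A base-R sketch of the same four counts, assuming logical vectors like `validate_actual` and `validate_predicted`; here `actual` and `predicted` are hypothetical vectors rebuilt from the counts above so the example runs on its own:

```r
# Hypothetical logical vectors rebuilt from the counts in the table above;
# only the counts matter, not the row order
actual    <- c(rep(FALSE, 181), rep(FALSE, 5), rep(TRUE, 17), rep(TRUE, 18))
predicted <- c(rep(FALSE, 181), rep(TRUE, 5),  rep(FALSE, 17), rep(TRUE, 18))

# Count each cell of the 2 x 2 confusion matrix
tn <- sum(!actual & !predicted)  # true negatives:  stayed, predicted to stay
fp <- sum(!actual &  predicted)  # false positives: stayed, predicted to leave
fn <- sum( actual & !predicted)  # false negatives: left, predicted to stay
tp <- sum( actual &  predicted)  # true positives:  left, predicted to leave

c(TN = tn, FP = fp, FN = fn, TP = tp)
```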

3) Metric: Accuracy

library(Metrics)  # provides accuracy(), precision() and recall()
accuracy(validate_actual, validate_predicted)
## 0.9004525
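Accuracy is the share of all predictions that are correct: the two diagonal cells of the confusion matrix divided by the total. Working it from the counts above reproduces the value returned by `accuracy()`:

```r
# Counts copied from the confusion matrix above
tn <- 181; fp <- 5; fn <- 17; tp <- 18

# Accuracy: correct predictions (both diagonal cells) over all predictions
acc <- (tp + tn) / (tp + tn + fp + fn)
acc
```

On logical vectors the same quantity is `mean(validate_actual == validate_predicted)`. Because 186 of the 221 employees stayed, a model that always predicted `FALSE` would already score 186/221, about 0.84, which is why precision and recall are worth checking as well.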

3) Metric: Precision

precision(validate_actual, validate_predicted)
## 0.7826087
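Precision asks: of the employees the model flagged as leaving, what share actually left? It uses only the predicted-`TRUE` column of the confusion matrix. A sketch from the counts above, with a base-R equivalent on hypothetical logical vectors rebuilt from those counts:

```r
# Counts copied from the confusion matrix above
tn <- 181; fp <- 5; fn <- 17; tp <- 18

# Precision: true positives over everything predicted TRUE
prec <- tp / (tp + fp)
prec

# Same quantity from logical vectors rebuilt from the counts (hypothetical
# stand-ins for validate_actual / validate_predicted): the mean of `actual`
# over the positions where `predicted` is TRUE
actual    <- rep(c(FALSE, FALSE, TRUE, TRUE), times = c(tn, fp, fn, tp))
predicted <- rep(c(FALSE, TRUE, FALSE, TRUE), times = c(tn, fp, fn, tp))
prec_vec  <- mean(actual[predicted])
prec_vec
```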

3) Metric: Recall

recall(validate_actual, validate_predicted)
## 0.5142857
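Recall asks: of the employees who actually left, what share did the model flag? It uses only the actual-`TRUE` row of the confusion matrix. A sketch from the counts above, with a base-R equivalent on hypothetical logical vectors rebuilt from them:

```r
# Counts copied from the confusion matrix above
tn <- 181; fp <- 5; fn <- 17; tp <- 18

# Recall: true positives over everything that is actually TRUE
rec <- tp / (tp + fn)
rec

# Same quantity from logical vectors rebuilt from the counts (hypothetical
# stand-ins for validate_actual / validate_predicted): the mean of
# `predicted` over the positions where `actual` is TRUE
actual    <- rep(c(FALSE, FALSE, TRUE, TRUE), times = c(tn, fp, fn, tp))
predicted <- rep(c(FALSE, TRUE, FALSE, TRUE), times = c(tn, fp, fn, tp))
rec_vec   <- mean(predicted[actual])
rec_vec
```

Recall here (about 0.51) sits far below accuracy (about 0.90): 17 of the 35 employees who left were missed, a weakness that accuracy hides because most employees stay.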

Let's practice!
