Machine Learning in the Tidyverse
Dmitriy (Dima) Gorenshteyn
Lead Data Scientist, Memorial Sloan Kettering Cancer Center
1) Actual attrition classes
2) Predicted attrition classes
3) A metric to compare 1) & 2)
| attrition | class |
|---|---|
| Yes | TRUE |
| No | FALSE |
validate$Attrition
No No No No No Yes No Yes ... No No No
validate_actual <- validate$Attrition == "Yes"
validate_actual
FALSE FALSE FALSE FALSE FALSE TRUE FALSE TRUE ... FALSE FALSE FALSE
| P(attrition) | class |
|---|---|
| $> 0.5$ | TRUE |
| $\le 0.5$ | FALSE |
validate_prob <- predict(model, validate, type = "response")
validate_prob
0.324 0.012 0.077 0.001 0.104 0.940 0.116 0.811 0.261 0.027 0.065 0.060
validate_predicted <- validate_prob > 0.5
validate_predicted
FALSE FALSE FALSE FALSE FALSE TRUE FALSE TRUE FALSE FALSE FALSE FALSE
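The 0.5 cutoff is a choice, not a property of the model. As a minimal sketch, the twelve probabilities printed above can be re-thresholded to see how the predicted classes shift. The vector is copied from the slide output; the alternative cutoff of 0.3 is an illustrative assumption, not a value from the course.

```r
# Probabilities copied from the validate_prob output shown above
validate_prob <- c(0.324, 0.012, 0.077, 0.001, 0.104, 0.940,
                   0.116, 0.811, 0.261, 0.027, 0.065, 0.060)

# Same rule as the slide: P(attrition) > 0.5 means TRUE
validate_predicted <- validate_prob > 0.5
n_default <- sum(validate_predicted)   # number of predicted "Yes" cases

# Hypothetical lower cutoff: more observations are flagged as attrition
n_lower <- sum(validate_prob > 0.3)

n_default
n_lower
```

Lowering the cutoff flags more employees as likely to leave, which tends to raise recall at the cost of precision.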
table(validate_actual, validate_predicted)
               validate_predicted
validate_actual FALSE TRUE
          FALSE   181    5
          TRUE     17   18
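Rows of the table are actual classes and columns are predicted classes, so each cell corresponds to one of the four outcomes (true negative, false positive, false negative, true positive). A small sketch of reading the cells by name follows; the two toy vectors are hypothetical and stand in for `validate_actual` and `validate_predicted`.

```r
# Hypothetical toy data, not the course's validation set
actual    <- c(FALSE, FALSE, TRUE, TRUE,  FALSE, TRUE)
predicted <- c(FALSE, TRUE,  TRUE, FALSE, FALSE, TRUE)

# Rows = actual, columns = predicted (first argument becomes the rows)
cm <- table(actual, predicted)

# Index cells by their dimension names: cm[actual, predicted]
tn <- cm["FALSE", "FALSE"]  # actual No,  predicted No
fp <- cm["FALSE", "TRUE"]   # actual No,  predicted Yes
fn <- cm["TRUE",  "FALSE"]  # actual Yes, predicted No
tp <- cm["TRUE",  "TRUE"]   # actual Yes, predicted Yes
```

Indexing by name rather than by position avoids mixing up the cells if the argument order to `table()` is ever swapped.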
accuracy(validate_actual, validate_predicted)
0.9004525
precision(validate_actual, validate_predicted)
0.7826087
recall(validate_actual, validate_predicted)
0.5142857
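The three values can be checked by hand from the confusion matrix counts. The sketch below uses only base R and the standard definitions of the metrics; the `accuracy()`, `precision()`, and `recall()` calls on the slide presumably come from an add-on package (the `Metrics` package provides functions with these names and argument order, but that is an assumption, since the slides do not show a `library()` call).

```r
# Counts read from the table(validate_actual, validate_predicted) output
tn <- 181  # actual FALSE, predicted FALSE
fp <- 5    # actual FALSE, predicted TRUE
fn <- 17   # actual TRUE,  predicted FALSE
tp <- 18   # actual TRUE,  predicted TRUE

# Accuracy: share of all predictions that are correct
acc <- (tp + tn) / (tp + tn + fp + fn)

# Precision: of the predicted TRUE cases, how many are actually TRUE
prec <- tp / (tp + fp)

# Recall: of the actual TRUE cases, how many were predicted TRUE
rec <- tp / (tp + fn)

round(c(accuracy = acc, precision = prec, recall = rec), 7)
```

These reproduce 0.9004525, 0.7826087, and 0.5142857. The high accuracy is driven largely by the 181 correctly predicted "No" cases, while recall shows that only about half of the employees who actually left were caught.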