Machine Learning with Tree-Based Models in R
Sandro Raabe
Data Scientist
A model that always predicts "no" can have 98% accuracy. This is possible in an imbalanced dataset where 98% of the samples are negative.
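To see why accuracy alone can mislead, here is a minimal base-R sketch. It assumes a hypothetical dataset of 100 samples with 98 negatives ("no") and 2 positives ("yes"), and a naive model that always predicts "no". All names and numbers are illustrative, not from the course data.

```r
# Hypothetical imbalanced labels: 98 negatives and 2 positives
truth <- factor(c(rep("no", 98), rep("yes", 2)), levels = c("yes", "no"))

# Hypothetical naive model that always predicts "no"
estimate <- factor(rep("no", 100), levels = c("yes", "no"))

# Accuracy: share of predictions that match the truth
accuracy_naive <- mean(estimate == truth)

# Sensitivity: TP / (TP + FN), treating "yes" as the positive class
tp <- sum(estimate == "yes" & truth == "yes")
fn <- sum(estimate == "no" & truth == "yes")
sensitivity_naive <- tp / (tp + fn)

print(c(accuracy = accuracy_naive, sensitivity = sensitivity_naive))
```

The model scores 98% accuracy yet never finds a single positive sample, which is exactly what sensitivity exposes.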
predictions
# A tibble: 153 x 2
.pred_class true_class
<fct> <fct>
1 yes no
2 no no
3 no yes
4 yes yes
# Calculate single-threshold sensitivity
sens(predictions,
estimate = .pred_class,
truth = true_class)
# A tibble: 1 x 2
.metric .estimate
<chr> <dbl>
1 sensitivity 0.872
sens() takes the same arguments as accuracy() and conf_mat().
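The numbers behind sens(), accuracy() and conf_mat() can be reproduced by hand. The sketch below uses base R's table() as a stand-in for conf_mat() and assumes "yes" is the positive class. The four labels are copied from the first rows printed above, so the results are illustrative only and do not match the full 153-row output.

```r
# First four rows of the printed predictions (illustrative only)
pred_class <- factor(c("yes", "no", "no", "yes"), levels = c("yes", "no"))
true_class <- factor(c("no", "no", "yes", "yes"), levels = c("yes", "no"))

# Confusion matrix: rows = predicted class, columns = true class
cm <- table(Prediction = pred_class, Truth = true_class)
print(cm)

# Sensitivity (true positive rate) = TP / (TP + FN)
sensitivity <- cm["yes", "yes"] / sum(cm[, "yes"])

# Accuracy = correct predictions / all predictions
accuracy <- sum(diag(cm)) / sum(cm)
```

Sensitivity only looks at the "yes" column of the true classes, while accuracy uses the whole diagonal.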
# Predict probabilities on test set
predictions <- predict(model, data_test,
                       type = "prob") %>%
  bind_cols(data_test)
# A tibble: 9,116 x 13
.pred_yes still_customer age gender ...
<dbl> <fct> <int> <fct> ...
1 0.0557 no 45 M ...
2 0.0625 no 49 F ...
3 0.330 no 51 M ...
4 ...
...
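The probabilities in .pred_yes and hard class labels are linked by a threshold: a single threshold gives one set of classes, while other thresholds give others. A minimal sketch, assuming a cutoff of 0.5 (a common default for two classes) and an alternative cutoff of 0.3 chosen purely for illustration, applied to the three probabilities printed above:

```r
# The three .pred_yes values printed above
pred_yes <- c(0.0557, 0.0625, 0.330)

# Turn probabilities into hard classes with a single (assumed) threshold
threshold <- 0.5
pred_class <- factor(ifelse(pred_yes >= threshold, "yes", "no"),
                     levels = c("yes", "no"))
print(pred_class)

# A lower (hypothetical) threshold flags more customers as "yes"
pred_class_low <- factor(ifelse(pred_yes >= 0.3, "yes", "no"),
                         levels = c("yes", "no"))
print(pred_class_low)
```

Each threshold yields a different sensitivity, which is why the ROC curve evaluates all of them.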
# Calculate the ROC curve for all thresholds
roc <- roc_curve(predictions,
                 estimate = .pred_yes,
                 truth = still_customer)
# Plot the ROC curve
autoplot(roc)
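roc_curve() evaluates sensitivity and specificity at many thresholds. The base-R sketch below mimics that idea on a small hypothetical set of probabilities and labels (not the course data), using each distinct predicted probability as a threshold plus Inf, where nothing is predicted "yes".

```r
# Hypothetical predicted probabilities for "yes" and true labels
prob  <- c(0.9, 0.8, 0.6, 0.4, 0.3, 0.1)
truth <- c("yes", "yes", "no", "yes", "no", "no")

# Every distinct probability is a candidate threshold; Inf predicts no "yes" at all
thresholds <- c(Inf, sort(unique(prob), decreasing = TRUE))

roc_points <- t(sapply(thresholds, function(th) {
  pred_yes <- prob >= th
  tpr <- sum(pred_yes & truth == "yes") / sum(truth == "yes")  # sensitivity
  fpr <- sum(pred_yes & truth == "no") / sum(truth == "no")    # 1 - specificity
  c(threshold = th, fpr = fpr, tpr = tpr)
}))
print(roc_points)
```

Plotting the tpr column against the fpr column traces the ROC curve from (0, 0) to (1, 1).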
# Calculate area under curve
roc_auc(predictions,
estimate = .pred_yes,
truth = still_customer)
# A tibble: 1 x 3
.metric .estimator .estimate
<chr> <chr> <dbl>
1 roc_auc binary 0.872
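The area under the ROC curve equals the probability that a randomly chosen positive sample receives a higher predicted probability than a randomly chosen negative one (ties counted as one half). A minimal base-R sketch of this rank-based computation, on hypothetical toy data rather than the course data:

```r
# Hypothetical predicted probabilities for "yes" and true labels
prob  <- c(0.9, 0.8, 0.6, 0.4, 0.3, 0.1)
truth <- c("yes", "yes", "no", "yes", "no", "no")

pos <- prob[truth == "yes"]
neg <- prob[truth == "no"]

# Compare every positive with every negative: 1 if ranked higher, 0.5 on a tie
comparisons <- outer(pos, neg, function(p, n) (p > n) + 0.5 * (p == n))

# AUC = average over all positive/negative pairs
auc <- mean(comparisons)
print(auc)
```

Here 8 of the 9 pairs are ranked correctly, giving an AUC of 8/9. An AUC of 0.5 corresponds to random ranking and 1 to perfect ranking.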