Modeling with tidymodels in R
David Svancer
Data Scientist
Heatmap with autoplot()
Set the type argument of autoplot() to 'heatmap'

conf_mat(leads_results,
truth = purchased,
estimate = .pred_class) %>%
autoplot(type = 'heatmap')

Mosaic with autoplot()
Set the type argument to 'mosaic'
conf_mat(leads_results,
truth = purchased,
estimate = .pred_class) %>%
autoplot(type = 'mosaic')


Default probability threshold in binary classification is 0.5
If .pred_yes is greater than or equal to 0.5, then .pred_class is set to 'yes' by the predict() function in tidymodels

leads_results
# A tibble: 332 x 4
purchased .pred_class .pred_yes .pred_no
<fct> <fct> <dbl> <dbl>
1 no no 0.134 0.866
2 yes yes 0.729 0.271
3 no no 0.133 0.867
4 no no 0.0916 0.908
5 yes yes 0.598 0.402
6 no no 0.128 0.872
7 yes no 0.112 0.888
8 no no 0.169 0.831
9 no no 0.158 0.842
10 yes yes 0.520 0.480
# ... with 322 more rows
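The 0.5 cutoff is only a default; a custom threshold can be applied to the probability column by hand. A minimal base-R sketch, using made-up probabilities rather than the actual leads_results data:

```r
# Toy predicted probabilities for the positive class
# (assumed values, not from the actual leads_results data)
pred_yes <- c(0.134, 0.729, 0.133, 0.598, 0.520)

# Assumption: lowering the threshold to 0.3 labels more cases 'yes',
# trading specificity for sensitivity
threshold <- 0.3
pred_class <- ifelse(pred_yes >= threshold, "yes", "no")
pred_class
```

Lower thresholds catch more true positives at the cost of more false positives, which is exactly the trade-off the threshold table below illustrates.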
How does a classification model perform across a range of thresholds?
.pred_yes column of the test dataset results
| threshold | specificity | sensitivity |
|---|---|---|
| 0 | 0 | 1 |
| 0.11 | 0.01 | 0.98 |
| 0.15 | 0.05 | 0.97 |
| ... | ... | ... |
| 0.84 | 0.89 | 0.08 |
| 0.87 | 0.94 | 0.02 |
| 0.91 | 0.99 | 0 |
| 1 | 1 | 0 |
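Each row of the table above can be computed by hand: at a given threshold, sensitivity is the fraction of true 'yes' cases predicted 'yes', and specificity is the fraction of true 'no' cases predicted 'no'. A base-R sketch with made-up data:

```r
# Made-up outcomes and probabilities (illustrative only)
truth    <- c("yes", "yes", "no", "no", "no")
prob_yes <- c(0.80, 0.40, 0.60, 0.20, 0.10)

threshold <- 0.5
pred <- ifelse(prob_yes >= threshold, "yes", "no")

# Sensitivity = TP / (TP + FN); Specificity = TN / (TN + FP)
sensitivity <- sum(pred == "yes" & truth == "yes") / sum(truth == "yes")
specificity <- sum(pred == "no"  & truth == "no")  / sum(truth == "no")
```

Sweeping the threshold from 0 to 1 and repeating this calculation produces the full table, which is what roc_curve() automates.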
Receiver operating characteristic (ROC) curve

Optimal performance is at the point (0, 1)
Points near the diagonal line indicate poor performance, no better than random guessing

The area under the ROC curve (ROC AUC) captures the ROC curve information of a classification model in a single number
Useful interpretation as a letter grade of classification performance

The roc_curve() function
Pass the truth column with the true outcome categories and the estimated probability column .pred_yes from the leads_results tibble

leads_results %>%
roc_curve(truth = purchased, .pred_yes)
# A tibble: 331 x 3
.threshold specificity sensitivity
<dbl> <dbl> <dbl>
1 -Inf 0 1
2 0.0871 0 1
3 0.0888 0.00472 1
4 0.0893 0.00943 1
5 0.0896 0.0142 1
6 0.0902 0.0142 0.992
7 0.0916 0.0142 0.983
8 0.0944 0.0189 0.983
# ... with 323 more rows
Passing the results of roc_curve() to the autoplot() function returns an ROC curve plot
leads_results %>%
roc_curve(truth = purchased, .pred_yes) %>%
autoplot()

The roc_auc() function from yardstick will calculate the ROC AUC
Pass the truth column and the estimated probability column

roc_auc(leads_results,
truth = purchased,
.pred_yes)
# A tibble: 1 x 3
.metric .estimator .estimate
<chr> <chr> <dbl>
1 roc_auc binary 0.763
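The letter-grade interpretation mentioned earlier can be applied to this estimate. A sketch, assuming one common rule of thumb for the grade boundaries (A for AUC above 0.9 down to F for 0.5 to 0.6; exact cutoffs vary by field):

```r
auc <- 0.763  # the .estimate from roc_auc() above

# Assumed grade boundaries: (0.5, 0.6] = F, (0.6, 0.7] = D,
# (0.7, 0.8] = C, (0.8, 0.9] = B, (0.9, 1] = A
grade <- cut(auc,
             breaks = c(0.5, 0.6, 0.7, 0.8, 0.9, 1),
             labels = c("F", "D", "C", "B", "A"),
             include.lowest = TRUE)
as.character(grade)
```

Under these assumed boundaries, an AUC of 0.763 corresponds to moderate, C-grade performance.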