Modeling with tidymodels in R
David Svancer
Data Scientist
Heatmap with autoplot()
Set the type argument of autoplot() to 'heatmap'
conf_mat(leads_results, truth = purchased, estimate = .pred_class) %>%
autoplot(type = 'heatmap')
Mosaic with autoplot()
Set the type argument of autoplot() to 'mosaic'
conf_mat(leads_results,
truth = purchased,
estimate = .pred_class) %>%
autoplot(type = 'mosaic')
The default probability threshold in binary classification is 0.5
In leads_results, if .pred_yes is greater than or equal to 0.5, then .pred_class is set to 'yes' by the predict() function in tidymodels
leads_results
# A tibble: 332 x 4
purchased .pred_class .pred_yes .pred_no
<fct> <fct> <dbl> <dbl>
1 no no 0.134 0.866
2 yes yes 0.729 0.271
3 no no 0.133 0.867
4 no no 0.0916 0.908
5 yes yes 0.598 0.402
6 no no 0.128 0.872
7 yes no 0.112 0.888
8 no no 0.169 0.831
9 no no 0.158 0.842
10 yes yes 0.520 0.480
# ... with 322 more rows
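The 0.5 threshold rule can be sketched in base R. The probabilities below are toy values, not the actual leads_results data:

```r
# Toy estimated probabilities for the positive class ('yes')
pred_yes <- c(0.134, 0.729, 0.133, 0.598, 0.112)

# predict() assigns 'yes' whenever .pred_yes >= 0.5
pred_class <- ifelse(pred_yes >= 0.5, "yes", "no")
pred_class
# "no"  "yes" "no"  "yes" "no"
```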
How does a classification model perform across a range of thresholds?
Candidate thresholds come from the .pred_yes column of the test dataset results, and sensitivity and specificity are calculated at each one
threshold | specificity | sensitivity |
---|---|---|
0 | 0 | 1 |
0.11 | 0.01 | 0.98 |
0.15 | 0.05 | 0.97 |
... | ... | ... |
0.84 | 0.89 | 0.08 |
0.87 | 0.94 | 0.02 |
0.91 | 0.99 | 0 |
1 | 1 | 0 |
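A table like the one above can be built by computing sensitivity and specificity at each threshold. A minimal base-R sketch, using hypothetical data rather than the course's leads dataset:

```r
# Toy outcomes and estimated probabilities for the positive class
truth    <- c("yes", "no", "yes", "no", "yes", "no")
pred_yes <- c(0.9,   0.2,  0.6,   0.4,  0.3,   0.1)

metrics_at <- function(threshold) {
  pred <- ifelse(pred_yes >= threshold, "yes", "no")
  sens <- sum(pred == "yes" & truth == "yes") / sum(truth == "yes")  # true positive rate
  spec <- sum(pred == "no"  & truth == "no")  / sum(truth == "no")   # true negative rate
  c(sensitivity = sens, specificity = spec)
}

metrics_at(0.5)   # sensitivity 2/3, specificity 1
metrics_at(0.25)  # sensitivity 1,   specificity 2/3
```

Lowering the threshold trades specificity for sensitivity, which is exactly the trade-off the ROC curve visualizes.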
Receiver operating characteristic (ROC) curve
Optimal performance is at the point (0, 1)
The diagonal line from (0, 0) to (1, 1) represents poor performance, no better than random guessing
The area under the ROC curve (ROC AUC) captures the ROC curve information of a classification model in a single number
Useful interpretation as a letter grade of classification performance
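The area under the curve can be approximated with the trapezoidal rule on the (1 - specificity, sensitivity) points. This base-R sketch is an illustration, not yardstick's actual implementation:

```r
# Trapezoidal approximation of the area under an ROC curve,
# given false positive rates (1 - specificity) and true positive rates
trapezoid_auc <- function(fpr, tpr) {
  o <- order(fpr)
  sum(diff(fpr[o]) * (head(tpr[o], -1) + tail(tpr[o], -1)) / 2)
}

# A perfect classifier passes through (0, 1) and gets AUC = 1
trapezoid_auc(fpr = c(0, 0, 1), tpr = c(0, 1, 1))  # 1

# The diagonal (random guessing) gets AUC = 0.5
trapezoid_auc(fpr = c(0, 1), tpr = c(0, 1))        # 0.5
```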
The roc_curve() function takes the truth column with the true outcome categories and the column of estimated probabilities for the positive class, .pred_yes in the leads_results tibble
leads_results %>%
roc_curve(truth = purchased, .pred_yes)
# A tibble: 331 x 3
.threshold specificity sensitivity
<dbl> <dbl> <dbl>
1 -Inf 0 1
2 0.0871 0 1
3 0.0888 0.00472 1
4 0.0893 0.00943 1
5 0.0896 0.0142 1
6 0.0902 0.0142 0.992
7 0.0916 0.0142 0.983
8 0.0944 0.0189 0.983
# ... with 323 more rows
Passing the results of roc_curve()
to the autoplot()
function returns an ROC curve plot
leads_results %>%
roc_curve(truth = purchased, .pred_yes) %>%
autoplot()
The roc_auc() function from yardstick calculates the ROC AUC
It takes the truth column and the column of estimated probabilities, .pred_yes

roc_auc(leads_results,
        truth = purchased,
        .pred_yes)
# A tibble: 1 x 3
.metric .estimator .estimate
<chr> <chr> <dbl>
1 roc_auc binary 0.763
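An AUC of 0.763 also has a rank interpretation: it is the probability that a randomly chosen 'yes' case receives a higher .pred_yes than a randomly chosen 'no' case. A toy check of that equivalence (hypothetical scores, not the leads data):

```r
# .pred_yes values for actual 'yes' and actual 'no' cases (toy data)
pos <- c(0.9, 0.6, 0.3)
neg <- c(0.4, 0.2, 0.1)

# Fraction of positive-negative pairs ranked correctly
pairs <- expand.grid(p = pos, n = neg)
auc <- mean(pairs$p > pairs$n)
auc  # 8/9: one pair (0.3 vs 0.4) is ranked incorrectly
```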