Modeling with tidymodels in R
David Svancer
Data Scientist
The collect_metrics() function returns summarized results by default.
Passing summarize = FALSE returns every hyperparameter tuning result, one row per fold, model, and metric:
dt_tuning %>%
  collect_metrics(summarize = FALSE)
# A tibble: 150 x 8
id cost_complexity tree_depth min_n .metric ... .estimate .config
<chr> <dbl> <int> <int> <chr> ... <dbl> <chr>
Fold01 0.0000000758 14 39 sens ... 0.75 Model1
Fold01 0.0000000758 14 39 spec ... 0.906 Model1
Fold01 0.0000000758 14 39 roc_auc ... 0.888 Model1
..... ............ .. .. ...... ... ..... ......
Fold10 0.00380 5 36 roc_auc ... 0.789 Model5
Setting summarize = FALSE within collect_metrics() returns a tibble, so the detailed results can be explored with dplyr.
Here the rows are filtered to the roc_auc metric and grouped by the id column, and the .estimate column is reduced to summary statistics for each fold:
dt_tuning %>%
  collect_metrics(summarize = FALSE) %>%
  filter(.metric == 'roc_auc') %>%
  group_by(id) %>%
  summarize(min_roc_auc = min(.estimate),
            median_roc_auc = median(.estimate),
            max_roc_auc = max(.estimate))
# A tibble: 10 x 4
id min_roc_auc median_roc_auc max_roc_auc
<chr> <dbl> <dbl> <dbl>
Fold01 0.830 0.885 0.888
Fold02 0.857 0.882 0.885
Fold03 0.818 0.836 0.836
...... .... .... ....
Fold10 0.762 0.790 0.813
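The fold-level summary above can be tried without a tuning object. The following is a minimal sketch on a hand-made tibble that only mimics the shape of the detailed collect_metrics() output; the name toy_metrics and all of its values are made up for illustration and are not results from the leads data.

```r
library(dplyr)
library(tibble)

# Hypothetical stand-in for collect_metrics(summarize = FALSE):
# 2 folds x 3 models, roc_auc only. All values are invented.
toy_metrics <- tibble(
  id        = c("Fold01", "Fold01", "Fold01", "Fold02", "Fold02", "Fold02"),
  .metric   = "roc_auc",
  .estimate = c(0.80, 0.90, 0.85, 0.70, 0.75, 0.95),
  .config   = c("Model1", "Model2", "Model3", "Model1", "Model2", "Model3")
)

# Same pipeline as on the slide: one row of summary statistics per fold
fold_summary <- toy_metrics %>%
  filter(.metric == "roc_auc") %>%
  group_by(id) %>%
  summarize(min_roc_auc    = min(.estimate),
            median_roc_auc = median(.estimate),
            max_roc_auc    = max(.estimate))

fold_summary
```

A fold whose minimum and maximum are far apart is one where the choice of hyperparameters matters a lot; a fold where all three statistics are low is simply a harder validation set.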
The show_best() function displays the top n performing models, ranked by the average value of the chosen metric across folds.
Here Model1 is the winner:
dt_tuning %>%
show_best(metric = 'roc_auc', n = 5)
# A tibble: 5 x 9
cost_complexity tree_depth min_n .metric .estimator mean n std_err .config
<dbl> <int> <int> <chr> <chr> <dbl> <int> <dbl> <chr>
0.0000000758 14 39 roc_auc binary 0.827 10 0.0147 Model1
0.00380 5 36 roc_auc binary 0.825 10 0.0146 Model5
0.0243 5 34 roc_auc binary 0.823 10 0.0147 Model2
0.00000443 11 8 roc_auc binary 0.816 10 0.00786 Model3
0.000000600 3 5 roc_auc binary 0.814 10 0.0131 Model4
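The ranking idea behind this output can be sketched by hand with dplyr: average the metric per model across folds, then sort. This is only an illustration of the idea, not the internals of show_best(). The toy_metrics tibble and its values are made up, and the standard error is computed with the usual sd / sqrt(n) definition as an assumption.

```r
library(dplyr)
library(tibble)

# Hypothetical per-fold roc_auc values for three models (invented numbers)
toy_metrics <- tibble(
  id        = c("Fold01", "Fold01", "Fold01", "Fold02", "Fold02", "Fold02"),
  .metric   = "roc_auc",
  .estimate = c(0.80, 0.90, 0.85, 0.70, 0.75, 0.95),
  .config   = c("Model1", "Model2", "Model3", "Model1", "Model2", "Model3")
)

# Average across folds per model, then keep the top 2 by mean roc_auc
top_models <- toy_metrics %>%
  filter(.metric == "roc_auc") %>%
  group_by(.config) %>%
  summarize(mean    = mean(.estimate),
            n       = n(),
            std_err = sd(.estimate) / sqrt(n())) %>%
  arrange(desc(mean)) %>%   # higher roc_auc is better
  slice_head(n = 2)

top_models
```

Sorting in descending order is right for roc_auc, where larger is better; an error metric such as RMSE would need ascending order instead.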
The select_best() function
Pass the dt_tuning results to select_best() and name the metric on which to evaluate performance.
It returns a tibble with the best-performing model and its hyperparameter values:
best_dt_model <- dt_tuning %>%
  select_best(metric = 'roc_auc')
best_dt_model
# A tibble: 1 x 4
cost_complexity tree_depth min_n .config
<dbl> <int> <int> <chr>
0.0000000758 14 39 Model1
The finalize_workflow() function finalizes a workflow that contains a model object with tuning parameters.
Pass it the workflow object and the tibble of selected hyperparameter values, here best_dt_model.
It returns a workflow object with the hyperparameter values set:
final_leads_wkfl <- leads_tune_wkfl %>%
  finalize_workflow(best_dt_model)
final_leads_wkfl
== Workflow ========================================
Preprocessor: Recipe
Model: decision_tree()
-- Preprocessor ------------------------------------
3 Recipe Steps
* step_corr()
* step_normalize()
* step_dummy()
-- Model --------------------------------------------
Decision Tree Model Specification (classification)
Main Arguments:
cost_complexity = 0.0000000758
tree_depth = 14
min_n = 39
Computational engine: rpart
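The finalized workflow object can be trained with last_fit() and the original data split object, leads_split.
Behind the scenes, last_fit() trains the recipe on the training data and applies it before the model is fit and then evaluated on the test data:
leads_final_fit <- final_leads_wkfl %>%
  last_fit(split = leads_split)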
leads_final_fit %>% collect_metrics()
# A tibble: 2 x 3
.metric .estimator .estimate
<chr> <chr> <dbl>
1 accuracy binary 0.771
2 roc_auc binary 0.793
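To see the whole sequence in one place (tune, select, finalize, final fit), here is a self-contained sketch. The leads data is not available here, so it assumes the two_class_dat dataset from the modeldata package as a stand-in and uses a plain formula instead of the recipe shown earlier. The object names, the seed, the 5-fold setup, and grid = 5 are all illustrative choices rather than the course's settings, and the metrics it produces will not match the slides.

```r
library(tidymodels)

set.seed(214)  # arbitrary seed, only for reproducibility of the sketch

# Stand-in data (assumption): a small binary classification set from modeldata
data(two_class_dat, package = "modeldata")

demo_split <- initial_split(two_class_dat, prop = 0.75, strata = Class)
demo_folds <- vfold_cv(training(demo_split), v = 5, strata = Class)

# Decision tree with all three hyperparameters marked for tuning
dt_tune_model <- decision_tree(cost_complexity = tune(),
                               tree_depth = tune(),
                               min_n = tune()) %>%
  set_engine("rpart") %>%
  set_mode("classification")

demo_tune_wkfl <- workflow() %>%
  add_model(dt_tune_model) %>%
  add_formula(Class ~ .)  # simplification: formula in place of a recipe

# Evaluate 5 candidate hyperparameter combinations on each fold
demo_tuning <- demo_tune_wkfl %>%
  tune_grid(resamples = demo_folds,
            grid = 5,
            metrics = metric_set(roc_auc, sens, spec))

# Pick the best combination, plug it into the workflow, fit once on the split
best_demo_model <- demo_tuning %>% select_best(metric = "roc_auc")
final_demo_wkfl <- demo_tune_wkfl %>% finalize_workflow(best_demo_model)
demo_final_fit  <- final_demo_wkfl %>% last_fit(split = demo_split)

demo_final_fit %>% collect_metrics()
```

The test set held inside the split object is touched only in the last_fit() call, so the reported metrics estimate performance on data that played no part in choosing the hyperparameters.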