Modeling with tidymodels in R
David Svancer
Data Scientist
The collect_metrics() function provides summarized results by default
Setting summarize = FALSE returns all hyperparameter tuning results
dt_tuning %>%
  collect_metrics(summarize = FALSE)
# A tibble: 150 x 8
id cost_complexity tree_depth min_n .metric ... .estimate .config
<chr> <dbl> <int> <int> <chr> ... <dbl> <chr>
Fold01 0.0000000758 14 39 sens ... 0.75 Model1
Fold01 0.0000000758 14 39 spec ... 0.906 Model1
Fold01 0.0000000758 14 39 roc_auc ... 0.888 Model1
..... ............ .. .. ...... ... ..... ......
Fold10 0.00380 5 36 roc_auc ... 0.789 Model5
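The relationship between the per-fold rows and the default summary can be sketched with dplyr alone. This is a minimal illustration, not the internals of collect_metrics(): the fold_results tibble and its values are made up, and the summary columns (mean, n, std_err) are assumed to be an average, a count, and a standard error taken across folds.

```r
library(dplyr)

# Hypothetical per-fold results in the shape returned by
# collect_metrics(summarize = FALSE); the values are invented for illustration.
fold_results <- tibble(
  id        = rep(c('Fold01', 'Fold02', 'Fold03'), times = 2),
  .metric   = 'roc_auc',
  .estimate = c(0.80, 0.90, 0.85, 0.70, 0.80, 0.75),
  .config   = rep(c('Model1', 'Model2'), each = 3)
)

# Collapse the folds: one row per model configuration and metric.
summarized <- fold_results %>%
  group_by(.config, .metric) %>%
  summarize(
    mean    = mean(.estimate),
    std_err = sd(.estimate) / sqrt(n()),
    n       = n(),
    .groups = 'drop'
  )

summarized
```

Each configuration ends up with one row per metric, which is the shape of the summarized output that collect_metrics() returns by default.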
Setting summarize = FALSE within collect_metrics() returns a tibble that can be explored with dplyr
Filter for the roc_auc metric, group by the id column, and calculate .estimate summary statistics
dt_tuning %>%
  collect_metrics(summarize = FALSE) %>%
  filter(.metric == 'roc_auc') %>%
  group_by(id) %>%
  summarize(min_roc_auc = min(.estimate),
            median_roc_auc = median(.estimate),
            max_roc_auc = max(.estimate))
# A tibble: 10 x 4
id min_roc_auc median_roc_auc max_roc_auc
<chr> <dbl> <dbl> <dbl>
Fold01 0.830 0.885 0.888
Fold02 0.857 0.882 0.885
Fold03 0.818 0.836 0.836
...... .... .... ....
Fold10 0.762 0.790 0.813
The show_best() function displays the top n performing models based on the average value of metric
Model1 is the winner
dt_tuning %>%
  show_best(metric = 'roc_auc', n = 5)
# A tibble: 5 x 9
cost_complexity tree_depth min_n .metric .estimator mean n std_err .config
<dbl> <int> <int> <chr> <chr> <dbl> <int> <dbl> <chr>
0.0000000758 14 39 roc_auc binary 0.827 10 0.0147 Model1
0.00380 5 36 roc_auc binary 0.825 10 0.0146 Model5
0.0243 5 34 roc_auc binary 0.823 10 0.0147 Model2
0.00000443 11 8 roc_auc binary 0.816 10 0.00786 Model3
0.000000600 3 5 roc_auc binary 0.814 10 0.0131 Model4
The select_best() function selects the best performing model
Pass the dt_tuning results to select_best() and specify the metric on which to evaluate performance
Returns a tibble with the best performing model and hyperparameter values
best_dt_model <- dt_tuning %>%
  select_best(metric = 'roc_auc')

best_dt_model
# A tibble: 1 x 4
cost_complexity tree_depth min_n .config
<dbl> <int> <int> <chr>
0.0000000758 14 39 Model1
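Conceptually, show_best() ranks configurations by their average metric and select_best() keeps the top one. The sketch below imitates that ranking with dplyr, using the mean roc_auc values from the show_best() output above. The names avg_results, top_three, and best_one are hypothetical, and the sketch assumes a metric where higher is better.

```r
library(dplyr)

# Mean roc_auc per configuration, copied from the show_best() output above.
avg_results <- tibble(
  .config = c('Model1', 'Model5', 'Model2', 'Model3', 'Model4'),
  mean    = c(0.827, 0.825, 0.823, 0.816, 0.814)
)

# Roughly what show_best(n = 3) reports: the top rows by average metric.
top_three <- avg_results %>%
  arrange(desc(mean)) %>%
  slice_head(n = 3)

# Roughly what select_best() keeps: the single best configuration.
best_one <- avg_results %>%
  slice_max(mean, n = 1)

best_one
```

The gap between Model1 and Model5 is much smaller than the reported std_err, so the ranking between them should be read with some caution.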
The finalize_workflow() function finalizes a workflow that contains a model object with tuning parameters
Pass the workflow object and the best_dt_model tibble to finalize_workflow()
Returns a workflow object with set hyperparameter values
final_leads_wkfl <- leads_tune_wkfl %>%
  finalize_workflow(best_dt_model)

final_leads_wkfl
== Workflow ========================================
Preprocessor: Recipe
Model: decision_tree()
-- Preprocessor ------------------------------------
3 Recipe Steps
* step_corr()
* step_normalize()
* step_dummy()
-- Model --------------------------------------------
Decision Tree Model Specification (classification)
Main Arguments:
cost_complexity = 0.0000000758
tree_depth = 14
min_n = 39
Computational engine: rpart
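The substitution of tune() placeholders can be tried without any data. The following is a small sketch under stated assumptions: the formula outcome ~ . is a hypothetical stand-in for the recipe, and demo_spec, demo_wkfl, and demo_params are illustrative names. The hyperparameter values are the ones selected above.

```r
library(tidymodels)

# Decision tree with tune() placeholders for all three hyperparameters.
demo_spec <- decision_tree(cost_complexity = tune(),
                           tree_depth = tune(),
                           min_n = tune()) %>%
  set_engine('rpart') %>%
  set_mode('classification')

# Hypothetical workflow; a formula replaces the recipe to keep the sketch short.
demo_wkfl <- workflow() %>%
  add_formula(outcome ~ .) %>%
  add_model(demo_spec)

# One-row tibble of chosen values, in the shape returned by select_best().
demo_params <- tibble(cost_complexity = 0.0000000758,
                      tree_depth = 14L,
                      min_n = 39L)

# Replace the placeholders with the chosen values.
demo_final_wkfl <- demo_wkfl %>%
  finalize_workflow(demo_params)

# Inspect the model specification stored in the finalized workflow.
demo_final_wkfl %>%
  extract_spec_parsnip()
```

The finalized workflow is still untrained. It only carries fixed hyperparameter values in place of the tune() placeholders.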
The finalized workflow object can be trained with last_fit() and the original data split object, leads_split
Behind the scenes, the recipe is trained and applied, the model is fit on the training data, and metrics are computed on the test data
leads_final_fit <- final_leads_wkfl %>%
  last_fit(split = leads_split)

leads_final_fit %>%
  collect_metrics()
# A tibble: 2 x 3
.metric .estimator .estimate
<chr> <chr> <dbl>
1 accuracy binary 0.771
2 roc_auc binary 0.793
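Because leads_split and leads_tune_wkfl are defined elsewhere, the whole sequence can be rehearsed on a stand-in dataset. The sketch below assumes the two_class_dat data from the modeldata package (numeric predictors A and B, factor outcome Class). All demo_ names, the seed, the number of folds, and the single-step recipe are illustrative choices, not the leads setup.

```r
library(tidymodels)

set.seed(214)

# Stand-in data: numeric predictors A and B, factor outcome Class.
data(two_class_dat, package = 'modeldata')

demo_split <- initial_split(two_class_dat, prop = 0.75, strata = Class)
demo_folds <- vfold_cv(training(demo_split), v = 5, strata = Class)

demo_recipe <- recipe(Class ~ ., data = training(demo_split)) %>%
  step_normalize(all_numeric_predictors())

demo_tune_spec <- decision_tree(cost_complexity = tune(),
                                tree_depth = tune(),
                                min_n = tune()) %>%
  set_engine('rpart') %>%
  set_mode('classification')

demo_tune_wkfl <- workflow() %>%
  add_recipe(demo_recipe) %>%
  add_model(demo_tune_spec)

# Five random hyperparameter combinations.
demo_grid <- grid_random(cost_complexity(), tree_depth(), min_n(), size = 5)

demo_tuning <- demo_tune_wkfl %>%
  tune_grid(resamples = demo_folds,
            grid = demo_grid,
            metrics = metric_set(roc_auc, sens, spec))

# Pick the best combination, finalize, then fit on training and score on test.
demo_best <- demo_tuning %>%
  select_best(metric = 'roc_auc')

demo_final_fit <- demo_tune_wkfl %>%
  finalize_workflow(demo_best) %>%
  last_fit(split = demo_split)

demo_final_fit %>%
  collect_metrics()

# Test-set predictions are also stored in the last_fit() result.
demo_final_fit %>%
  collect_predictions()
```

The metrics reported by collect_metrics() here come from the test portion of demo_split, so they serve as an estimate of performance on new data.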