Modeling with tidymodels in R
David Svancer
Data Scientist
Hyperparameters are model parameters whose values are set prior to model training and control model complexity.
The parsnip decision tree has three hyperparameters:

* cost_complexity
* tree_depth
* min_n
decision_tree() function sets default hyperparameter values
* cost_complexity is set to 0.01
* tree_depth is set to 30
* min_n is set to 20

These may not be the best values for all datasets.
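If tuning is not needed, the defaults can simply be overridden by passing values directly to decision_tree(); a minimal sketch (the values shown are illustrative, not recommendations):

```r
library(tidymodels)

# Decision tree with custom (non-default) hyperparameter values
dt_custom_model <- decision_tree(cost_complexity = 0.001,
                                 tree_depth = 15,
                                 min_n = 10) %>%
  set_engine('rpart') %>%
  set_mode('classification')
```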
dt_model <- decision_tree() %>%
set_engine('rpart') %>%
set_mode('classification')
The tune() function from the tune package
Set each hyperparameter to tune() in the parsnip model specification:

dt_tune_model <- decision_tree(cost_complexity = tune(),
                               tree_depth = tune(),
                               min_n = tune()) %>%
  set_engine('rpart') %>%
  set_mode('classification')

dt_tune_model
Decision Tree Model Specification (classification)
Main Arguments:
cost_complexity = tune()
tree_depth = tune()
min_n = tune()
Computational engine: rpart
workflow objects can be easily updated
Pass leads_wkfl to update_model() and provide the new decision tree model with tuning parameters:

leads_tune_wkfl <- leads_wkfl %>%
  update_model(dt_tune_model)

leads_tune_wkfl
== Workflow ===============
Preprocessor: Recipe
Model: decision_tree()
-- Preprocessor -----------
3 Recipe Steps
* step_corr()
* step_normalize()
* step_dummy()
-- Model ------------------
Decision Tree Model Specification (classification)
Main Arguments:
cost_complexity = tune()
tree_depth = tune()
min_n = tune()
Computational engine: rpart
Grid search is the most common method for tuning hyperparameters: a model is trained and evaluated for every combination of hyperparameter values in a grid.
| cost_complexity | tree_depth | min_n |
|---|---|---|
| 0.001 | 20 | 35 |
| 0.001 | 20 | 15 |
| 0.001 | 35 | 35 |
| 0.001 | 35 | 15 |
| 0.2 | 20 | 35 |
| ... | ... | ... |
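A regular grid like the table above pairs every value of each parameter with every value of the others. A sketch of how such a grid can be built with dials' grid_regular() (not shown in the output above; levels sets how many values are tried per parameter):

```r
library(tidymodels)

# Full-factorial grid: 2 values per parameter -> 2 x 2 x 2 = 8 combinations
grid_regular(parameters(dt_tune_model),
             levels = 2)
```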
The parameters() function from the dials package
Identifies the hyperparameters in a parsnip model object that have been set to tune(), if any:

parameters(dt_tune_model)
Collection of 3 parameters for tuning
identifier type object
cost_complexity cost_complexity nparam[+]
tree_depth tree_depth nparam[+]
min_n min_n nparam[+]
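Each parameter carries a default tuning range, which can be adjusted before building a grid; a sketch using dials' update() and parameter functions (the range shown is an illustrative assumption):

```r
library(tidymodels)

# Narrow the range searched for tree_depth (illustrative range)
dt_params <- parameters(dt_tune_model) %>%
  update(tree_depth = tree_depth(range = c(2, 15)))
```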
Generating random combinations
The grid_random() function
* Takes the output of the parameters() function
* size sets the number of random combinations to generate
* Call the set.seed() function before grid_random() for reproducibility

set.seed(214)

grid_random(parameters(dt_tune_model),
            size = 5)
# A tibble: 5 x 3
cost_complexity tree_depth min_n
<dbl> <int> <int>
1 0.0000000758 14 39
2 0.0243 5 34
3 0.00000443 11 8
4 0.000000600 3 5
5 0.00380 5 36
First step in hyperparameter tuning
dt_grid contains 5 random combinations of hyperparameter values:

set.seed(214)

dt_grid <- grid_random(parameters(dt_tune_model),
                       size = 5)

dt_grid
# A tibble: 5 x 3
cost_complexity tree_depth min_n
<dbl> <int> <int>
1 0.0000000758 14 39
2 0.0243 5 34
3 0.00000443 11 8
4 0.000000600 3 5
5 0.00380 5 36
The tune_grid() function performs hyperparameter tuning
Takes the following arguments:
* A workflow or parsnip model
* resamples
* grid
* metrics function

Returns a tibble of results with a .metrics list column:

dt_tuning <- leads_tune_wkfl %>%
  tune_grid(resamples = leads_folds,
            grid = dt_grid,
            metrics = leads_metrics)
dt_tuning
# Tuning results
# 10-fold cross-validation using stratification
# A tibble: 10 x 4
splits id .metrics ..
<list> <chr> <list> ..
<split [896/100]> Fold01 <tibble [15 x 7]> ..
................ ...... ............... ..
<split [897/99]> Fold09 <tibble [15 x 7]> ..
<split [897/99]> Fold10 <tibble [15 x 7]> ..
The collect_metrics() function provides summarized results by default
dt_tuning %>%
collect_metrics()
# A tibble: 15 x 9
cost_complexity tree_depth min_n .metric .estimator mean n std_err .config
<dbl> <int> <int> <chr> <chr> <dbl> <int> <dbl> <chr>
1 0.0000000758 14 39 roc_auc binary 0.827 10 0.0147 Model1
2 0.0000000758 14 39 sens binary 0.728 10 0.0277 Model1
3 0.0000000758 14 39 spec binary 0.865 10 0.0156 Model1
4 0.0243 5 34 roc_auc binary 0.823 10 0.0147 Model2
. ...... .. .. .... ...... ..... .. ..... ......
14 0.00380 5 36 sens binary 0.747 10 0.0209 Model5
15 0.00380 5 36 spec binary 0.858 10 0.0161 Model5
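A typical next step is to rank the combinations and carry the best one forward; a sketch using tune's show_best(), select_best(), and finalize_workflow() (not shown in the output above):

```r
library(tidymodels)

# Display the top-performing combinations by ROC AUC
dt_tuning %>%
  show_best(metric = 'roc_auc', n = 5)

# Select the single best combination ...
best_params <- dt_tuning %>%
  select_best(metric = 'roc_auc')

# ... and plug it back into the tuning workflow
final_leads_wkfl <- leads_tune_wkfl %>%
  finalize_workflow(best_params)
```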