Modeling with tidymodels in R
David Svancer
Data Scientist
Hyperparameters are model parameters whose values are set prior to model training and control model complexity
parsnip decision tree hyperparameters: cost_complexity, tree_depth, min_n

The decision_tree() function sets default hyperparameter values:

- cost_complexity is set to 0.01
- tree_depth is set to 30
- min_n is set to 20

These may not be the best values for all datasets
dt_model <- decision_tree() %>%
set_engine('rpart') %>%
set_mode('classification')
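The specification above relies on the defaults listed earlier; writing those default values out explicitly (a sketch, using the 0.01 / 30 / 20 values from the slide) produces an equivalent model specification:

```r
# Equivalent specification with the default hyperparameter values
# written out explicitly (values taken from the slide above)
dt_model <- decision_tree(cost_complexity = 0.01,
                          tree_depth = 30,
                          min_n = 20) %>%
  set_engine('rpart') %>%
  set_mode('classification')
```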
The tune() function from the tune package labels hyperparameters for tuning. Use tune() in the parsnip model specification:

dt_tune_model <- decision_tree(cost_complexity = tune(),
                               tree_depth = tune(),
                               min_n = tune()) %>%
  set_engine('rpart') %>%
  set_mode('classification')
dt_tune_model
Decision Tree Model Specification (classification)
Main Arguments:
cost_complexity = tune()
tree_depth = tune()
min_n = tune()
Computational engine: rpart
workflow objects can be easily updated. Pass leads_wkfl to update_model() and provide the new decision tree model with tuning parameters:

leads_tune_wkfl <- leads_wkfl %>%
  update_model(dt_tune_model)
leads_tune_wkfl
== Workflow ===============
Preprocessor: Recipe
Model: decision_tree()
-- Preprocessor -----------
3 Recipe Steps
* step_corr()
* step_normalize()
* step_dummy()
-- Model ------------------
Decision Tree Model Specification (classification)
Main Arguments: cost_complexity = tune()
tree_depth = tune()
min_n = tune()
Computational engine: rpart
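For context, a workflow like leads_wkfl is assembled from a recipe and a model specification. A minimal sketch is shown below; the recipe object name, leads_recipe, is an assumption for illustration and not from the slides:

```r
# Hypothetical sketch: how a workflow like leads_wkfl is assembled
# (leads_recipe is an assumed name for the recipe built earlier in the course)
library(workflows)

leads_wkfl <- workflow() %>%
  add_recipe(leads_recipe) %>%  # recipe with step_corr(), step_normalize(), step_dummy()
  add_model(dt_model)           # the decision tree specification
```

Because the workflow stores the model as a separate component, update_model() can swap in the tuning specification without rebuilding the preprocessing steps.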
Grid search is the most common method for tuning hyperparameters. Each row of the grid is a combination of hyperparameter values to evaluate:
cost_complexity | tree_depth | min_n |
---|---|---|
0.001 | 20 | 35 |
0.001 | 20 | 15 |
0.001 | 35 | 35 |
0.001 | 35 | 15 |
0.2 | 20 | 35 |
... | ... | ... |
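A grid like the one above can also be generated programmatically. As an aside not covered on this slide, the dials package provides grid_regular(), which builds evenly spaced combinations rather than random ones:

```r
# Sketch: a regular (evenly spaced) grid as an alternative to a hand-built one
# grid_regular() comes from the dials package; levels sets how many values
# to generate per hyperparameter
library(dials)

dt_regular_grid <- grid_regular(parameters(dt_tune_model),
                                levels = 3)  # 3 values per parameter
```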
The parameters() function from the dials package takes a parsnip model object and identifies the hyperparameters labeled with the tune() function, if any:

parameters(dt_tune_model)
Collection of 3 parameters for tuning
identifier type object
cost_complexity cost_complexity nparam[+]
tree_depth tree_depth nparam[+]
min_n min_n nparam[+]
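Each identified parameter carries a default value range. Though not shown on this slide, those ranges can be customized with update() and the matching dials functions before a grid is generated; the ranges below are illustrative assumptions, not values from the course:

```r
# Sketch: customizing hyperparameter ranges before building a tuning grid
# (the specific ranges here are illustrative, not from the slides)
library(dials)

dt_params <- parameters(dt_tune_model) %>%
  update(tree_depth = tree_depth(range = c(2, 10)),
         min_n = min_n(range = c(10, 40)))
```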
Generating random combinations

The grid_random() function generates random combinations of hyperparameter values:

- Takes the output of the parameters() function
- size sets the number of random combinations to generate
- Use the set.seed() function before grid_random() for reproducibility

set.seed(214)
grid_random(parameters(dt_tune_model),
            size = 5)
# A tibble: 5 x 3
cost_complexity tree_depth min_n
<dbl> <int> <int>
1 0.0000000758 14 39
2 0.0243 5 34
3 0.00000443 11 8
4 0.000000600 3 5
5 0.00380 5 36
Creating a tuning grid is the first step in hyperparameter tuning. dt_grid contains 5 random combinations of hyperparameter values:

set.seed(214)
dt_grid <- grid_random(parameters(dt_tune_model),
                       size = 5)
dt_grid
# A tibble: 5 x 3
cost_complexity tree_depth min_n
<dbl> <int> <int>
1 0.0000000758 14 39
2 0.0243 5 34
3 0.00000443 11 8
4 0.000000600 3 5
5 0.00380 5 36
The tune_grid() function performs hyperparameter tuning.

Takes the following arguments:

- a workflow or parsnip model
- resamples: the cross-validation folds
- grid: a tibble of hyperparameter combinations
- metrics: a metric set function

Returns a tibble of results; per-fold metric values are stored in the .metrics list column
dt_tuning <- leads_tune_wkfl %>%
tune_grid(resamples = leads_folds,
grid = dt_grid,
metrics = leads_metrics)
dt_tuning
# Tuning results
# 10-fold cross-validation using stratification
# A tibble: 10 x 4
splits id .metrics ..
<list> <chr> <list> ..
<split [896/100]> Fold01 <tibble [15 x 7]> ..
................ ...... ............... ..
<split [897/99]> Fold09 <tibble [15 x 7]> ..
<split [897/99]> Fold10 <tibble [15 x 7]> ..
The collect_metrics()
function provides summarized results by default
dt_tuning %>%
collect_metrics()
# A tibble: 15 x 9
cost_complexity tree_depth min_n .metric .estimator mean n std_err .config
<dbl> <int> <int> <chr> <chr> <dbl> <int> <dbl> <chr>
1 0.0000000758 14 39 roc_auc binary 0.827 10 0.0147 Model1
2 0.0000000758 14 39 sens binary 0.728 10 0.0277 Model1
3 0.0000000758 14 39 spec binary 0.865 10 0.0156 Model1
4 0.0243 5 34 roc_auc binary 0.823 10 0.0147 Model2
. ...... .. .. .... ...... ..... .. ..... ......
14 0.00380 5 36 sens binary 0.747 10 0.0209 Model5
15 0.00380 5 36 spec binary 0.858 10 0.0161 Model5
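Beyond collect_metrics(), the tune package offers helpers for ranking and selecting combinations from these results. A brief sketch, assuming the dt_tuning object from above:

```r
# Sketch: ranking and selecting hyperparameter combinations with tune
library(tune)

# Top-performing combinations, ranked by mean ROC AUC across folds
dt_tuning %>%
  show_best(metric = 'roc_auc', n = 3)

# Select the single best combination (e.g. for finalizing the workflow)
best_params <- dt_tuning %>%
  select_best(metric = 'roc_auc')
```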