Machine Learning with Tree-Based Models in R
Sandro Raabe
Data Scientist
Hyperparameters in parsnip
Hyperparameters of decision trees:

min_n: minimum number of samples required to split a node
tree_depth: maximum allowed depth of the tree
cost_complexity: penalty for tree complexity

Default values set by parsnip:

decision_tree(min_n = 20, tree_depth = 30, cost_complexity = 0.01)
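To check how these defaults are handed to the engine, parsnip's translate() shows the underlying rpart call; a minimal sketch, assuming the tidymodels packages are installed:

library(parsnip)

# Show how parsnip arguments map to rpart's own names
# (min_n -> minsplit, tree_depth -> maxdepth, cost_complexity -> cp)
decision_tree(min_n = 20, tree_depth = 30, cost_complexity = 0.01) %>%
  set_engine("rpart") %>%
  set_mode("classification") %>%
  translate()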
The goal of hyperparameter tuning is to find the optimal set of hyperparameter values.
spec_untuned <- decision_tree(
  min_n = tune(),
  tree_depth = tune()
) %>%
  set_engine("rpart") %>%
  set_mode("classification")
Decision Tree Model Specification (classification)

Main Arguments:
  tree_depth = tune()
  min_n = tune()
tune() labels parameters for tuning.

tree_grid <- grid_regular(
  parameters(spec_untuned),
  levels = 3
)
# A tibble: 9 x 2
  min_n tree_depth
  <int>      <int>
1     2          1
2    21          1
3    40          1
4     2          8
5    21          8
6    40          8
7     2         15
8    21         15
9    40         15
parameters(): extracts the hyperparameters marked with tune() from the specification
levels: number of grid points for each hyperparameter
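levels can also differ per hyperparameter by passing a vector instead of a single integer; a sketch, assuming the parameter order (min_n, tree_depth) shown in the grid above:

library(dials)

# 2 grid points for min_n, 5 for tree_depth -> 2 * 5 = 10 combinations
finer_grid <- grid_regular(
  parameters(spec_untuned),
  levels = c(2, 5)
)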
Usage and arguments of tune_grid():

metric_set(): bundles the performance metrics to evaluate
tune_results <- tune_grid(
  spec_untuned,
  outcome ~ .,
  resamples = my_folds,
  grid = tree_grid,
  metrics = metric_set(accuracy)
)
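metric_set() is not limited to a single metric; a sketch tuning against accuracy and ROC AUC at once, reusing my_folds and tree_grid from above:

library(tune)
library(yardstick)

# Evaluate every grid point on two metrics per fold
tune_multi <- tune_grid(
  spec_untuned,
  outcome ~ .,
  resamples = my_folds,
  grid = tree_grid,
  metrics = metric_set(accuracy, roc_auc)
)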
autoplot(tune_results)
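Besides the plot, the results can be inspected numerically; a sketch using tune's helpers:

# Mean performance of each hyperparameter combination across folds
collect_metrics(tune_results)

# Top five combinations, ranked by accuracy
show_best(tune_results, metric = "accuracy", n = 5)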
# Select the best performing parameters
final_params <- select_best(tune_results)
final_params
# A tibble: 1 x 3
  min_n tree_depth .config
  <int>      <int> <chr>
1     2          8 Model4
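select_best() picks the top performer outright; if a simpler tree with near-best accuracy is preferred, a selector such as select_by_one_std_err() can be used instead. A sketch, sorting by tree depth so that shallower trees count as simpler:

# Simplest model within one standard error of the best
simple_params <- select_by_one_std_err(
  tune_results,
  tree_depth,
  metric = "accuracy"
)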
# Plug them into the specification
best_spec <- finalize_model(spec_untuned, final_params)
best_spec
Decision Tree Model Specification (classification)

Main Arguments:
  tree_depth = 8
  min_n = 2

Computational engine: rpart
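The finalized specification can then be trained like any other parsnip model; a minimal sketch, where training_data is a placeholder for your training set:

# Fit the tuned specification on the full training data
final_model <- best_spec %>%
  fit(outcome ~ ., data = training_data)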