Random forest

Machine Learning with Tree-Based Models in R

Sandro Raabe

Data Scientist

Random forest

  • Suited for high-dimensional data
  • Easy to use
  • Good out-of-the-box performance with little tuning
  • Implemented in a variety of packages: ranger, randomForest
  • tidymodels interface to these packages: rand_forest() (contained in parsnip package)

Idea

  • Basic idea (identical to bagging): train trees on bootstrap samples
  • Key difference: each split considers only a random subset of the predictors, so trees differ more from one another $\rightarrow$ random forest
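
The two sources of randomness can be sketched in base R. This is a minimal illustration only, not how ranger or randomForest work internally; the built-in iris data and the names boot_data and candidates are assumptions chosen for the example.

```r
set.seed(42)

# iris is a stand-in dataset (assumption); Species is the outcome
predictors <- setdiff(names(iris), "Species")
n <- nrow(iris)

# Randomness 1 (shared with bagging): a bootstrap sample,
# i.e. n rows drawn with replacement, used to train one tree
boot_idx  <- sample(n, size = n, replace = TRUE)
boot_data <- iris[boot_idx, ]

# Randomness 2 (specific to random forests): only a random subset
# of predictors is considered as split candidates
mtry       <- floor(sqrt(length(predictors)))
candidates <- sample(predictors, size = mtry)

candidates
```

In a real forest the predictor subset is redrawn at every split of every tree, while the bootstrap sample is drawn once per tree.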

Intuition

random forest sketch


Coding: Specify a random forest model

  • Function name: rand_forest()

Hyperparameters:

  • mtry: number of predictors randomly sampled as split candidates at each node, default:
    $$\left\lfloor\sqrt{\text{num predictors}}\right\rfloor$$
  • trees: number of trees in the forest
  • min_n: minimum number of data points a node must contain to be split further
rand_forest(
  mtry = 4,
  trees = 500,
  min_n = 10) %>%
  # Set the mode
  set_mode("classification") %>%
  # Use engine ranger or randomForest
  set_engine("ranger")
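
As a quick check of the mtry default stated above, the formula can be evaluated directly. The value of 19 predictors is taken from the training output shown in this deck; the variable names are assumptions for illustration.

```r
# Number of predictors, as reported in the training output of this deck
num_predictors <- 19

# Default described above: floor of the square root of the predictor count
# sqrt(19) is roughly 4.36, so the floor is 4
default_mtry <- floor(sqrt(num_predictors))
default_mtry
```

This matches the "Mtry: 4" line that ranger reports when mtry is left unspecified.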

Coding: Specify a random forest model

spec <- rand_forest(trees = 100) %>%
  set_mode("classification") %>%
  set_engine("ranger")

Random Forest Model Specification (classification)

Main Arguments:
  trees = 100

Computational engine: ranger

Training a forest

spec %>% fit(still_customer ~ ., data = customers_train)
parsnip model object

Fit time:  631ms 
Ranger result

Number of trees:                  100 
Sample size:                      9116 
Number of independent variables:  19 
Mtry:                             4 
Target node size:                 10
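
Since customers_train is not available outside the course, here is a self-contained sketch of the same fitting step. It assumes the tidymodels and ranger packages are installed and uses the built-in iris data as a stand-in; the name engine_fit is hypothetical.

```r
library(tidymodels)  # attaches parsnip and the %>% pipe; ranger must be installed

set.seed(123)  # forests are random, so fix the seed for reproducibility

spec <- rand_forest(trees = 100) %>%
  set_mode("classification") %>%
  set_engine("ranger")

# iris stands in for customers_train (assumption for illustration only)
model <- spec %>% fit(Species ~ ., data = iris)

# The underlying ranger object is stored in the fit element
engine_fit <- model$fit
engine_fit$num.trees  # number of trees grown
engine_fit$mtry       # predictors tried per split
```

With the four predictors in iris, the mtry default works out to 2, in line with the square-root rule from the hyperparameter slide.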

Variable importance

rand_forest(mode = "classification") %>%
  set_engine("ranger", importance = "impurity") %>%
  fit(still_customer ~ ., data = customers_train) %>%
  vip::vip()

vip plot
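
The numbers behind such a plot can also be inspected directly. The sketch below assumes tidymodels and ranger are installed and again uses iris in place of customers_train; it reads the variable.importance element that ranger stores when an importance mode is requested.

```r
library(tidymodels)  # ranger must be installed as well

set.seed(123)

# iris replaces customers_train here (assumption for illustration only)
model <- rand_forest(mode = "classification", trees = 100) %>%
  set_engine("ranger", importance = "impurity") %>%
  fit(Species ~ ., data = iris)

# Named numeric vector of impurity-based importance scores, one per predictor,
# sorted from most to least important
importance_scores <- sort(model$fit$variable.importance, decreasing = TRUE)
importance_scores
```

Without importance = "impurity" (or another importance mode) in set_engine(), ranger does not compute these scores, which is why the argument appears in the slide code.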


Let's plant a random forest!
