Random forests

Supervised Learning in R: Regression

Nina Zumel and John Mount

Win-Vector, LLC

Random Forests

Multiple diverse decision trees averaged together

  • Reduces overfitting
  • Increases model expressiveness
  • Gives finer-grained predictions than a single tree

Building a Random Forest Model

  1. Draw bootstrapped samples from the training data (one per tree)
  2. For each sample, grow a tree
    • At each node, pick the best variable to split on (from a random subset of all variables)
    • Continue until the tree is fully grown
  3. To score a datum, evaluate it with all the trees and average the results.
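
The steps above can be sketched in a few lines of R. This is a simplified illustration, not how ranger() is implemented: it assumes the rpart package (a recommended package that ships with R) and the built-in mtcars data, and it draws the random variable subset once per tree rather than at each node. The names n_trees, n_vars, and forest_predict are made up for this sketch.

```r
library(rpart)

set.seed(2024)
train   <- mtcars
outcome <- "mpg"
vars    <- setdiff(names(train), outcome)
n_trees <- 50   # hypothetical; kept small for speed
n_vars  <- 3    # hypothetical size of the random variable subset per tree

# Steps 1 and 2: one bootstrapped sample and one tree per iteration
trees <- lapply(seq_len(n_trees), function(i) {
  rows <- sample(nrow(train), replace = TRUE)   # bootstrap sample
  use  <- sample(vars, n_vars)                  # random variable subset (per tree, a simplification)
  fmla <- reformulate(use, response = outcome)
  rpart(fmla, data = train[rows, ], method = "anova")
})

# Step 3: score with every tree, then average the results
forest_predict <- function(trees, newdata) {
  preds <- sapply(trees, predict, newdata = newdata)  # rows x trees matrix
  rowMeans(preds)
}

pred <- forest_predict(trees, train)
head(pred)
```

Averaging over many trees grown on different samples and variables is what smooths out the coarse, overfit predictions of any single tree.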

Example: Bike Rental Data

fmla <- cnt ~ hr + holiday + workingday +
  weathersit + temp + atemp + hum + windspeed


Random Forests with ranger()

library(ranger)

# Fit a 500-tree random forest to the January bike rental data
model <- ranger(fmla, bikesJan,
                num.trees = 500,
                respect.unordered.factors = "order")
  • formula, data
  • num.trees (default 500) - use at least 200
  • mtry - number of variables to try at each node
    • default: square root of the total number of variables, rounded down
  • respect.unordered.factors - recommended setting: "order"
    • "safe" numeric encoding of categorical variables (levels ordered by mean outcome)
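
As a quick worked example of the mtry default, the sketch below counts the input variables in the bike rental formula and applies the square-root rule. It assumes the square root is rounded down, as the ranger documentation describes; n_inputs and mtry_default are names made up for illustration.

```r
# Formula from the bike rental example
fmla <- cnt ~ hr + holiday + workingday +
  weathersit + temp + atemp + hum + windspeed

# Count the input variables on the right-hand side of the formula
n_inputs <- length(attr(terms(fmla), "term.labels"))

# Square root of the number of variables, rounded down (assumed default rule)
mtry_default <- floor(sqrt(n_inputs))

n_inputs      # 8
mtry_default  # 2
```

So with the default setting, each node of each tree would consider 2 of the 8 input variables as split candidates.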

Random Forests with ranger()

model
Ranger result
...
OOB prediction error (MSE):       3103.623 
R squared (OOB):                  0.7837386

The random forest algorithm returns out-of-bag (OOB) estimates of out-of-sample performance.
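
The out-of-bag idea can be illustrated in base R. This is a minimal sketch with a made-up row count n, not ranger's internal code: a bootstrap sample draws rows with replacement, so some rows are never drawn for a given tree, and those rows can serve as held-out data for that tree.

```r
set.seed(1)
n <- 1000                               # hypothetical number of training rows

in_bag <- sample(n, replace = TRUE)     # bootstrap sample for one tree
oob    <- setdiff(seq_len(n), in_bag)   # rows this tree never saw

oob_fraction <- length(oob) / n
oob_fraction                            # typically close to (1 - 1/n)^n, about 0.37
```

Each row's OOB prediction averages only the trees for which that row was out of bag, which is why the reported MSE and R squared behave like estimates on unseen data.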


Predicting with a ranger() model

bikesFeb$pred <- predict(model, bikesFeb)$predictions

predict() inputs:

  • model
  • data

Predictions can be accessed in the element predictions.


Evaluating the model

Calculate RMSE:

library(dplyr)

# RMSE of the random forest predictions on the February data
bikesFeb %>%
  mutate(residual = cnt - pred) %>%
  summarize(rmse = sqrt(mean(residual^2)))
      rmse
1 67.15169

Model            RMSE
Quasipoisson     69.3
Random forests   67.15
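
For readers who prefer base R, the same RMSE calculation can be wrapped in a small helper. The function name rmse and the toy vectors are made up for illustration; only the two RMSE values at the end come from the table above.

```r
# Hypothetical helper: root mean squared error of a set of predictions
rmse <- function(actual, predicted) {
  sqrt(mean((actual - predicted)^2))
}

# Toy values (not from the bike data): every residual is +3 or -3
actual    <- c(10, 20, 30, 40)
predicted <- c(7, 23, 27, 43)
rmse(actual, predicted)   # 3

# Relative improvement of the random forest over the quasipoisson model,
# using the RMSE values in the table above
improvement <- (69.3 - 67.15) / 69.3
round(100 * improvement, 1)   # about 3.1 (percent)
```

On this one month of test data the random forest's advantage is small, roughly a 3% reduction in RMSE.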

Let's practice!
