Gradient boosting machines

Supervised Learning in R: Regression

Nina Zumel and John Mount

Win-Vector, LLC

How Gradient Boosting Works

  1. Fit a shallow tree $T_1$ to the data: $M_1 = T_1$
  2. Fit a tree $T_2$ to the residuals. Find $\gamma_2$ such that $M_2 = M_1 + \gamma_2 T_2$ is the best fit to the data.
  3. Repeat (2) until the stopping condition is met.

Regularization: learning rate $\eta \in (0, 1)$

$$ M_2 = M_1 + \eta \gamma_2 T_2 $$

  • Larger $\eta$: faster learning
  • Smaller $\eta$: less risk of overfit

Final Model:

$$ M = M_1 + \eta \sum_{i \geq 2} \gamma_i T_i $$
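
A minimal sketch of this loop in R, using rpart shallow trees on synthetic data (illustrative only: with squared-error loss the $\gamma_i$ step is absorbed into the tree's leaf values, so only $\eta$ appears explicitly; xgboost implements all of this far more efficiently):

library(rpart)

# Synthetic data for illustration
set.seed(1)
d <- data.frame(x = runif(200))
d$y <- sin(4 * d$x) + rnorm(200, sd = 0.1)

eta  <- 0.3                                                # learning rate
pred <- predict(rpart(y ~ x, data = d, maxdepth = 2), d)   # M_1 = T_1, a shallow tree
for (i in 2:50) {
  d$resid <- d$y - pred                                    # residuals of the current model
  t_i  <- rpart(resid ~ x, data = d, maxdepth = 2)         # T_i, fit to the residuals
  pred <- pred + eta * predict(t_i, d)                     # M_i = M_{i-1} + eta * T_i
}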

Cross-validation to Guard Against Overfit

Training error keeps decreasing, but test error doesn't

Best Practice (with xgboost())

  1. Run xgb.cv() with a large number of rounds (trees).
  2. Inspect xgb.cv()$evaluation_log, which records the estimated RMSE for each round.
    • Find the number of trees that minimizes estimated RMSE: $n_{best}$
  3. Run xgboost(), setting nrounds = $n_{best}$.

Example: Bike Rental Model

First, prepare the data

library(vtreat)                    # designTreatmentsZ(), prepare()
library(dplyr); library(magrittr)  # filter(), %>%, use_series()

# Design the treatment plan; keep the clean numeric and level-indicator variables
treatplan <- designTreatmentsZ(bikesJan, vars)
newvars <- treatplan$scoreFrame %>%
     filter(code %in% c("clean", "lev")) %>%
     use_series(varName)

# Create the treated (all-numeric) training data
bikesJan.treat <- prepare(treatplan, bikesJan, varRestriction = newvars)

For xgboost():

  • Input data: as.matrix(bikesJan.treat)
  • Outcome: bikesJan$cnt

Training a model with xgboost() / xgb.cv()

library(xgboost)

cv <- xgb.cv(data = as.matrix(bikesJan.treat), label = bikesJan$cnt,
              objective = "reg:squarederror",
              nrounds = 100, nfold = 5, eta = 0.3, max_depth = 6)

Key inputs to xgb.cv() and xgboost()

  • data: input data, as a matrix; label: outcome
  • objective: for regression, "reg:squarederror"
  • nrounds: maximum number of trees to fit
  • eta: learning rate
  • max_depth: maximum depth of individual trees
  • nfold (xgb.cv() only): number of folds for cross-validation

Find the Right Number of Trees

elog <- as.data.frame(cv$evaluation_log)
(nrounds <- which.min(elog$test_rmse_mean))   # round with minimum test RMSE
[1] 78
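
To see where test RMSE bottoms out while training RMSE keeps falling, you can plot both curves; a minimal sketch, assuming ggplot2 and the standard evaluation_log columns (iter, train_rmse_mean, test_rmse_mean):

library(ggplot2)

# Train vs. test RMSE by round; the gap that opens up is overfit
ggplot(elog, aes(x = iter)) +
  geom_line(aes(y = train_rmse_mean, color = "train")) +
  geom_line(aes(y = test_rmse_mean, color = "test")) +
  labs(x = "round", y = "RMSE", color = NULL)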

Run xgboost() for final model

nrounds <- 78

model <- xgboost(data = as.matrix(bikesJan.treat), 
                 label = bikesJan$cnt,
                 nrounds = nrounds,
                 objective = "reg:squarederror",
                 eta = 0.3,
                 max_depth = 6)

Predict with an xgboost() model

Prepare February data, and predict

bikesFeb.treat <- prepare(treatplan, bikesFeb, varRestriction = newvars)

bikesFeb$pred <- predict(model, as.matrix(bikesFeb.treat))
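
To compare against the earlier models, compute the RMSE on the February data; a minimal sketch (assuming bikesFeb$cnt holds the actual hourly counts):

# RMSE of the gradient boosting model on February data
err <- bikesFeb$cnt - bikesFeb$pred
sqrt(mean(err^2))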

Model performance on February data

Model                RMSE
Quasipoisson         69.3
Random forests       67.15
Gradient boosting    54.0

Visualize the Results

[Figure: Predictions vs. Actual Bike Rentals, February]

[Figure: Predictions and Hourly Bike Rentals, February]
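
A minimal sketch of the first plot, assuming ggplot2 (bikesFeb$cnt holds the actual hourly counts):

library(ggplot2)

# Predictions vs. actual; points near the line y = x are accurate predictions
ggplot(bikesFeb, aes(x = pred, y = cnt)) +
  geom_point() +
  geom_abline(slope = 1, intercept = 0, color = "darkblue") +
  ggtitle("Predictions vs. Actual Bike Rentals, February")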

Let's practice!
