Gradient boosting

Machine Learning with Tree-Based Models in R

Sandro Raabe

Data Scientist

Recap: boosting

  • Uses weak learners (e.g. decision trees with only one split) which perform slightly better than random chance
  • Adds the weak learners together one after another; observations that are already predicted correctly get less attention
  • Each new learner concentrates on the observations that remain difficult

 

  • AdaBoost: first popular boosting algorithm
  • Gradient boosting: generalizes AdaBoost to any differentiable loss function

Comparison

AdaBoost
  • Uses decision stumps as weak learners
  • Attaches weights to observations:
    • High weight for difficult observations
    • Low weight for correctly predicted observations
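
To make the weighting idea concrete, here is a minimal sketch of one AdaBoost reweighting round in plain Python. The function name `adaboost_reweight` and the toy data are hypothetical, and the sketch assumes a binary problem with a weighted error strictly between 0 and 1.

```python
import math

def adaboost_reweight(weights, correct):
    """One AdaBoost round: return (alpha, new_weights).

    weights: current observation weights, summing to 1
    correct: booleans, True where the weak learner predicted correctly
    Assumes the weighted error is strictly between 0 and 1.
    """
    # Weighted error rate of the weak learner
    err = sum(w for w, c in zip(weights, correct) if not c)
    # Weight of this learner in the final vote; larger when the error is small
    alpha = 0.5 * math.log((1 - err) / err)
    # Up-weight mistakes, down-weight correct predictions
    raw = [w * math.exp(-alpha if c else alpha) for w, c in zip(weights, correct)]
    total = sum(raw)
    # Renormalize so the weights sum to 1 again
    return alpha, [r / total for r in raw]

# Four equally weighted observations; the stump gets the last one wrong
weights = [0.25, 0.25, 0.25, 0.25]
correct = [True, True, True, False]
alpha, new_weights = adaboost_reweight(weights, correct)
```

After the update, the misclassified observation carries more weight than before, so the next stump is pushed to get it right.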
Gradient boosting
  • Uses small decision trees as weak learners
  • Minimizes a loss function instead of reweighting observations
  • Each new tree is fit to the negative gradient of the loss (gradient descent on the predictions)
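
The gradient boosting side of the comparison can be sketched in a few lines. This is an illustration in plain Python, not the course's R workflow: it assumes squared-error loss (where the negative gradient is simply the residual), a single numeric feature with at least two distinct values, and one-split trees. The helper names `fit_stump`, `gradient_boost`, and `mse` are hypothetical; `trees` and `learn_rate` borrow the hyperparameter names used later in the deck.

```python
def fit_stump(x, residuals):
    """Fit a one-split regression tree to the residuals (needs >= 2 distinct x values)."""
    best = None
    for threshold in sorted(set(x))[:-1]:  # every value but the largest is a candidate split
        left = [r for xi, r in zip(x, residuals) if xi <= threshold]
        right = [r for xi, r in zip(x, residuals) if xi > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda xi: left_mean if xi <= threshold else right_mean

def gradient_boost(x, y, trees=50, learn_rate=0.1):
    """Return a prediction function built from `trees` shrunken stumps."""
    base = sum(y) / len(y)          # start from the mean prediction
    pred = [base] * len(y)
    stumps = []
    for _ in range(trees):
        # For squared-error loss, the negative gradient is the residual
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        stump = fit_stump(x, residuals)
        stumps.append(stump)
        # Take a small step in the direction the stump suggests
        pred = [pi + learn_rate * stump(xi) for xi, pi in zip(x, pred)]
    return lambda xi: base + learn_rate * sum(s(xi) for s in stumps)

def mse(model, x, y):
    """Mean squared error of a prediction function on (x, y)."""
    return sum((yi - model(xi)) ** 2 for xi, yi in zip(x, y)) / len(y)

# Toy data, made up for illustration
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [1.2, 1.9, 3.2, 3.8, 5.1, 6.2, 6.8, 8.1]
model = gradient_boost(x, y)
```

No observation weights appear anywhere: each stump only sees the residuals left by the trees before it.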

Pros & cons of boosting

 

Advantages

  • Among the best-performing machine learning models
  • Good option for unbalanced data

 

Disadvantages

  • Prone to overfitting
  • Training can be slow: trees are built one after another, and a low learning rate needs more trees
  • Many hyperparameters to tune
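
Why the learning rate affects training time can be seen in an idealized setting. The sketch below assumes, unrealistically, that every weak learner fits the current residual exactly, so each round removes a fraction `learn_rate` of what is left. The function name `rounds_needed` and the `tolerance` argument are hypothetical.

```python
def rounds_needed(learn_rate, tolerance=0.01):
    """Count boosting rounds until the remaining residual fraction is at most `tolerance`.

    Idealized: each round shrinks the remaining residual by a factor (1 - learn_rate).
    """
    remaining, rounds = 1.0, 0
    while remaining > tolerance:
        remaining *= (1 - learn_rate)
        rounds += 1
    return rounds

# Smaller learning rates need more rounds, i.e. more trees and longer training
for rate in (0.3, 0.1, 0.01):
    print(rate, rounds_needed(rate))
```

Real weak learners fit the residual only roughly, so actual round counts are higher, but the direction of the trade-off is the same.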

Hyperparameters for gradient boosting

Known from simple decision trees:
  • min_n: minimum number of data points in a node that is required to be split further
  • tree_depth: maximum depth of the tree, i.e. the number of successive splits from the root to a leaf
Known from random forests and bagged trees:
  • sample_size: number or proportion of observations exposed to the fitting routine
  • trees: number of trees in the ensemble
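
These four hyperparameters can be read as simple stopping and sampling rules. The following plain-Python sketch is one way to picture them; `may_split` and `draw_rows` are hypothetical helpers, and `sample_size` is assumed here to be a proportion in (0, 1].

```python
import random

def may_split(n_points, depth, min_n, tree_depth):
    """A node may be split only if it holds at least min_n points
    and sits above the maximum depth."""
    return n_points >= min_n and depth < tree_depth

def draw_rows(n_rows, sample_size, rng):
    """Pick the row indices one tree gets to see (sampling without replacement)."""
    k = max(1, round(n_rows * sample_size))
    return sorted(rng.sample(range(n_rows), k))

rng = random.Random(42)  # fixed seed so the draw is reproducible
trees = 5
# Each of the `trees` trees sees its own 80% subsample of 100 rows
rows_per_tree = [draw_rows(100, 0.8, rng) for _ in range(trees)]
```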

Hyperparameters for gradient boosting

Known from random forests:
  • mtry: number of predictors randomly sampled at each split
Special for boosted trees:
  • learn_rate: rate at which the boosting algorithm adapts from iteration to iteration
  • loss_reduction: reduction in the loss function required to split further
  • stop_iter: number of boosting iterations without improvement before training stops early
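
The early-stopping rule behind stop_iter can be sketched as a loop over per-iteration validation losses. This is a plain-Python illustration under assumptions: the function name `early_stop` and the loss values are made up, and "improvement" is taken to mean a strictly lower validation loss than the best seen so far.

```python
def early_stop(validation_losses, stop_iter):
    """Return (rounds_run, best_round) when training halts after
    `stop_iter` consecutive rounds without a new best validation loss."""
    best_loss = float("inf")
    best_round = 0
    since_improvement = 0
    for round_no, loss in enumerate(validation_losses, start=1):
        if loss < best_loss:
            # New best: remember it and reset the patience counter
            best_loss, best_round, since_improvement = loss, round_no, 0
        else:
            since_improvement += 1
            if since_improvement >= stop_iter:
                return round_no, best_round
    # Ran through every round without triggering the stop rule
    return len(validation_losses), best_round

# Hypothetical validation losses, one per boosting iteration
losses = [0.9, 0.7, 0.6, 0.61, 0.62, 0.63, 0.5]
result = early_stop(losses, stop_iter=3)
```

A small stop_iter saves training time but can halt before a later improvement, as the last loss value in this toy sequence shows.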

Let's practice!
