Welcome to the course!

Machine Learning with Tree-Based Models in R

Sandro Raabe

Data Scientist

Course overview

 

  • Chapter 1: Classification trees
  • Chapter 2: Regression trees, cross-validation, bias-variance tradeoff
  • Chapter 3: Hyperparameter tuning, bagging, random forests
  • Chapter 4: Boosted trees

Decision trees are flowcharts

[Figure: flowchart for classifying animals]

Source: https://aca.edu.au/resources/decision-trees-classifying-animals/decision-trees.pdf
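
To make the flowchart analogy concrete, here is a minimal sketch of a tiny decision tree written as nested if/else statements. The questions and labels are invented for illustration and are not taken from the linked flowchart; a real tree learns its questions from data.

```r
# Hypothetical flowchart: each question is one yes/no split,
# and each returned string is a leaf (the predicted class).
classify_animal <- function(has_feathers, lives_in_water) {
  if (has_feathers) {
    "bird"
  } else if (lives_in_water) {
    "fish"
  } else {
    "mammal"
  }
}

classify_animal(has_feathers = TRUE,  lives_in_water = FALSE)
classify_animal(has_feathers = FALSE, lives_in_water = TRUE)
```

Following one path from the first question down to a returned label corresponds to following one branch of the flowchart.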

Advantages of tree-based models

  • Easy to explain and understand
  • Can capture non-linear relationships
  • Require no normalization or standardization of numeric features
  • No need to create dummy indicator variables
  • Robust to outliers
  • Fast for large datasets
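
The point about normalization can be checked with a small experiment. The sketch below is an illustration rather than part of the course material: it assumes the built-in iris data as a stand-in dataset and calls the rpart package directly. It fits one tree on the raw features and one on standardized features, then compares the predictions.

```r
library(rpart)

# Built-in iris data stands in for a real dataset (assumption).
raw <- iris
scaled <- iris
# Standardize the four numeric features to mean 0 and sd 1.
scaled[1:4] <- lapply(iris[1:4], function(x) (x - mean(x)) / sd(x))

tree_raw    <- rpart(Species ~ ., data = raw,    method = "class")
tree_scaled <- rpart(Species ~ ., data = scaled, method = "class")

pred_raw    <- predict(tree_raw,    raw,    type = "class")
pred_scaled <- predict(tree_scaled, scaled, type = "class")

# Splits depend only on the ordering of values within a feature,
# so both trees should assign every row to the same class.
same_predictions <- all(pred_raw == pred_scaled)
same_predictions
```

Because standardizing a feature does not change the order of its values, the tree finds the same partitions either way.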

Disadvantages of tree-based models

  • Hard to interpret if large, deep, or ensembled
  • High variance: complex trees are prone to overfitting
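
One way to see the overfitting risk is to compare a shallow and a deep tree on training and held-out data. This is a rough sketch under several assumptions: the built-in iris data stands in for a real dataset, the seed and the 100-row training split are arbitrary, and rpart is called directly. The exact numbers depend on the split, so none are quoted here.

```r
library(rpart)

set.seed(1)  # arbitrary seed so the split is reproducible
train_rows <- sample(nrow(iris), 100)
train <- iris[train_rows, ]
test  <- iris[-train_rows, ]

# Share of rows whose predicted class matches the true class.
accuracy <- function(model, data) {
  mean(predict(model, data, type = "class") == data$Species)
}

# Same settings except for the maximum depth.
shallow <- rpart(Species ~ ., data = train, method = "class",
                 control = rpart.control(maxdepth = 2, cp = 0,
                                         minsplit = 2, minbucket = 1))
deep <- rpart(Species ~ ., data = train, method = "class",
              control = rpart.control(maxdepth = 30, cp = 0,
                                      minsplit = 2, minbucket = 1))

results <- data.frame(
  tree           = c("shallow", "deep"),
  train_accuracy = c(accuracy(shallow, train), accuracy(deep, train)),
  test_accuracy  = c(accuracy(shallow, test),  accuracy(deep, test))
)
results
```

The deep tree can never do worse than the shallow one on the training rows, because it only refines the same splits. Whether that gain carries over to the test rows is what distinguishes a good fit from overfitting.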

[Figure: tidymodels screenshot]


The tidymodels package

library(tidymodels)
-- Attaching packages -------------------- tidymodels 0.1.4 --
v parsnip   0.2.1      v rsample   0.1.1 
v dplyr     1.0.9      v tibble    3.1.7 
v yardstick 0.0.9      v tune      0.1.6

Create a decision tree

Specification / functional design

 1. Pick a model class

library(tidymodels)

decision_tree()
Decision Tree Model Specification (unknown)

Create a decision tree

 2. Set the engine that powers your model

library(tidymodels)

decision_tree() %>% 
    set_engine("rpart")
Decision Tree Model Specification (unknown)

Computational engine: rpart

Create a decision tree

 3. Set the mode (classification or regression)

library(tidymodels)

decision_tree() %>% 
    set_engine("rpart") %>% 
    set_mode("classification")
Decision Tree Model Specification (classification)

Computational engine: rpart
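
The same three steps apply when the outcome is numeric. As a short sketch, the specification below only swaps the mode to "regression"; the name reg_spec is chosen here for illustration.

```r
library(tidymodels)

# 1. model class, 2. engine, 3. mode
reg_spec <- decision_tree() %>%
  set_engine("rpart") %>%
  set_mode("regression")

# The chosen mode and engine are stored on the specification object.
reg_spec$mode
reg_spec$engine
```

At this point no data has been seen; the object only describes what kind of model will be fit.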

From a model specification to a real model

A specification is only a skeleton: it needs data to be trained on
library(tidymodels)
tree_spec <- decision_tree() %>% 
               set_engine("rpart") %>% 
               set_mode("classification")
# Fit the model specification to the training data using a formula
tree_spec %>%           
  fit(formula = outcome ~ age + bmi,  
      data = diabetes)
parsnip model object
Fit time: 19 ms 
n = 652
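
Once a specification has been fit, the resulting model can make predictions. The sketch below is self-contained and therefore uses the built-in iris data as a stand-in for the diabetes data (an assumption, since that dataset is not included here); the formula and the names tree_fit and preds are likewise chosen for illustration.

```r
library(tidymodels)

tree_spec <- decision_tree() %>%
  set_engine("rpart") %>%
  set_mode("classification")

# iris stands in for the course's diabetes data (assumption).
tree_fit <- tree_spec %>%
  fit(formula = Species ~ Petal.Length + Petal.Width,
      data = iris)

# predict() on a fitted parsnip model returns a tibble;
# for classification the column is named .pred_class.
preds <- predict(tree_fit, new_data = head(iris, 5))
preds
```

Passing the data through new_data keeps prediction separate from training, so the same call works for rows the model has never seen.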

Let's build a model!
