Continuous outcomes

Machine Learning with Tree-Based Models in R

Sandro Raabe

Data Scientist

The dataset

head(chocolate, 5)
final_grade review_date   cocoa_percent  company_location  bean_type              broad_bean_origin
<dbl>       <int>         <dbl>          <fct>             <fct>                  <fct>
3           2009          0.8            U.K.              "Criollo, Trinitario"  "Madagascar"
3.75        2012          0.7            Guatemala         "Trinitario"           "Madagascar"
2.75        2009          0.75           Colombia          "Forastero (Nacional)" "Colombia"
3.5         2014          0.74           New Zealand       ""                     "Papua New Guinea"
3.75        2011          0.72           Australia         ""                     "Bolivia"

Construct the regression tree

spec <- decision_tree() %>%
  set_mode("regression") %>%
  set_engine("rpart")
print(spec)
Decision Tree Model Specification (regression)

Computational engine: rpart

model <- spec %>%
  fit(formula = final_grade ~ .,
      data = chocolate_train)
print(model)
parsnip model object

Fit time:  20ms 
n= 1437 

node), split, n, deviance, yval
      * denotes terminal node
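The printed tree above is cut off after its header line. The sketch below shows what the node table's columns mean. It assumes the built-in `mtcars` data as a stand-in for the chocolate data, with `mpg` as the outcome. It fits the same kind of specification and inspects the root node: `n` is the number of observations, `yval` is the mean outcome, and `deviance` (stored as `dev` in the rpart object) is the sum of squared deviations from that mean.

```r
library(parsnip)

# Assumption: mtcars (bundled with R) stands in for chocolate_train
model <- decision_tree() %>%
  set_mode("regression") %>%
  set_engine("rpart") %>%
  fit(formula = mpg ~ ., data = mtcars)

# model$fit is the underlying rpart object; $frame has one row per node
fr <- model$fit$frame
root <- fr[1, ]

# n: observations in the node
# yval: mean outcome in the node
# dev: sum of squared deviations from that mean (the "deviance" column)
c(n = root$n, yval = root$yval, deviance = root$dev)
```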

Predictions using a regression tree

# Model predictions on new data
predict(model, new_data = chocolate_test)
.pred
<dbl>
3.281915
3.435234
3.281915
3.833931
3.281915
3.514151
3.273864
3.514151
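Several predictions above repeat exactly (3.281915, 3.514151). This happens because a regression tree predicts the mean training outcome of the leaf an observation falls into, so every row that lands in the same leaf gets the same value. The sketch below checks this. It assumes `mtcars` as a stand-in dataset and sets `min_n = 5` only to obtain a few more leaves. It uses the rpart object's `$where` component, which records the leaf each training row ended up in.

```r
library(parsnip)

# Assumption: mtcars stands in for the chocolate data
model <- decision_tree(min_n = 5) %>%
  set_mode("regression") %>%
  set_engine("rpart") %>%
  fit(formula = mpg ~ ., data = mtcars)

preds <- predict(model, new_data = mtcars)

# $where: for each training row, the row of $frame (leaf) it fell into
leaf <- model$fit$where
leaf_means <- tapply(mtcars$mpg, leaf, mean)

# Each prediction equals the mean outcome of its leaf, so values repeat
length(unique(preds$.pred))
all.equal(as.vector(preds$.pred),
          as.vector(leaf_means[as.character(leaf)]))
```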

Divide & conquer

[Figure: divide and conquer]
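A regression tree divides the data by choosing, at each step, the split that leaves the outcomes within each resulting group as close to their group mean as possible. That is, it chooses the split with the smallest total within-group sum of squared errors (SSE). The conquer step then repeats the same search inside each group until a stopping rule applies. The base-R sketch below performs one such divide step on a single predictor. It assumes `mtcars$wt` as the predictor and `mtcars$mpg` as the outcome, and the helper names `sse` and `best_split` are hypothetical. rpart additionally searches all predictors and enforces minimum group sizes, so its chosen split can differ.

```r
# Hypothetical helper: within-group sum of squared errors around the mean
sse <- function(y) sum((y - mean(y))^2)

# Hypothetical helper: exhaustive search for the best threshold on x
best_split <- function(x, y) {
  xs <- sort(unique(x))
  # Candidate thresholds: midpoints between neighbouring distinct values
  cuts <- (head(xs, -1) + tail(xs, -1)) / 2
  total <- vapply(cuts, function(cut) {
    sse(y[x < cut]) + sse(y[x >= cut])
  }, numeric(1))
  list(cut = cuts[which.min(total)], sse = min(total))
}

# Assumption: split mpg on car weight, for illustration only
wt_split <- best_split(mtcars$wt, mtcars$mpg)
wt_split$cut
c(before = sse(mtcars$mpg), after = wt_split$sse)
```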


Hyperparameters

Goal for regression trees:
  • Low variance or deviation from the mean within groups
Design decisions:
  • min_n: minimum number of data points a node must contain before it can be split further (default: 20)
  • tree_depth: maximum depth of the tree (default: 30)
  • cost_complexity: penalty for complexity; larger values give smaller trees (default: 0.01)
Set them in the very first step, when specifying the model:
decision_tree(tree_depth = 4, cost_complexity = 0.05) %>% 
    set_mode("regression")
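To see how these hyperparameters control tree size, the sketch below fits one shallow and one deep tree and counts their terminal nodes. It assumes `mtcars` as a stand-in dataset, and the helper `count_leaves` is hypothetical. It relies on rpart labelling terminal nodes `"<leaf>"` in the `var` column of its `$frame`. A depth of 1 allows exactly one split, so two leaves. A small `min_n` and `cost_complexity` with a large `tree_depth` let the tree keep splitting.

```r
library(parsnip)

# Hypothetical helper: fit a spec on mtcars and count its terminal nodes
count_leaves <- function(spec) {
  model <- spec %>%
    set_mode("regression") %>%
    set_engine("rpart") %>%
    fit(formula = mpg ~ ., data = mtcars)
  # rpart marks terminal nodes with "<leaf>" in frame$var
  sum(model$fit$frame$var == "<leaf>")
}

shallow <- count_leaves(decision_tree(tree_depth = 1, min_n = 5))
deep    <- count_leaves(decision_tree(tree_depth = 30, min_n = 5,
                                      cost_complexity = 0.001))
c(shallow = shallow, deep = deep)
```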

Understanding model output

decision_tree(tree_depth = 1) %>%
  set_mode("regression") %>%              
  set_engine("rpart")  %>%
  fit(formula = final_grade ~ .,
      data = chocolate_train)
parsnip model object

Fit time:  1ms
n= 1000

node), split, n, yval

1) root                 1000  3.174500
2) cocoa_percent>=0.905   16  2.171875 *
3) cocoa_percent<0.905   984  3.190803 *
  • Model with tree_depth = 1
  • Visualization:

[Figure: decision tree]
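In a tree of depth 1, the two leaves partition the root. Their `n` values add up to the root's `n`, and the root's `yval` is the `n`-weighted average of the leaf `yval`s. The sketch below checks both properties. It assumes `mtcars` as a stand-in for `chocolate_train` and reads the node table directly from the rpart object's `$frame`.

```r
library(parsnip)

# Assumption: mtcars stands in for chocolate_train
model <- decision_tree(tree_depth = 1) %>%
  set_mode("regression") %>%
  set_engine("rpart") %>%
  fit(formula = mpg ~ ., data = mtcars)

fr <- model$fit$frame
leaves <- fr[fr$var == "<leaf>", ]

# The leaves' observation counts add up to the root's count
sum(leaves$n) == fr$n[1]

# The root mean is the n-weighted average of the leaf means
weighted.mean(leaves$yval, leaves$n)
fr$yval[1]
```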


Let's do regression!
