Machine Learning with Tree-Based Models in R
Sandro Raabe
Data Scientist
head(chocolate, 5)
final_grade review_date cocoa_percent company_location bean_type              broad_bean_origin
<dbl>       <int>       <dbl>         <fct>            <fct>                  <fct>
3           2009        0.8           U.K.             "Criollo, Trinitario"  "Madagascar"
3.75        2012        0.7           Guatemala        "Trinitario"           "Madagascar"
2.75        2009        0.75          Colombia         "Forastero (Nacional)" "Colombia"
3.5         2014        0.74          Zealand          ""                     "Papua New Guinea"
3.75        2011        0.72          Australia        ""                     "Bolivia"
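# Load parsnip (part of tidymodels) together with the %>% pipe
library(tidymodels)

# Specify a regression tree that will be fitted with the rpart package
spec <- decision_tree() %>%
  set_mode("regression") %>%
  set_engine("rpart")
print(spec)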
Decision Tree Model Specification
(regression)
Computational engine: rpart
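# Fit the specification to the training data;
# final_grade ~ . uses all other columns as predictors
model <- spec %>% fit(formula = final_grade ~ .,
                      data = chocolate_train)
print(model)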
parsnip model object
Fit time: 20ms
n= 1437
node), split, n, deviance, yval
* denotes terminal node
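The deviance and yval columns in this printout are simple node statistics: yval is the mean outcome of the observations in a node, and deviance is the sum of squared deviations from that mean. A minimal sketch that checks this for the root node, assuming we call rpart directly (the engine used above) and use R's built-in mtcars data as a stand-in, since the chocolate data is not part of this excerpt:

```r
library(rpart)  # ships with standard R installations

# Regression tree on mtcars (stand-in data); method = "anova" means regression
fit <- rpart(mpg ~ ., data = mtcars, method = "anova")

# fit$frame holds one row of statistics per node; row 1 is the root
root <- fit$frame[1, ]

# yval: mean outcome of the observations in the node
root_mean <- mean(mtcars$mpg)
# deviance: sum of squared deviations from that mean
root_dev <- sum((mtcars$mpg - root_mean)^2)

cat("rpart:   n =", root$n, " yval =", root$yval, " deviance =", root$dev, "\n")
cat("by hand: n =", nrow(mtcars), " yval =", root_mean, " deviance =", root_dev, "\n")
```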
# Model predictions on new data
predict(model, new_data = chocolate_test)
.pred
<dbl>
3.281915
3.435234
3.281915
3.833931
3.281915
3.514151
3.273864
3.514151
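Several of the predictions above are identical (3.281915 appears three times). That is expected: every observation that lands in the same terminal node receives the same value, the mean outcome of the training rows in that node. A sketch under the same assumptions as before (rpart called directly, mtcars as stand-in data), plus an arbitrary, hypothetical train/test split into the first 22 and last 10 rows:

```r
library(rpart)

# Hypothetical split of mtcars: first 22 rows for training, last 10 for testing
train <- mtcars[1:22, ]
test  <- mtcars[23:32, ]

# minsplit = 5 lets this small training set grow a few leaves
fit <- rpart(mpg ~ ., data = train, method = "anova",
             control = rpart.control(minsplit = 5))
preds <- predict(fit, newdata = test)

# Leaf values: the training mean of mpg in each terminal node
leaf_means <- fit$frame$yval[fit$frame$var == "<leaf>"]

# Every prediction is one of the leaf means, so there can be at most
# one distinct prediction per leaf
print(round(preds, 3))
cat("leaves:", length(leaf_means), " distinct predictions:", length(unique(preds)), "\n")
```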
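Hyperparameters of decision_tree():
min_n: minimum number of data points a node must contain to be split further (default: 20)
tree_depth: maximum depth of the tree (default: 30)
cost_complexity: penalty for complexity; splits that do not reduce the overall lack of fit by at least this factor are not attempted (default: 0.01)

# Example: a shallower tree with a stronger complexity penalty
decision_tree(tree_depth = 4, cost_complexity = 0.05) %>%
  set_mode("regression")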
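With the rpart engine, these three settings presumably end up as rpart.control()'s minsplit, maxdepth and cp arguments; treat that mapping as an assumption rather than documented fact. The sketch below prints rpart's own defaults and fits the shallow specification above directly with rpart on mtcars (stand-in data), next to an almost unrestricted tree for comparison:

```r
library(rpart)

# rpart's own defaults for the settings behind min_n, tree_depth, cost_complexity
defaults <- rpart.control()
print(c(minsplit = defaults$minsplit, maxdepth = defaults$maxdepth, cp = defaults$cp))

# decision_tree(tree_depth = 4, cost_complexity = 0.05) written directly in
# rpart terms (assumed mapping: tree_depth -> maxdepth, cost_complexity -> cp)
shallow <- rpart(mpg ~ ., data = mtcars, method = "anova",
                 control = rpart.control(maxdepth = 4, cp = 0.05))

# An almost unrestricted tree: tiny nodes may split and there is no penalty
deep <- rpart(mpg ~ ., data = mtcars, method = "anova",
              control = rpart.control(minsplit = 2, cp = 0))

# rpart numbers nodes so that node k sits at depth floor(log2(k)); root = depth 0
node_depth <- function(fit) floor(log2(as.numeric(rownames(fit$frame))))
n_leaves   <- function(fit) sum(fit$frame$var == "<leaf>")

cat("shallow: depth", max(node_depth(shallow)), " leaves", n_leaves(shallow), "\n")
cat("deep:    depth", max(node_depth(deep)), " leaves", n_leaves(deep), "\n")
```

The shallow tree stays within depth 4, while the unpenalized tree keeps splitting and ends up with many more leaves, which is the overfitting risk these hyperparameters are meant to control.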
# A tree of depth 1 makes a single split of the training data
decision_tree(tree_depth = 1) %>%
  set_mode("regression") %>%
  set_engine("rpart") %>%
  fit(formula = final_grade ~ .,
      data = chocolate_train)
parsnip model object
Fit time: 1ms
n= 1000
node), split, n, yval
1) root 1000 2.347450
  2) cocoa_percent>=0.905 16 2.171875 *
  3) cocoa_percent<0.905 984 3.190803 *
With tree_depth = 1, the tree makes exactly one split (here on cocoa_percent) and ends in two terminal nodes.
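A depth-1 tree picks the single split that most reduces the sum of squared errors and predicts the mean outcome on each side. The sketch below performs that exhaustive search for one numeric predictor in base R. best_split() is a hypothetical helper, and the cocoa/rating values are made up for illustration, not taken from the chocolate data. Candidate thresholds are midpoints between neighboring distinct values, presumably the same convention that yields thresholds such as 0.905 in the output above.

```r
# Hypothetical helper: exhaustive search for the best single split of y on one
# numeric predictor x, i.e. what a depth-1 regression tree does per variable
best_split <- function(x, y) {
  xs <- sort(unique(x))
  stopifnot(length(xs) > 1)  # need at least two distinct values to split
  # Candidate thresholds: midpoints between neighboring distinct values
  cuts <- (head(xs, -1) + tail(xs, -1)) / 2
  # Sum of squared errors when each side is predicted by its own mean
  sse <- sapply(cuts, function(t) {
    left  <- y[x <  t]
    right <- y[x >= t]
    sum((left - mean(left))^2) + sum((right - mean(right))^2)
  })
  i <- which.min(sse)
  list(threshold  = cuts[i],
       sse        = sse[i],
       left_mean  = mean(y[x <  cuts[i]]),
       right_mean = mean(y[x >= cuts[i]]))
}

# Made-up toy data: cocoa share and rating of nine bars
cocoa  <- c(0.60, 0.65, 0.70, 0.72, 0.75, 0.80, 0.90, 0.95, 1.00)
rating <- c(3.00, 3.25, 3.50, 3.50, 3.25, 3.00, 2.50, 2.25, 2.00)

s <- best_split(cocoa, rating)
str(s)
```

On this toy data the search picks a threshold of 0.85, with a mean rating of 3.25 below it and 2.25 at or above it, the same direction as the cocoa_percent split in the output above: higher cocoa content, lower predicted grade.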