Machine Learning with Tree-Based Models in R
Sandro Raabe
Data Scientist
head(chocolate, 5)
final_grade  review_date  cocoa_percent  company_location  bean_type               broad_bean_origin
<dbl>        <int>        <dbl>          <fct>             <fct>                   <fct>
3            2009         0.8            U.K.              "Criollo, Trinitario"   "Madagascar"
3.75         2012         0.7            Guatemala         "Trinitario"            "Madagascar"
2.75         2009         0.75           Colombia          "Forastero (Nacional)"  "Colombia"
3.5          2014         0.74           Zealand           ""                      "Papua New Guinea"
3.75         2011         0.72           Australia         ""                      "Bolivia"
spec <- decision_tree() %>%
  set_mode("regression") %>%
  set_engine("rpart")
print(spec)
Decision Tree Model Specification
(regression)
Computational engine: rpart
model <- spec %>%
  fit(formula = final_grade ~ .,
      data = chocolate_train)
print(model)
parsnip model object
Fit time: 20ms
n= 1437
node), split, n, deviance, yval
* denotes terminal node
# Predictions on new data
predict(model, new_data = chocolate_test)
.pred
<dbl>
3.281915
3.435234
3.281915
3.833931
3.281915
3.514151
3.273864
3.514151
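
The specify, fit, predict sequence above can be tried end to end without the chocolate data. The following is a minimal sketch, assuming the parsnip and rpart packages are installed; the built-in mtcars data set, the 24/8 split, and the names cars_train, cars_test, and results are stand-ins chosen for illustration and do not come from the slides.

```r
library(parsnip)  # the "rpart" engine also needs the rpart package installed

# Stand-in data (assumption): mtcars ships with R; the slides use chocolate instead.
set.seed(1)
train_rows <- sample(nrow(mtcars), 24)
cars_train <- mtcars[train_rows, ]
cars_test  <- mtcars[-train_rows, ]

# 1. Specify the model class, the mode, and the engine.
spec <- decision_tree() %>%
  set_mode("regression") %>%
  set_engine("rpart")

# 2. Fit the specification to the training data.
model <- spec %>%
  fit(formula = mpg ~ ., data = cars_train)

# 3. Predict on unseen rows; the result is a tibble with one .pred column.
preds <- predict(model, new_data = cars_test)

# Put predictions next to the true values for a quick visual comparison.
results <- data.frame(truth = cars_test$mpg, estimate = preds$.pred)
print(results)
```

As in the slide output, predict() returns one row per row of new_data, and rows that fall into the same leaf receive the same predicted value.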

min_n: minimum number of data points a node must contain to be split further (default: 20)
tree_depth: maximum depth of the tree (default: 30)
cost_complexity: penalty for tree complexity (default: 0.01)
decision_tree(tree_depth = 4, cost_complexity = 0.05) %>%
  set_mode("regression")
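
To see how these three hyperparameters change the fitted tree, the sketch below fits a heavily restricted tree and a much looser one, then counts their leaves. It is illustrative only: mtcars stands in for the chocolate data, the specific hyperparameter values are arbitrary, and n_leaves is a hypothetical helper that reads the underlying rpart object stored in the $fit element of the parsnip model.

```r
library(parsnip)  # the "rpart" engine also needs the rpart package installed

# A heavily restricted tree: at most one level of splits.
shallow_spec <- decision_tree(tree_depth = 1, min_n = 10, cost_complexity = 0.01) %>%
  set_mode("regression") %>%
  set_engine("rpart")

# A looser tree: deeper, smaller nodes allowed, weaker complexity penalty.
deep_spec <- decision_tree(tree_depth = 5, min_n = 5, cost_complexity = 0.001) %>%
  set_mode("regression") %>%
  set_engine("rpart")

# Stand-in data (assumption): mtcars instead of chocolate_train.
shallow_fit <- fit(shallow_spec, mpg ~ ., data = mtcars)
deep_fit    <- fit(deep_spec, mpg ~ ., data = mtcars)

# Hypothetical helper: count terminal nodes in the underlying rpart object.
n_leaves <- function(parsnip_fit) {
  sum(parsnip_fit$fit$frame$var == "<leaf>")
}

cat("Leaves with tree_depth = 1:", n_leaves(shallow_fit), "\n")
cat("Leaves with tree_depth = 5:", n_leaves(deep_fit), "\n")
```

A tree with tree_depth = 1 can have at most two leaves, so it can produce at most two distinct predictions; relaxing the limits lets the tree grow more leaves.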
decision_tree(tree_depth = 1) %>%
  set_mode("regression") %>%
  set_engine("rpart") %>%
  fit(formula = final_grade ~ .,
      data = chocolate_train)
parsnip model object
Fit time: 1ms
n= 1000
node), split, n, yval
1) root 1000 2.347450
2) cocoa_percent>=0.905 16 2.171875 *
3) cocoa_percent<0.905 984 3.190803 *
tree_depth = 1
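
A tree of depth 1 is a single if/else rule, so the printed model can be reproduced by hand. The sketch below copies the split threshold and the two leaf values from the output above; predict_stump is a hypothetical name used only for illustration, and the input values are made up.

```r
# Threshold and leaf values are taken from the printed tree:
#   2) cocoa_percent>=0.905 -> 2.171875
#   3) cocoa_percent< 0.905 -> 3.190803
predict_stump <- function(cocoa_percent) {
  ifelse(cocoa_percent >= 0.905, 2.171875, 3.190803)
}

# Made-up inputs: one below the threshold, one above, one exactly on it.
example_preds <- predict_stump(c(0.70, 0.95, 0.905))
print(example_preds)
```

Every bar with a cocoa percentage below 0.905 gets the same predicted grade, which is why a depth-1 tree can only ever output two distinct values.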
