Supervised Learning in R: Regression
Nina Zumel and John Mount
Win-Vector, LLC
Multiple diverse decision trees averaged together
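To make "diverse trees averaged together" concrete, here is a minimal sketch of the averaging idea using bagging (bootstrap resampling) only. It is not the full random forest algorithm, which also randomizes the variables tried at each node. The built-in `mtcars` data, the `rpart` package, and the names `n_trees`, `per_tree`, and `ensemble` are assumptions for illustration, not part of the course materials.

```r
library(rpart)  # assumed available; a recommended package shipped with most R installs

set.seed(1)
n_trees <- 50  # illustrative choice

# Fit each tree on a bootstrap resample of the rows so the trees differ.
trees <- lapply(seq_len(n_trees), function(i) {
  rows <- sample(nrow(mtcars), replace = TRUE)
  rpart(mpg ~ wt + hp + disp, data = mtcars[rows, ],
        control = rpart.control(minsplit = 5))
})

# One column of predictions per tree; the ensemble prediction is the row mean.
per_tree <- sapply(trees, predict, newdata = mtcars)
ensemble <- rowMeans(per_tree)
```

Each individual tree is noisy because it sees a different resample; averaging across trees smooths out that noise.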
fmla <- cnt ~ hr + holiday + workingday +
  weathersit + temp + atemp + hum + windspeed
library(ranger)
model <- ranger(fmla, bikesJan,
                num.trees = 500,
                respect.unordered.factors = "order")
ranger() inputs:
formula, data
num.trees (default 500) - use at least 200
mtry - number of variables to try at each node
respect.unordered.factors - recommend set to "order"
model
Ranger result
...
OOB prediction error (MSE): 3103.623
R squared (OOB): 0.7837386
The random forest algorithm returns out-of-bag (OOB) estimates of out-of-sample performance.
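The OOB figures printed above can also be read from the fitted object. The sketch below assumes the built-in `mtcars` data as a stand-in for `bikesJan`; the names `oob_mse`, `oob_rmse`, and `oob_r2` are illustrative. For a regression forest, ranger stores the OOB mean squared error in `prediction.error` and the OOB R squared in `r.squared`.

```r
library(ranger)

# mtcars is a stand-in dataset for illustration; the course uses bikesJan.
model <- ranger(mpg ~ wt + hp + disp + cyl, data = mtcars,
                num.trees = 500,
                respect.unordered.factors = "order",
                seed = 42)

oob_mse  <- model$prediction.error  # OOB mean squared error (regression)
oob_rmse <- sqrt(oob_mse)           # same units as the outcome
oob_r2   <- model$r.squared         # OOB R squared
```

Taking the square root of the OOB MSE puts the estimate on the same scale as an RMSE computed on held-out data.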
bikesFeb$pred <- predict(model, bikesFeb)$predictions
predict() inputs: the model and the data to predict on (here, model and bikesFeb).
Predictions can be accessed in the element predictions.
Calculate RMSE:
library(dplyr)
bikesFeb %>%
  mutate(residual = cnt - pred) %>%
  summarize(rmse = sqrt(mean(residual^2)))
      rmse
1 67.15169
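The same calculation can be written without dplyr. The helper below is a hypothetical base R sketch, not part of the course code, and the `actual` and `predicted` vectors are made-up numbers chosen so the arithmetic is easy to check by hand.

```r
# Hypothetical helper: root mean squared error of predictions against actual values.
rmse <- function(actual, predicted) {
  sqrt(mean((actual - predicted)^2))
}

# Toy values for illustration only.
actual    <- c(10, 20, 30)
predicted <- c(12, 18, 33)

# Residuals are -2, 2, -3, so the result is sqrt((4 + 4 + 9) / 3) = sqrt(17/3).
rmse(actual, predicted)
```

With the course data, the equivalent call would be `rmse(bikesFeb$cnt, bikesFeb$pred)`.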
| Model | RMSE |
|---|---|
| Quasipoisson | 69.3 |
| Random forests | 67.15 |