Supervised Learning in R: Regression
Nina Zumel and John Mount
Win-Vector LLC
Which is best?
anx ~ I(hassles^2)
anx ~ I(hassles^3)
anx ~ I(hassles^2) + I(hassles^3)
anx ~ exp(hassles)
I(): treat an expression literally (not as an interaction)
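A minimal sketch of why `I()` matters, using a small synthetic data frame (the names `df`, `x`, `y`, `m_plain`, and `m_sq` are made up for illustration and are not the course's `hassleframe`). Inside a formula, `^` means term crossing rather than arithmetic, so `x^2` without `I()` collapses to the plain linear term `x`:

```r
# Synthetic data, for illustration only (not the course data)
set.seed(1)
df <- data.frame(x = 1:20)
df$y <- 3 + 0.5 * df$x^2 + rnorm(20)

# Without I(): ^ is formula crossing, so x^2 reduces to the single term x
m_plain <- lm(y ~ x^2, data = df)

# With I(): x^2 is evaluated arithmetically before fitting
m_sq <- lm(y ~ I(x^2), data = df)

names(coef(m_plain))  # "(Intercept)" "x"
names(coef(m_sq))     # "(Intercept)" "I(x^2)"
```

The coefficient names show which variable each model was actually fit on.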
Linear, Quadratic, and Cubic models
mod_lin <- lm(anx ~ hassles, hassleframe)
summary(mod_lin)$r.squared
0.5334847
mod_quad <- lm(anx ~ I(hassles^2), hassleframe)
summary(mod_quad)$r.squared
0.6241029
mod_tritic <- lm(anx ~ I(hassles^3), hassleframe)
summary(mod_tritic)$r.squared
0.6474421
Use cross-validation to evaluate the models
Model | RMSE |
---|---|
Linear ($hassles$) | 7.69 |
Quadratic ($hassles^2$) | 6.89 |
Cubic ($hassles^3$) | 6.70 |
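The slides do not show how the cross-validated RMSE values were computed. One way to produce such a comparison is sketched below in base R. Everything here is an assumption for illustration: `cv_rmse` is a hypothetical helper, the fold count and seeds are arbitrary, and `sim` is synthetic data standing in for `hassleframe`, so its RMSE values will not match the table.

```r
# Hypothetical helper: k-fold cross-validated RMSE for a single formula
cv_rmse <- function(fmla, data, k = 5, seed = 42) {
  set.seed(seed)
  # Randomly assign each row to one of k folds of near-equal size
  fold <- sample(rep(seq_len(k), length.out = nrow(data)))
  outcome <- all.vars(fmla)[1]   # name of the response variable
  pred <- numeric(nrow(data))
  for (i in seq_len(k)) {
    test <- fold == i
    # Fit on the other folds, predict on the held-out fold
    model <- lm(fmla, data = data[!test, , drop = FALSE])
    pred[test] <- predict(model, newdata = data[test, , drop = FALSE])
  }
  sqrt(mean((data[[outcome]] - pred)^2))
}

# Synthetic stand-in for hassleframe (assumed, not the course data)
set.seed(2024)
sim <- data.frame(hassles = runif(60, 0, 10))
sim$anx <- 5 + 0.4 * sim$hassles^2 + rnorm(60, sd = 2)

fmlas <- list(
  linear    = anx ~ hassles,
  quadratic = anx ~ I(hassles^2),
  cubic     = anx ~ I(hassles^3)
)

# Out-of-sample RMSE for each candidate model
rmse <- sapply(fmlas, cv_rmse, data = sim)
rmse
```

Because every prediction comes from a model that never saw that row, the resulting RMSE estimates out-of-sample error, unlike the in-sample R-squared values reported earlier.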