Supervised Learning in R: Regression
Nina Zumel and John Mount
Win-Vector, LLC
$$ y \sim b_0 + s_1(x_1) + s_2(x_2) + \cdots $$
gam(formula, family, data)
family: gaussian (the default) for "regular" regression; binomial for probabilities (classification)
Best for larger datasets
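A minimal sketch of the interface under both family settings. The data here is simulated and the variable names (`d`, `y_num`, `y_bin`) are made up for illustration; only `gam()` and its arguments come from the course material.

```r
library(mgcv)  # gam() lives in the mgcv package, shipped with base R

# Simulated example data (not from the course)
set.seed(1)
d <- data.frame(x = runif(100, 0, 10))
d$y_num <- sin(d$x) + rnorm(100, sd = 0.2)    # continuous outcome
d$y_bin <- rbinom(100, 1, plogis(sin(d$x)))   # binary outcome

# family = gaussian: "regular" regression on a continuous outcome
m_reg <- gam(y_num ~ s(x), data = d, family = gaussian)

# family = binomial: model probabilities for a binary outcome
m_cls <- gam(y_bin ~ s(x), data = d, family = binomial)
```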
anx ~ s(hassles)
s() designates that the variable should be treated non-linearly
Use s() with continuous variables

Model | RMSE (cross-val) | $R^2$ (training)
---|---|---
Linear ($hassles$) | 7.69 | 0.53
Quadratic ($hassles^2$) | 6.89 | 0.63
Cubic ($hassles^3$) | 6.70 | 0.65
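The table compares hand-chosen polynomial transformations of `hassles`. A sketch of how such fits might be produced with `lm()`; the `hassleframe` below is simulated stand-in data, not the course's real dataset, so its fit statistics will differ from the table.

```r
# Simulated stand-in for the course's hassle data
set.seed(2)
hassleframe <- data.frame(hassles = runif(40, 0, 100))
hassleframe$anx <- 5 + 0.004 * hassleframe$hassles^1.5 + rnorm(40, sd = 4)

# Linear, quadratic, and cubic fits; I() protects ^ inside a formula
m_lin  <- lm(anx ~ hassles, data = hassleframe)
m_quad <- lm(anx ~ hassles + I(hassles^2), data = hassleframe)
m_cub  <- lm(anx ~ hassles + I(hassles^2) + I(hassles^3), data = hassleframe)

# Training R^2 for each fit (rises with polynomial degree on training data)
sapply(list(linear = m_lin, quadratic = m_quad, cubic = m_cub),
       function(m) summary(m)$r.squared)
```

Training $R^2$ can only increase as terms are added, which is why the cross-validated RMSE column is the fairer comparison.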
library(mgcv)  # gam() lives in the mgcv package

model <- gam(
  anx ~ s(hassles),   # s(): model hassles non-linearly
  data = hassleframe,
  family = gaussian
)
summary(model)
...
R-sq.(adj) = 0.619 Deviance explained = 64.1%
GCV = 49.132 Scale est. = 45.153 n = 40
plot(model)
Per-term contributions (the learned transformation): predict(model, type = "terms")
Predictions on new data: predict(model, newdata = hassleframe, type = "response")
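Out-of-sample RMSE like the cross-validated column in the tables can be computed from such predictions. A sketch with simulated data and a hypothetical train/test split (the real evaluation in the course uses cross-validation; names here are illustrative):

```r
library(mgcv)

# Simulated stand-in for the course's hassle data
set.seed(3)
hassleframe <- data.frame(hassles = runif(80, 0, 100))
hassleframe$anx <- 5 + 0.003 * hassleframe$hassles^2 + rnorm(80, sd = 5)

# Hypothetical train/test split to mimic out-of-sample evaluation
in_train <- sample(nrow(hassleframe), 40)
train <- hassleframe[in_train, ]
test  <- hassleframe[-in_train, ]

model <- gam(anx ~ s(hassles), data = train, family = gaussian)

# predict() on a gam returns an array; as.numeric() flattens it to a vector
pred <- as.numeric(predict(model, newdata = test, type = "response"))
rmse <- sqrt(mean((test$anx - pred)^2))
rmse
```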
If you know the correct transformation, apply it directly; GAM is most useful when the appropriate transformation isn't known in advance
Model | RMSE (cross-val) | $R^2$ (training)
---|---|---
Linear ($hassles$) | 7.69 | 0.53
Quadratic ($hassles^2$) | 6.89 | 0.63
Cubic ($hassles^3$) | 6.70 | 0.65
GAM | 7.06 | 0.64