Supervised Learning in R: Regression
Nina Zumel and John Mount
Win-Vector, LLC
$$ y \sim b_0 + s_1(x_1) + s_2(x_2) + \cdots $$

gam(formula, family, data)
family: the assumed distribution of the outcome, e.g. gaussian (default; continuous outcomes), binomial (probabilities), poisson/quasipoisson (counts)
Best for larger datasets (splines need enough data to fit reliably)
anx ~ s(hassles)
s() designates that the variable should get a non-linear fit; use s() only with continuous variables
| Model | RMSE (cross-val) | $R^2$ (training) | 
|---|---|---|
| Linear ($hassles$) | 7.69 | 0.53 | 
| Quadratic ($hassles^2$) | 6.89 | 0.63 | 
| Cubic ($hassles^3$) | 6.70 | 0.65 | 
model <- gam(
  anx ~ s(hassles), 
  data = hassleframe, 
  family = gaussian
)
summary(model)
...
R-sq.(adj) =  0.619   Deviance explained = 64.1%
GCV = 49.132  Scale est. = 45.153    n = 40
plot(model)

Predictions on the scale of the outcome ($y$ values): type = "response" (type = "terms" instead returns each term's contribution)
predict(model, newdata = hassleframe, type = "response")
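A minimal sketch of the two predict() types; the hassleframe data is not available here, so a small synthetic stand-in (hypothetical column names matching the slides) is simulated:

```r
library(mgcv)

# Synthetic stand-in for hassleframe: anxiety rising non-linearly with hassles
set.seed(1)
df <- data.frame(hassles = runif(40, 0, 100))
df$anx <- 20 + 0.005 * df$hassles^2 + rnorm(40, sd = 5)

model <- gam(anx ~ s(hassles), data = df, family = gaussian)

# type = "response": predictions in the units of the outcome (anx)
pred_y <- predict(model, newdata = df, type = "response")

# type = "terms": a matrix with one column per model term,
# giving each term's (centered) contribution to the prediction
pred_terms <- predict(model, newdata = df, type = "terms")
```

For a gaussian family the link is the identity, so type = "response" and the default type = "link" coincide; the distinction matters for families like binomial.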

Knowing the correct transformation is best, but GAM is useful when the transformation isn't known
| Model | RMSE (cross-val) | $R^2$ (training) | 
|---|---|---|
| Linear ($hassles$) | 7.69 | 0.53 | 
| Quadratic ($hassles^2$) | 6.89 | 0.63 | 
| Cubic ($hassles^3$) | 6.70 | 0.65 | 
| GAM | 7.06 | 0.64 | 
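The cross-validated RMSE column above can be reproduced along these lines; this is a hedged sketch using 5-fold cross-validation on simulated data (the real hassleframe and fold assignments from the course are not shown here, so the numbers will differ):

```r
library(mgcv)

# Synthetic stand-in for hassleframe
set.seed(1)
df <- data.frame(hassles = runif(40, 0, 100))
df$anx <- 20 + 0.005 * df$hassles^2 + rnorm(40, sd = 5)

k <- 5
fold <- sample(rep(1:k, length.out = nrow(df)))  # random fold labels
resid <- numeric(nrow(df))

for (i in 1:k) {
  train <- df[fold != i, ]
  test  <- df[fold == i, ]
  # Refit the GAM on the training folds, predict the held-out fold
  m <- gam(anx ~ s(hassles), data = train, family = gaussian)
  resid[fold == i] <- test$anx - predict(m, newdata = test, type = "response")
}

rmse <- sqrt(mean(resid^2))
```

The same loop with lm() and the polynomial formulas fills in the other rows of the table.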