Supervised Learning in R: Regression
Nina Zumel and John Mount
Win-Vector, LLC



For a lognormally distributed outcome, fit the model on log(y), then exponentiate the predictions to return to the original scale:
 model <- lm(log(y) ~ x, data = train)
 logpred <- predict(model, newdata = test)
 pred <- exp(logpred)
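A minimal runnable sketch of the procedure on synthetic lognormal data (the variable names and coefficients here are illustrative, not from the course):

```r
set.seed(1)

# Synthetic data: log(y) is linear in x, so y is lognormally distributed
x <- runif(200, 0, 10)
y <- exp(0.5 + 0.3 * x + rnorm(200, sd = 0.2))
d <- data.frame(x = x, y = y)

train <- d[1:150, ]
test  <- d[151:200, ]

# Fit on the log scale, predict, then transform back with exp()
model   <- lm(log(y) ~ x, data = train)
logpred <- predict(model, newdata = test)
pred    <- exp(logpred)
```

Note that `predict()` takes the holdout frame via the `newdata` argument; passing it as `data` silently returns predictions on the training data instead.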
$\log(a) + \log(b) = \log(ab)$
$\log(a) - \log(b) = \log(a/b)$
An additive error on the log scale is a multiplicative error on the original scale ($\log(\text{pred}) = \log(y) + \varepsilon$ means $\text{pred} = y\,e^{\varepsilon}$), so reducing multiplicative error reduces relative error.
RMS-relative error $= \sqrt{\overline{\left(\frac{\text{pred}-y}{y}\right)^2}}$
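As a small worked example, the two error metrics can be computed directly from vectors of outcomes and predictions (toy numbers, chosen so every prediction is off by 10% in relative terms):

```r
# Toy example: RMSE vs. RMS-relative error
y    <- c(100, 200, 1000)   # true outcomes
pred <- c(110, 180, 1100)   # predictions, each off by 10% of y

err <- pred - y

rmse       <- sqrt(mean(err^2))        # dominated by the largest outcome
rms_relerr <- sqrt(mean((err / y)^2))  # each case contributes equally

rmse        # ~59.2: driven almost entirely by the error of 100 on y = 1000
rms_relerr  # 0.1: every prediction has 10% relative error
```

RMSE weights errors on large outcomes heavily, while RMS-relative error treats a 10% miss the same at every income level, which is why the two metrics can rank models differently.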
modIncome <- lm(Income ~ AFQT + Educ, data = train)
AFQT: Score on proficiency test 25 years before survey
Educ: Years of education at time of survey
Income: Income at time of survey

test %>%
    mutate(pred = predict(modIncome, newdata = test),
           err = pred - Income) %>%
    summarize(rmse = sqrt(mean(err^2)),
              rms.relerr = sqrt(mean((err/Income)^2)))
| RMSE | RMS-relative error | 
|---|---|
| 36,819.39 | 3.295189 | 
modLogIncome <- lm(log(Income) ~ AFQT + Educ, data = train)
test %>%
    mutate(predlog = predict(modLogIncome, newdata = test),
           pred = exp(predlog),
           err = pred - Income) %>%
    summarize(rmse = sqrt(mean(err^2)),
              rms.relerr = sqrt(mean((err/Income)^2)))
| RMSE | RMS-relative error | 
|---|---|
| 38,906.61 | 2.276865 | 
log(Income) model: smaller RMS-relative error, larger RMSE. It is more accurate in relative terms, at the cost of larger absolute errors on high incomes.
| Model | RMSE | RMS-relative error | 
|---|---|---|
| On Income | 36,819.39 | 3.295189 | 
| On log(Income) | 38,906.61 | 2.276865 | 