Supervised Learning in R: Regression
Nina Zumel and John Mount
Win-Vector, LLC
For a Normal Distribution:
model <- lm(log(y) ~ x, data = train)
logpred <- predict(model, newdata = test)
pred <- exp(logpred)
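As a minimal self-contained sketch of the pattern above (the data here is synthetic and purely illustrative): fit on the log scale, predict, then transform back with `exp()`.

```r
# Synthetic data: y grows multiplicatively in x (illustrative, not course data)
set.seed(42)
train <- data.frame(x = runif(100, 0, 10))
train$y <- exp(0.5 * train$x + rnorm(100, sd = 0.3))
test <- data.frame(x = runif(50, 0, 10))
test$y <- exp(0.5 * test$x + rnorm(50, sd = 0.3))

# Fit the model on the log scale
model <- lm(log(y) ~ x, data = train)

# Predict on the log scale, then exponentiate back to the original units
logpred <- predict(model, newdata = test)
pred <- exp(logpred)
```

Note that `predict()` takes a `newdata` argument (not `data`); passing `data = test` silently returns predictions on the training set.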
$\log(a) + \log(b) = \log(ab)$
$\log(a) - \log(b) = \log(a/b)$
Reducing multiplicative error reduces relative error.
RMS-relative error = $\sqrt{\overline{\left(\frac{\mathrm{pred}-y}{y}\right)^2}}$
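A small worked example of the formula above, using illustrative numbers (not course data): each error is divided by the true value before squaring, so a 2-unit miss on a true value of 10 counts as much as a 200-unit miss on a true value of 1000.

```r
# Worked RMS-relative error example (illustrative values)
y    <- c(10, 100, 1000)
pred <- c(12, 90, 1100)
err  <- pred - y

# Relative errors: 0.2, -0.1, 0.1
rms_relerr <- sqrt(mean((err / y)^2))
rms_relerr  # sqrt(0.02) ~ 0.141
```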
modIncome <- lm(Income ~ AFQT + Educ, data = train)
AFQT
: Score on proficiency test 25 years before survey

Educ
: Years of education to time of survey

Income
: Income at time of survey

test %>%
  mutate(pred = predict(modIncome, newdata = test),
         err = pred - Income) %>%
  summarize(rmse = sqrt(mean(err^2)),
            rms.relerr = sqrt(mean((err/Income)^2)))
RMSE | RMS-relative error |
---|---|
36,819.39 | 3.295189 |
modLogIncome <- lm(log(Income) ~ AFQT + Educ, data = train)
test %>%
  mutate(predlog = predict(modLogIncome, newdata = test),
         pred = exp(predlog),
         err = pred - Income) %>%
  summarize(rmse = sqrt(mean(err^2)),
            rms.relerr = sqrt(mean((err/Income)^2)))
RMSE | RMS-relative error |
---|---|
38,906.61 | 2.276865 |
The log(Income) model has a smaller RMS-relative error but a larger RMSE.
Model | RMSE | RMS-relative error |
---|---|---|
On Income | 36,819.39 | 3.295189 |
On log(Income) | 38,906.61 | 2.276865 |