Generalized Linear Models in R
Richard Erickson
Instructor
Problem: Multiple predictor variables. Which ones should I include?
Solution: Include all of them using multiple regression.
Theoretical maximum number of coefficients:
Maximum number of $\beta$s = Number of observations
Over-fitting:
Using too many predictors relative to the number of samples
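A minimal sketch of the theoretical maximum and of over-fitting, using simulated pure-noise data rather than the course's bus data. It uses `lm()` (a Gaussian linear model) for simplicity; the data frame `dat` and its columns are made up for illustration.

```r
# Simulated data: 5 observations of pure noise (hypothetical, not the bus data)
set.seed(1)
n <- 5
dat <- data.frame(y = rnorm(n), x1 = rnorm(n), x2 = rnorm(n),
                  x3 = rnorm(n), x4 = rnorm(n), x5 = rnorm(n))

# Intercept + 4 slopes = 5 coefficients = number of observations
saturated <- lm(y ~ x1 + x2 + x3 + x4, data = dat)
max(abs(residuals(saturated)))  # effectively zero: the noise is fit exactly
df.residual(saturated)          # no degrees of freedom left to estimate error

# A 6th coefficient cannot be estimated from 5 observations; R reports it as NA
too_many <- lm(y ~ x1 + x2 + x3 + x4 + x5, data = dat)
sum(is.na(coef(too_many)))      # number of coefficients that could not be estimated
```

The saturated model reproduces every observation exactly even though the predictors are unrelated to the response, which is what over-fitting looks like at its extreme.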
Practical maximum number of coefficients:
Number of $\beta$s $\times 10 \approx$ Number of observations
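The rule of thumb can be checked against a fitted model, as in this rough sketch. The helper `max_coefs_rule_of_thumb()` and the simulated data frame `toy` are hypothetical names introduced here, not part of the course materials or any package.

```r
# Hypothetical helper: about 10 observations per estimated coefficient
max_coefs_rule_of_thumb <- function(n_obs, obs_per_coef = 10) {
  floor(n_obs / obs_per_coef)
}

# Simulated binary outcome with two predictors (not the bus data)
set.seed(42)
toy <- data.frame(x1 = rnorm(60), x2 = rnorm(60))
toy$y <- rbinom(60, size = 1, prob = plogis(0.5 * toy$x1))

fit <- glm(y ~ x1 + x2, data = toy, family = "binomial")

n_coefs <- length(coef(fit))  # intercept + 2 slopes
n_obs   <- nobs(fit)          # observations used in the fit

# TRUE when the model stays within the rule-of-thumb budget
n_coefs <= max_coefs_rule_of_thumb(n_obs)
```

The factor of 10 is a guideline, not a hard limit; the default `obs_per_coef = 10` only encodes the approximation stated above.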
CommuteDays
MilesOneWay
glm(Bus ~ CommuteDays + MilesOneWay, data = bus, family = 'binomial')
Call:
glm(formula = Bus ~ CommuteDays + MilesOneWay, family = "binomial",
    data = bus)

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-1.0732  -0.9035  -0.7816   1.3968   2.5066

Coefficients:
             Estimate Std. Error z value Pr(>|z|)
(Intercept) -0.707515   0.119719  -5.910 3.42e-09 ***
CommuteDays  0.066084   0.023181   2.851  0.00436 **
MilesOneWay -0.059571   0.003218 -18.512  < 2e-16 ***
#...
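The estimates in this output are on the log-odds scale. The sketch below converts them to odds ratios and to a predicted probability, using the numbers copied from the output. It assumes the model describes the probability of riding the bus (which level R models depends on how `Bus` is coded), and the example commuter (5 days, 10 miles) is hypothetical.

```r
# Estimates copied from the summary output (log-odds scale)
est <- c("(Intercept)" = -0.707515,
         CommuteDays   =  0.066084,
         MilesOneWay   = -0.059571)

# Odds ratios: multiplicative change in the odds per one-unit increase
odds_ratios <- exp(est[c("CommuteDays", "MilesOneWay")])
round(odds_ratios, 3)  # about 1.068 and 0.942

# Linear predictor for a hypothetical commuter: 5 days per week, 10 miles one way
eta <- est["(Intercept)"] + est["CommuteDays"] * 5 + est["MilesOneWay"] * 10

# Inverse logit turns the log-odds into a probability
p_bus <- plogis(unname(eta))
p_bus
```

Holding the other predictor fixed, each extra commuting day multiplies the odds by roughly 1.07, and each extra mile multiplies them by roughly 0.94.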
No correlation between predictors
Correlation between predictors
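One consequence of correlated predictors can be illustrated with simulated data, as a sketch: the standard error of a coefficient tends to grow when a second, strongly correlated predictor is in the model. All variable names and the correlation of 0.95 are assumptions chosen for illustration, not values from the bus data.

```r
set.seed(123)
n  <- 500
x1 <- rnorm(n)

# Second predictor, once unrelated to x1 and once strongly related to it
x2_indep <- rnorm(n)
x2_corr  <- 0.95 * x1 + sqrt(1 - 0.95^2) * rnorm(n)

# Only x1 actually drives the binary response
y <- rbinom(n, size = 1, prob = plogis(-0.5 + 0.8 * x1))

fit_indep <- glm(y ~ x1 + x2_indep, family = "binomial")
fit_corr  <- glm(y ~ x1 + x2_corr,  family = "binomial")

# Sample correlations between the predictors in each scenario
c(independent = cor(x1, x2_indep), correlated = cor(x1, x2_corr))

# Standard error of the x1 coefficient in each fit
se_indep <- coef(summary(fit_indep))["x1", "Std. Error"]
se_corr  <- coef(summary(fit_corr))["x1", "Std. Error"]
c(independent = se_indep, correlated = se_corr)
```

When the predictors are correlated, the model has less independent information with which to separate their effects, so the estimate for `x1` is less precise.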