Supervised Learning in R: Regression
Nina Zumel and John Mount
Win-Vector, LLC
glm(formula, data, family)
family: either poisson or quasipoisson
Poisson regression assumes that mean(y) = var(y).
If var(y) is much different from mean(y), use quasipoisson.
bikesJan %>%
summarize(mean = mean(cnt), var = var(cnt))
      mean      var
1 130.5587 14351.25
Since var(cnt) >> mean(cnt), use quasipoisson.
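The mean-versus-variance check can be tried on simulated data. The sketch below uses base R only; the simulated counts and the cutoff on the variance-to-mean ratio are illustrative assumptions, not values from the course.

```r
# Hypothetical overdispersed counts (not the bike data)
set.seed(1)
y <- rnbinom(500, mu = 20, size = 2)   # negative binomial: variance exceeds the mean

m <- mean(y)
v <- var(y)
ratio <- v / m                         # close to 1 for Poisson-like data

# Assumed rule of thumb for this sketch: a ratio far from 1 suggests quasipoisson
family_choice <- if (ratio > 2 || ratio < 0.5) "quasipoisson" else "poisson"

print(c(mean = m, var = v, ratio = ratio))
print(family_choice)
```

The cutoffs 2 and 0.5 are arbitrary; the slides only say to use quasipoisson when the variance is much different from the mean.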
fmla <- cnt ~ hr + holiday + workingday +
weathersit + temp + atemp + hum + windspeed
model <- glm(fmla, data = bikesJan, family = quasipoisson)
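To see what the family argument changes, the following sketch fits both families to the same simulated data. The data frame d and the model names are hypothetical. In R's glm, the two families give the same coefficient estimates; quasipoisson estimates a dispersion parameter and scales the standard errors by its square root.

```r
# Hypothetical overdispersed data (not the bike data)
set.seed(2)
n <- 400
x <- runif(n)
d <- data.frame(x = x, y = rnbinom(n, mu = exp(1 + 2 * x), size = 1.5))

m_pois  <- glm(y ~ x, data = d, family = poisson)
m_quasi <- glm(y ~ x, data = d, family = quasipoisson)

# Point estimates agree between the two fits
same_coefs <- isTRUE(all.equal(coef(m_pois), coef(m_quasi)))

# Standard errors differ by a factor of sqrt(dispersion)
disp     <- summary(m_quasi)$dispersion
se_pois  <- coef(summary(m_pois))[, "Std. Error"]
se_quasi <- coef(summary(m_quasi))[, "Std. Error"]
se_ratio <- unname(se_quasi / se_pois)

print(same_coefs)
print(c(dispersion = disp, sqrt_dispersion = sqrt(disp)))
print(se_ratio)
```

Since the point estimates agree, predictions from the two fits also agree; the choice mainly affects standard errors and significance tests.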
$$ \text{pseudo } R^2 = 1 - \frac{\text{deviance}}{\text{null.deviance}} $$
glance(model) %>%
summarize(pseudoR2 = 1 - deviance/null.deviance)
   pseudoR2
1 0.7654358
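The same quantity can be read directly off a fitted glm object without broom, since the object stores deviance and null.deviance as components. A minimal sketch on simulated data (model_sim and the data are hypothetical):

```r
# Hypothetical count data (not the bike data)
set.seed(3)
n <- 300
x <- runif(n)
y <- rpois(n, lambda = exp(0.5 + 1.5 * x))

model_sim <- glm(y ~ x, family = quasipoisson)

# pseudo R^2 = 1 - deviance / null.deviance
pseudoR2 <- 1 - model_sim$deviance / model_sim$null.deviance
print(pseudoR2)
```

Values near 1 mean the model removes most of the null deviance; values near 0 mean it does little better than an intercept-only model.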
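# type = "response" returns predicted counts rather than values on the log (link) scale
bikesFeb$pred <- predict(model, newdata = bikesFeb, type = "response")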
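The role of type = "response" can be checked on a small simulated model. By default, predict() on a glm returns values on the link scale, which is the log scale for the poisson and quasipoisson families, so exponentiating them should recover the response-scale predictions. The data and names below are hypothetical.

```r
# Hypothetical count data (not the bike data)
set.seed(4)
d <- data.frame(x = runif(200))
d$y <- rpois(200, lambda = exp(1 + d$x))

m <- glm(y ~ x, data = d, family = poisson)
newd <- data.frame(x = c(0, 0.5, 1))

p_link <- predict(m, newdata = newd)                     # default: link (log) scale
p_resp <- predict(m, newdata = newd, type = "response")  # predicted counts

scales_agree <- isTRUE(all.equal(exp(p_link), p_resp))
print(rbind(link = p_link, response = p_resp))
print(scales_agree)
```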
You can evaluate count models by root mean squared error (RMSE):
bikesFeb %>%
mutate(residual = cnt - pred) %>%
summarize(rmse = sqrt(mean(residual^2)))
      rmse
1 69.32869
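Compare the RMSE to the standard deviation of the outcome:
sd(bikesFeb$cnt)
[1] 134.2865
The RMSE (69.33) is well below the standard deviation (134.29), so the model does better than always predicting the average count.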
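The whole evaluation step (fit on one set, predict on another, compare RMSE with the outcome's standard deviation) can be sketched end to end in base R. Everything below is simulated; the train/test split and variable names are assumptions for illustration and stand in for bikesJan and bikesFeb.

```r
# Hypothetical overdispersed counts (not the bike data)
set.seed(5)
n <- 600
d <- data.frame(x = runif(n))
d$y <- rnbinom(n, mu = exp(1 + 2.5 * d$x), size = 10)

train <- d[1:400, ]     # stands in for the January data
test  <- d[401:600, ]   # stands in for the February data

m <- glm(y ~ x, data = train, family = quasipoisson)

# Predict counts on the held-out data and compute RMSE
test$pred <- predict(m, newdata = test, type = "response")
rmse <- sqrt(mean((test$y - test$pred)^2))

# Baseline for comparison: spread of the outcome itself
sd_y <- sd(test$y)

print(c(rmse = rmse, sd = sd_y))
```

An RMSE clearly below the standard deviation suggests the model captures real structure in the counts.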