Supervised Learning in R: Regression
Nina Zumel and John Mount
Win-Vector, LLC
glm(formula, data, family)
family: either poisson or quasipoisson
Poisson regression assumes mean(y) = var(y)
If var(y) is much different from mean(y), use quasipoisson
library(dplyr)  # provides %>% and summarize()
bikesJan %>%
  summarize(mean = mean(cnt), var = var(cnt))
      mean      var
1 130.5587 14351.25
Since var(cnt) >> mean(cnt) (roughly 110 times larger) $\rightarrow$ use quasipoisson
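The mean-versus-variance check can be tried on simulated counts. This is a minimal sketch under stated assumptions: the data are generated here (they are not the bike-sharing data), and `dispersion_ratio` is a hypothetical helper name, not a function from any package.

```r
# Simulated counts for illustration only (not the bikes data).
set.seed(42)
y_pois <- rpois(10000, lambda = 50)          # Poisson: variance equals the mean
y_over <- rnbinom(10000, mu = 50, size = 2)  # negative binomial: variance = mu + mu^2 / size

# Hypothetical helper: ratio of sample variance to sample mean.
# Near 1 is consistent with poisson; far from 1 points to quasipoisson.
dispersion_ratio <- function(y) var(y) / mean(y)

ratio_pois <- dispersion_ratio(y_pois)
ratio_over <- dispersion_ratio(y_over)
print(c(poisson = ratio_pois, overdispersed = ratio_over))
```

A ratio close to 1 is what the poisson family expects. The second sample is overdispersed by construction, which is the situation the bikesJan summary shows.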
fmla <- cnt ~ hr + holiday + workingday + 
  weathersit + temp + atemp + hum + windspeed
model <- glm(fmla, data = bikesJan, family = quasipoisson)
$$ \text{pseudo } R^2 = 1 - \frac{\text{deviance}}{\text{null.deviance}} $$
library(broom)  # provides glance()
glance(model) %>%
  summarize(pseudoR2 = 1 - deviance/null.deviance)
   pseudoR2
1 0.7654358
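The same quantity can also be read directly off the fitted glm object, without broom. A minimal sketch, assuming the built-in `warpbreaks` data as a stand-in for the bikes data (the formula and dataset are illustrative choices, not from the slides):

```r
# Stand-in example: warpbreaks ships with R; breaks is a count outcome.
model <- glm(breaks ~ wool + tension, data = warpbreaks, family = quasipoisson)

# A glm object stores both deviances as list elements.
pseudo_r2 <- 1 - model$deviance / model$null.deviance
print(pseudo_r2)
```

The null deviance is the deviance of an intercept-only model, so pseudo R^2 measures the share of that deviance the predictors account for.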
# Store the predictions as a column so they can be compared with cnt
bikesFeb$pred <- predict(model, newdata = bikesFeb, type = "response")
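Why `type = "response"` matters: for a glm, `predict()` defaults to `type = "link"`, which for the poisson and quasipoisson families returns predictions on the log scale. A minimal sketch, again assuming the built-in `warpbreaks` data as a stand-in for the bikes data:

```r
# Stand-in example on a built-in count dataset.
model <- glm(breaks ~ wool + tension, data = warpbreaks, family = quasipoisson)

pred_link <- predict(model, newdata = warpbreaks, type = "link")      # log(expected count)
pred_resp <- predict(model, newdata = warpbreaks, type = "response")  # expected count

# With the default log link, exponentiating the link-scale prediction
# recovers the response-scale prediction.
same_scale <- isTRUE(all.equal(exp(pred_link), pred_resp))
print(same_scale)
```

Leaving out `type = "response"` would therefore compare log-scale predictions against raw counts when computing residuals.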

You can evaluate count models by RMSE
bikesFeb %>%
  mutate(residual = cnt - pred) %>%
  summarize(rmse = sqrt(mean(residual^2))) 
      rmse
1 69.32869
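sd(bikesFeb$cnt)
134.2865
The RMSE (69.3) is about half the standard deviation of cnt (134.3), so the model predicts February counts better than always guessing the mean.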
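The full fit, predict, and evaluate sequence can be sketched end to end on simulated data. Everything here is an assumption for illustration: the data-generating process, the variable names `x` and `y`, and the train/test split are hypothetical and do not come from the bikes data.

```r
# Simulated count data for illustration only.
set.seed(1)
n <- 2000
x <- runif(n)
y <- rpois(n, lambda = exp(1 + 2 * x))  # counts whose log-mean is linear in x
d <- data.frame(x = x, y = y)

train_set <- d[1:1000, ]     # fit on the first half
test_set  <- d[1001:2000, ]  # evaluate on the held-out half

model <- glm(y ~ x, data = train_set, family = quasipoisson)
test_set$pred <- predict(model, newdata = test_set, type = "response")

# RMSE of the model versus the spread of the outcome itself.
rmse <- sqrt(mean((test_set$y - test_set$pred)^2))
sd_y <- sd(test_set$y)
print(c(rmse = rmse, sd = sd_y))
```

An RMSE well below the standard deviation of the outcome indicates the model does better than predicting the mean count for every row.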
