Poisson and quasipoisson regression to predict counts

Supervised Learning in R: Regression

Nina Zumel and John Mount

Win-Vector, LLC

Predicting Counts

  • Linear regression: predicts values in $(-\infty, \infty)$
  • Counts: integers in range $[0, \infty)$
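
A minimal sketch of why the range mismatch matters. The simulated data and all variable names here are assumptions for illustration, not part of the course material. It fits a linear model and a Poisson model to the same counts and compares the smallest prediction from each.

```r
# Simulated count data (illustrative assumption, not the course data)
set.seed(1)
x <- runif(200, 0, 4)
y <- rpois(200, lambda = exp(-1 + 0.9 * x))  # true counts are never negative
d <- data.frame(x = x, y = y)

lin_model  <- lm(y ~ x, data = d)                     # ordinary linear regression
pois_model <- glm(y ~ x, data = d, family = poisson)  # Poisson regression

# Predict on a grid covering the observed range of x
grid <- data.frame(x = seq(0, 4, by = 0.5))
lin_pred  <- predict(lin_model, newdata = grid)
pois_pred <- predict(pois_model, newdata = grid, type = "response")

min(lin_pred)   # the linear model can predict a negative count
min(pois_pred)  # the Poisson model's predictions stay positive
```

The Poisson predictions stay positive because the model works on the log scale and exponentiates back.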

Poisson/Quasipoisson Regression

glm(formula, data, family)
  • family: either poisson or quasipoisson
  • inputs additive and linear in log(count)
  • outcome: integer
    • counts: e.g. number of traffic tickets a driver gets
    • rates: e.g. number of website hits/day
  • prediction: expected rate or intensity (not necessarily an integer)
    • expected # traffic tickets; expected hits/day
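
"Additive and linear in log(count)" means each input has a multiplicative effect on the expected count. The following is a sketch on simulated data; the data, `sim_model`, and the coefficient values are all illustrative assumptions.

```r
# Simulated data with known log-scale coefficients (illustrative assumption)
set.seed(2)
n <- 500
sim <- data.frame(x1 = rnorm(n), x2 = rbinom(n, 1, 0.5))
sim$y <- rpois(n, lambda = exp(0.5 + 0.3 * sim$x1 + 0.7 * sim$x2))

sim_model <- glm(y ~ x1 + x2, data = sim, family = poisson)

log_coefs    <- coef(sim_model)  # additive effects on log(expected count)
mult_effects <- exp(log_coefs)   # multiplicative effects on expected count

# Expected counts for two rows that differ only in x2
p0 <- predict(sim_model, newdata = data.frame(x1 = 1, x2 = 0), type = "response")
p1 <- predict(sim_model, newdata = data.frame(x1 = 1, x2 = 1), type = "response")

ratio <- unname(p1 / p0)
ratio               # ratio of the two expected counts
mult_effects["x2"]  # matches exp(coefficient of x2)
```

The predicted values `p0` and `p1` are expected counts, so they are generally not whole numbers.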

Poisson vs. Quasipoisson

  • Poisson assumes that mean(y) = var(y)
  • If var(y) is much different from mean(y), use quasipoisson
  • Both generally require a large sample size
  • If rates/counts >> 0, regular (linear) regression is fine
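
One way to see what the choice of family changes, sketched on simulated overdispersed counts (the negative binomial data and all names are assumptions for illustration). Both families give the same coefficients; quasipoisson estimates a dispersion parameter and scales the standard errors by its square root.

```r
# Overdispersed counts: variance = mu + mu^2 / size, which exceeds the mean
# (illustrative assumption, not the course data)
set.seed(3)
n  <- 1000
x  <- runif(n)
mu <- exp(1 + 1.5 * x)
y  <- rnbinom(n, mu = mu, size = 1)
d  <- data.frame(x, y)

pois_fit  <- glm(y ~ x, data = d, family = poisson)
quasi_fit <- glm(y ~ x, data = d, family = quasipoisson)

# The coefficient estimates are identical
all.equal(coef(pois_fit), coef(quasi_fit))

# The estimated dispersion is well above 1 for overdispersed data
dispersion <- summary(quasi_fit)$dispersion

# Quasipoisson standard errors are the Poisson ones times sqrt(dispersion)
se_pois  <- summary(pois_fit)$coefficients[, "Std. Error"]
se_quasi <- summary(quasi_fit)$coefficients[, "Std. Error"]
se_quasi / se_pois
sqrt(dispersion)
```

Since the coefficients match, the point predictions from the two fits match as well; the difference shows up in the uncertainty estimates.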

Example: Predicting Bike Rentals

Fit the model

library(dplyr)  # provides %>% and summarize()
bikesJan %>%
  summarize(mean = mean(cnt), var = var(cnt))
      mean      var
1 130.5587 14351.25

Since var(cnt) >> mean(cnt) $\rightarrow$ use quasipoisson

fmla <- cnt ~ hr + holiday + workingday + 
  weathersit + temp + atemp + hum + windspeed

model <- glm(fmla, data = bikesJan, family = quasipoisson)

Check model fit

$$ \text{pseudo } R^2 = 1 - \frac{\text{deviance}}{\text{null.deviance}} $$

library(broom)  # provides glance()
glance(model) %>%
  summarize(pseudoR2 = 1 - deviance/null.deviance)
   pseudoR2
1 0.7654358
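
The same quantity can be read directly off a fitted glm object without extra packages. This sketch uses simulated data; `fit`, `null_fit`, and the data are assumptions for illustration. It also checks that the null deviance is the deviance of an intercept-only model.

```r
# Simulated counts (illustrative assumption, not the bike data)
set.seed(4)
d <- data.frame(x = runif(300, 0, 3))
d$y <- rpois(300, lambda = exp(0.2 + 0.8 * d$x))

fit      <- glm(y ~ x, data = d, family = quasipoisson)
null_fit <- glm(y ~ 1, data = d, family = quasipoisson)  # intercept only

# Pseudo R^2 from the components stored on the model object
pseudo_r2 <- 1 - fit$deviance / fit$null.deviance
pseudo_r2

# The null deviance equals the deviance of the intercept-only fit
all.equal(fit$null.deviance, null_fit$deviance, tolerance = 1e-6)
```

Read this way, pseudo R^2 is the fraction of the null model's deviance that the inputs explain.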

Predicting from the model

# store the predictions as a column so they can be evaluated later
bikesFeb$pred <- predict(model, newdata = bikesFeb, type = "response")
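
The `type = "response"` argument matters: by default, `predict()` on a glm returns values on the link scale, which here is log(count). Below is a small sketch on simulated data (the data and names are illustrative assumptions) comparing the two.

```r
# Simulated training data (illustrative assumption)
set.seed(5)
train <- data.frame(x = runif(200, 0, 2))
train$y <- rpois(200, lambda = exp(1 + train$x))

fit <- glm(y ~ x, data = train, family = quasipoisson)
new_data <- data.frame(x = c(0.5, 1.0, 1.5))

log_scale   <- predict(fit, newdata = new_data)                     # default: type = "link"
count_scale <- predict(fit, newdata = new_data, type = "response")  # expected counts

# Exponentiating the link-scale predictions recovers the expected counts
all.equal(exp(log_scale), count_scale)
```

Forgetting `type = "response"` gives predictions that look far too small, because they are logarithms of the expected counts.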

Evaluate the model

You can evaluate count models by RMSE

bikesFeb %>%
  mutate(residual = cnt - pred) %>%
  summarize(rmse = sqrt(mean(residual^2))) 
      rmse
1 69.32869
sd(bikesFeb$cnt)
[1] 134.2865
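
Comparing RMSE with the standard deviation of the outcome shows whether the model beats simply predicting the mean. A self-contained sketch of the same check follows, using simulated train/test data and a hypothetical `rmse()` helper (both are assumptions, not from the course).

```r
# Hypothetical helper: root mean squared error
rmse <- function(actual, predicted) sqrt(mean((actual - predicted)^2))

# Simulated train and test sets from the same process (illustrative assumption)
set.seed(6)
make_data <- function(n) {
  x <- runif(n, 0, 3)
  data.frame(x = x, y = rpois(n, lambda = exp(0.5 + x)))
}
train <- make_data(400)
test  <- make_data(200)

fit <- glm(y ~ x, data = train, family = quasipoisson)
test$pred <- predict(fit, newdata = test, type = "response")

model_rmse  <- rmse(test$y, test$pred)
baseline_sd <- sd(test$y)  # roughly the RMSE of always predicting the mean

model_rmse < baseline_sd   # TRUE when the model improves on the baseline
```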

Compare Predictions and Actual Outcomes

Let's practice!
