Tidying models with broom

Case Study: Exploratory Data Analysis in R

Dave Robinson

Chief Data Scientist, DataCamp

A model fit is a “messy” object

summary(model)
Call:
lm(formula = percent_yes ~ year, data = afghanistan)
Residuals:
      Min        1Q    Median        3Q       Max 
-0.254667 -0.038650 -0.001945  0.057110  0.140596 
Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept) -1.106e+01  1.471e+00  -7.523 1.44e-08 ***
year         6.009e-03  7.426e-04   8.092 3.06e-09 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.08497 on 32 degrees of freedom
Multiple R-squared:  0.6717,  Adjusted R-squared:  0.6615 
F-statistic: 65.48 on 1 and 32 DF,  p-value: 3.065e-09
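
To see why this printout counts as "messy", the sketch below pulls the coefficient table out by hand using base R only. The `votes` data frame is a simulated, hypothetical stand-in for `afghanistan` (the real data is not reproduced here), so the numbers will not match the slide.

```r
# Hypothetical stand-in for the afghanistan data: simulated values, not real votes
set.seed(1)
votes <- data.frame(year = seq(1947, 2013, by = 2))
votes$percent_yes <- 0.4 + 0.006 * (votes$year - 1947) +
  rnorm(nrow(votes), sd = 0.05)

model <- lm(percent_yes ~ year, data = votes)

# The printed summary is text; the numbers sit inside the summary object
coefs <- coef(summary(model))   # numeric matrix, with terms stored as row names

rownames(coefs)   # the terms
colnames(coefs)   # the statistics

# Getting a single value means indexing by row name and column name
slope <- coefs["year", "Estimate"]
slope
```

The terms live in row names and the column names contain spaces and symbols, which makes this matrix awkward to filter, join, or plot alongside other results.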

Models are difficult to combine

model1 <- lm(percent_yes ~ year, data = afghanistan)
model2 <- lm(percent_yes ~ year, data = united_states)
model3 <- lm(percent_yes ~ year, data = canada)

broom turns a model into a data frame

library(broom)
tidy(model)
         term      estimate    std.error statistic      p.value
1 (Intercept) -11.063084650 1.4705189228 -7.523252 1.444892e-08
2        year   0.006009299 0.0007426499  8.091698 3.064797e-09
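
A minimal, self-contained sketch of the same call follows. It assumes the broom package is installed and again uses a simulated, hypothetical `votes` data frame in place of `afghanistan`. Recent versions of broom return a tibble, so the printed layout may differ slightly from the slide.

```r
library(broom)

# Hypothetical stand-in for the afghanistan data: simulated values, not real votes
set.seed(1)
votes <- data.frame(year = seq(1947, 2013, by = 2))
votes$percent_yes <- 0.4 + 0.006 * (votes$year - 1947) +
  rnorm(nrow(votes), sd = 0.05)

model <- lm(percent_yes ~ year, data = votes)

# One row per term, one column per statistic
tidied <- tidy(model)
names(tidied)

# Because the result is a data frame, ordinary subsetting works
tidied[tidied$term == "year", ]
```

Here the term is an ordinary column rather than a row name, so the slope can be selected with the same tools used on any other data frame.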

Tidy models can be combined

model1 <- lm(percent_yes ~ year, data = afghanistan)
model2 <- lm(percent_yes ~ year, data = united_states)

tidy(model1)
         term      estimate    std.error statistic      p.value
1 (Intercept) -11.063084650 1.4705189228 -7.523252 1.444892e-08
2        year   0.006009299 0.0007426499  8.091698 3.064797e-09
tidy(model2)
         term     estimate    std.error statistic      p.value
1 (Intercept) 12.664145512 1.8379742715  6.890274 8.477089e-08
2        year -0.006239305 0.0009282243 -6.721764 1.366904e-07
bind_rows(tidy(model1), tidy(model2))
         term      estimate    std.error statistic      p.value
1 (Intercept) -11.063084650 1.4705189228 -7.523252 1.444892e-08
2        year   0.006009299 0.0007426499  8.091698 3.064797e-09
3 (Intercept)  12.664145512 1.8379742715  6.890274 8.477089e-08
4        year  -0.006239305 0.0009282243 -6.721764 1.366904e-07
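
In the combined output above, nothing records which model each row came from. One possible way to keep track, sketched below, is to pass a named list to `bind_rows()` with its `.id` argument. This assumes broom and dplyr are installed. The data frames `country_a` and `country_b` are simulated, hypothetical stand-ins for `afghanistan` and `united_states`.

```r
library(broom)
library(dplyr)

# Hypothetical stand-ins for two countries: simulated values, not real votes
set.seed(1)
year <- seq(1947, 2013, by = 2)
country_a <- data.frame(
  year = year,
  percent_yes = 0.4 + 0.006 * (year - 1947) + rnorm(length(year), sd = 0.05)
)
country_b <- data.frame(
  year = year,
  percent_yes = 0.7 - 0.006 * (year - 1947) + rnorm(length(year), sd = 0.05)
)

model1 <- lm(percent_yes ~ year, data = country_a)
model2 <- lm(percent_yes ~ year, data = country_b)

# The list names become the values of the new "country" column
combined <- bind_rows(
  list(country_a = tidy(model1), country_b = tidy(model2)),
  .id = "country"
)
combined

# Keep only the slope terms to compare trends across models
slopes <- filter(combined, term == "year")
slopes
```

With the label column in place, the slopes of the two models can be compared side by side in a single data frame.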

Let's practice!
