Machine Learning in the Tidyverse
Dmitriy (Dima) Gorenshteyn
Lead Data Scientist, Memorial Sloan Kettering Cancer Center
tidy(): returns the model's statistical findings (such as coefficients), one row per term
glance(): returns a concise one-row summary of the model's overall fit
augment(): adds per-observation columns (such as fitted values and residuals) to the data being modeled
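The three functions can be tried on a small model before reading the output that follows. This is only a sketch: `algeria_model` is not defined in this section, so the example assumes it is a linear model of `life_expectancy` on `year` fit with `lm()`. The names `algeria_head` and `toy_model` are hypothetical, and the data are just the six rows printed in the `augment()` output, so the estimates will differ from those of the full model.

```r
library(broom)

# Toy data: the six rows shown in the augment() output in this section.
algeria_head <- data.frame(
  year = 1960:1965,
  life_expectancy = c(47.50, 48.02, 48.55, 49.07, 49.58, 50.09)
)

# Assumed model form: life expectancy as a linear function of year.
toy_model <- lm(life_expectancy ~ year, data = algeria_head)

coef_df <- tidy(toy_model)     # one row per term: (Intercept) and year
fit_df  <- glance(toy_model)   # a single row of fit statistics
obs_df  <- augment(toy_model)  # one row per observation, with .fitted and .resid
```

Each result is a data frame, which is why it can be piped directly into other tidyverse functions.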
library(broom)
tidy(algeria_model)
         term      estimate   std.error statistic      p.value
1 (Intercept) -1196.5647772 39.93891866 -29.95987 1.319126e-33
2        year     0.6348625  0.02011472  31.56209 1.108517e-34
glance(algeria_model)
  r.squared adj.r.squared    sigma statistic      p.value df
  0.9522064     0.9512505 2.176948  996.1653 1.108517e-34  2
     logLik      AIC     BIC deviance df.residual
  -113.2171 232.4342 238.288 236.9552          50
augment(algeria_model)
  life_expectancy year  .fitted   .se.fit     .resid       .hat   .sigma
1           47.50 1960 47.76581 0.5951714 -0.2658128 0.07474601 2.198695
2           48.02 1961 48.40068 0.5779264 -0.3806753 0.07047725 2.198326
3           48.55 1962 49.03554 0.5608726 -0.4855379 0.06637924 2.197878
4           49.07 1963 49.67040 0.5440279 -0.6004004 0.06245198 2.197265
5           49.58 1964 50.30526 0.5274124 -0.7252630 0.05869547 2.196455
6           50.09 1965 50.94013 0.5110485 -0.8501255 0.05510971 2.195498
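The `.fitted` and `.resid` columns can be reproduced by hand from the coefficients that `tidy()` reported. The sketch below uses only numbers printed in this section; the variable names are illustrative, and small differences in the last digits are expected because the printed coefficients are rounded.

```r
# Coefficients as printed by tidy(algeria_model) in this section.
intercept <- -1196.5647772
slope <- 0.6348625

# First six observations as printed by augment(algeria_model).
year <- 1960:1965
life_expectancy <- c(47.50, 48.02, 48.55, 49.07, 49.58, 50.09)

# Fitted value: intercept + slope * year.
fitted_manual <- intercept + slope * year

# Residual: observed value minus fitted value.
resid_manual <- life_expectancy - fitted_manual

print(data.frame(year, life_expectancy, fitted_manual, resid_manual))
```

The negative residuals in these rows mean the fitted line sits slightly above the observed life expectancy in the early 1960s.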
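library(dplyr)    # provides the %>% pipe
library(ggplot2)

# Observed values as points, fitted values as a red line
augment(algeria_model) %>%
  ggplot(mapping = aes(x = year)) +
  geom_point(mapping = aes(y = life_expectancy)) +
  geom_line(mapping = aes(y = .fitted), color = "red")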