Introduction to Regression in R
Richie Cotton
Data Evangelist at DataCamp
mdl_mass_vs_length <- lm(mass_g ~ length_cm, data = bream)
mdl_mass_vs_length
Call:
lm(formula = mass_g ~ length_cm, data = bream)
Coefficients:
(Intercept)    length_cm
   -1035.35        54.55
coefficients(mdl_mass_vs_length)
(Intercept) length_cm
-1035.34757 54.54998
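The two coefficients define a straight line, so each prediction is intercept + slope × length. With the full-model values above, a 23.2 cm bream gets -1035.34757 + 54.54998 × 23.2 ≈ 230.21 g, which matches the first fitted value, 230.2120. The sketch below checks the same identity in code. The full `bream` data isn't reproduced in these slides, so it assumes a hypothetical stand-in, `bream_head`, built from the first 10 rows of the `augment()` output. Its coefficients therefore differ from the full 35-fish model.

```r
# Hypothetical stand-in for `bream`: the first 10 rows of the augment() output.
# Coefficients fitted on this subset differ from the full 35-row model.
bream_head <- data.frame(
  mass_g    = c(242, 290, 340, 363, 430, 450, 500, 390, 450, 500),
  length_cm = c(23.2, 24, 23.9, 26.3, 26.5, 26.8, 26.8, 27.6, 27.6, 28.5)
)

mdl_head <- lm(mass_g ~ length_cm, data = bream_head)

# Pull out the intercept and slope by name
coefs     <- coefficients(mdl_head)
intercept <- coefs[["(Intercept)"]]
slope     <- coefs[["length_cm"]]

# Rebuild the predictions by hand: intercept + slope * explanatory variable
manual_fitted <- intercept + slope * bream_head$length_cm

# Compare with R's own fitted values (names dropped so only values are compared)
all.equal(unname(fitted(mdl_head)), manual_fitted)  # TRUE
```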
Fitted values: predictions on the original dataset
fitted(mdl_mass_vs_length)
or equivalently
library(dplyr)  # provides %>% and select()
explanatory_data <- bream %>%
  select(length_cm)
predict(mdl_mass_vs_length, newdata = explanatory_data)
1 2 3 4 5
230.2120 273.8520 268.3970 399.3169 410.2269
6 7 8 9 10
426.5919 426.5919 470.2319 470.2319 519.3269
11 12 13 14 15
513.8719 530.2369 552.0569 573.8769 568.4219
16 17 18 19 20
568.4219 622.9719 622.9719 650.2468 655.7018
21 22 23 24 25
672.0668 677.5218 682.9768 699.3418 704.7968
26 27 28 29 30
699.3418 710.2518 748.4368 753.8918 792.0768
31 32 33 34 35
873.9018 873.9018 939.3617 1004.8217 1037.5517
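`predict()` is more general than `fitted()`. Given a data frame whose column names match the explanatory variables in the formula, it predicts for any values, not only the ones the model was trained on. A minimal sketch, again assuming the hypothetical 10-row `bream_head` subset. The new lengths (24, 26 and 28 cm) are illustrative only, and base R's `transform()` stands in for dplyr's `mutate()` so the block needs no packages.

```r
# Hypothetical stand-in for `bream`: the first 10 rows of the augment() output
bream_head <- data.frame(
  mass_g    = c(242, 290, 340, 363, 430, 450, 500, 390, 450, 500),
  length_cm = c(23.2, 24, 23.9, 26.3, 26.5, 26.8, 26.8, 27.6, 27.6, 28.5)
)
mdl_head <- lm(mass_g ~ length_cm, data = bream_head)

# On the training data, predict() reproduces fitted()
same_as_fitted <- all.equal(fitted(mdl_head), predict(mdl_head, newdata = bream_head))
print(same_as_fitted)

# Hypothetical new lengths; the column must be named like the explanatory variable
new_lengths <- data.frame(length_cm = c(24, 26, 28))

# Attach each prediction to the explanatory value it came from
prediction_data <- transform(
  new_lengths,
  mass_g = predict(mdl_head, newdata = new_lengths)
)
print(prediction_data)
```

Keeping the explanatory values and predictions in one data frame makes it easy to plot the predictions on top of the original data.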
Residuals: actual response values minus predicted response values
residuals(mdl_mass_vs_length)
or equivalently
bream$mass_g - fitted(mdl_mass_vs_length)
1 2 3 4 5
11.788 16.148 71.603 -36.317 19.773
6 7 8 9 10
23.408 73.408 -80.232 -20.232 -19.327
11 12 13 14 15
-38.872 -30.237 -52.057 -233.877 31.578
16 17 18 19 20
31.578 77.028 77.028 -40.247 -5.702
21 22 23 24 25
-97.067 7.478 -62.977 -19.342 -4.797
26 27 28 29 30
25.658 9.748 -34.437 96.108 207.923
31 32 33 34 35
46.098 81.098 -14.362 -29.822 -87.552
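For the first fish, 242 g actual minus 230.2120 g predicted gives the residual of 11.788 g. Because the model includes an intercept, least-squares residuals also sum to zero, up to floating-point error. A sketch checking both facts on the hypothetical `bream_head` subset (the first 10 rows of the `augment()` output):

```r
# Hypothetical stand-in for `bream`: the first 10 rows of the augment() output
bream_head <- data.frame(
  mass_g    = c(242, 290, 340, 363, 430, 450, 500, 390, 450, 500),
  length_cm = c(23.2, 24, 23.9, 26.3, 26.5, 26.8, 26.8, 27.6, 27.6, 28.5)
)
mdl_head <- lm(mass_g ~ length_cm, data = bream_head)

# Residuals two ways: the accessor, and actual minus predicted
res_builtin <- residuals(mdl_head)
res_manual  <- bream_head$mass_g - fitted(mdl_head)
print(all.equal(res_builtin, res_manual))

# With an intercept in the model this is zero up to floating-point rounding
print(sum(res_builtin))
```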
summary(mdl_mass_vs_length)
Call:
lm(formula = mass_g ~ length_cm, data = bream)
Residuals:
   Min     1Q Median     3Q    Max
-233.9  -35.4   -4.8   31.6  207.9
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) -1035.35     107.97   -9.59  4.6e-11 ***
length_cm      54.55       3.54   15.42  < 2e-16 ***
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 74.2 on 33 degrees of freedom
Multiple R-squared: 0.878, Adjusted R-squared: 0.874
F-statistic: 238 on 1 and 33 DF, p-value: <2e-16
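Several numbers in this summary can be reproduced by hand. Each t value is the estimate divided by its standard error (54.55 / 3.54 ≈ 15.4). The residual standard error uses 35 - 2 = 33 degrees of freedom because two coefficients are estimated from 35 fish. With a single explanatory variable, the F-statistic is the square of the slope's t value (15.42² ≈ 238). The sketch below recomputes R-squared, the residual standard error, the t values and their p-values from their definitions and compares them with what `summary()` stores, again on the hypothetical `bream_head` subset.

```r
# Hypothetical stand-in for `bream`: the first 10 rows of the augment() output
bream_head <- data.frame(
  mass_g    = c(242, 290, 340, 363, 430, 450, 500, 390, 450, 500),
  length_cm = c(23.2, 24, 23.9, 26.3, 26.5, 26.8, 26.8, 27.6, 27.6, 28.5)
)
mdl_head    <- lm(mass_g ~ length_cm, data = bream_head)
mdl_summary <- summary(mdl_head)

# Residual sum of squares and total sum of squares
rss <- sum(residuals(mdl_head)^2)
tss <- sum((bream_head$mass_g - mean(bream_head$mass_g))^2)

# R-squared: proportion of the response's variance explained by the model
r_squared <- 1 - rss / tss

# Residual standard error: sqrt(RSS / residual degrees of freedom), here n - 2
rse <- sqrt(rss / df.residual(mdl_head))

# t value = estimate / standard error, for each row of the coefficient table
coef_table <- coef(mdl_summary)
t_values   <- coef_table[, "Estimate"] / coef_table[, "Std. Error"]

# Two-sided p-values from the t distribution with the residual degrees of freedom
p_values <- 2 * pt(abs(t_values), df = df.residual(mdl_head), lower.tail = FALSE)

print(c(r_squared = r_squared, stored = mdl_summary$r.squared))
print(c(rse = rse, stored = mdl_summary$sigma))
```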
library(broom)
tidy(mdl_mass_vs_length)
# A tibble: 2 x 5
  term        estimate std.error statistic  p.value
  <chr>          <dbl>     <dbl>     <dbl>    <dbl>
1 (Intercept)  -1035.     108.       -9.59 4.58e-11
2 length_cm       54.5      3.54     15.4  1.22e-16
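`tidy()` returns the coefficient table that `summary()` prints, as a tibble with one row per term. If broom isn't available, a similar data frame can be assembled in base R from `coef(summary(...))`. The sketch below does that on the hypothetical `bream_head` subset, borrowing broom's column names for easy comparison; it is not broom's implementation.

```r
# Hypothetical stand-in for `bream`: the first 10 rows of the augment() output
bream_head <- data.frame(
  mass_g    = c(242, 290, 340, 363, 430, 450, 500, 390, 450, 500),
  length_cm = c(23.2, 24, 23.9, 26.3, 26.5, 26.8, 26.8, 27.6, 27.6, 28.5)
)
mdl_head <- lm(mass_g ~ length_cm, data = bream_head)

# Coefficient matrix with columns Estimate, Std. Error, t value, Pr(>|t|)
coef_matrix <- coef(summary(mdl_head))

# Repackage as a data frame with broom-style column names, one row per term
coef_df <- data.frame(
  term      = rownames(coef_matrix),
  estimate  = coef_matrix[, "Estimate"],
  std.error = coef_matrix[, "Std. Error"],
  statistic = coef_matrix[, "t value"],
  p.value   = coef_matrix[, "Pr(>|t|)"],
  row.names = NULL,
  stringsAsFactors = FALSE
)
print(coef_df)
```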
augment(mdl_mass_vs_length)
# A tibble: 35 × 8
   mass_g length_cm .fitted .resid   .hat .sigma .cooksd .std.resid
    <dbl>     <dbl>   <dbl>  <dbl>  <dbl>  <dbl>   <dbl>      <dbl>
 1    242      23.2    230.   11.8 0.144    75.3 0.00247      0.172
 2    290      24      274.   16.1 0.119    75.2 0.00364      0.232
 3    340      23.9    268.   71.6 0.122    74.1 0.0738       1.03
 4    363      26.3    399.  -36.3 0.0651   75.0 0.00894     -0.507
 5    430      26.5    410.   19.8 0.0616   75.2 0.00248      0.275
 6    450      26.8    427.   23.4 0.0566   75.2 0.00317      0.325
 7    500      26.8    427.   73.4 0.0566   74.1 0.0311       1.02
 8    390      27.6    470.  -80.2 0.0452   73.9 0.0291      -1.11
 9    450      27.6    470.  -20.2 0.0452   75.2 0.00185     -0.279
10    500      28.5    519.  -19.3 0.0360   75.2 0.00132     -0.265
# ... with 25 more rows
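Each column `augment()` adds appears to have a counterpart in base R's stats package: `fitted()`, `residuals()`, `hatvalues()` (leverage), `influence()$sigma` (the residual standard error with that row left out), `cooks.distance()` and `rstandard()` (standardized residuals). A standardized residual is the residual divided by sigma × sqrt(1 - leverage); for row 1, 11.8 / (74.2 × sqrt(1 - 0.144)) ≈ 0.172. The sketch below rebuilds the table on the hypothetical `bream_head` subset and checks two identities: the standardized-residual formula, and that the leverages sum to the number of coefficients (2).

```r
# Hypothetical stand-in for `bream`: the first 10 rows of the augment() output
bream_head <- data.frame(
  mass_g    = c(242, 290, 340, 363, 430, 450, 500, 390, 450, 500),
  length_cm = c(23.2, 24, 23.9, 26.3, 26.5, 26.8, 26.8, 27.6, 27.6, 28.5)
)
mdl_head <- lm(mass_g ~ length_cm, data = bream_head)

# One row per observation, with augment()-style column names
diagnostics <- data.frame(
  mass_g     = bream_head$mass_g,
  length_cm  = bream_head$length_cm,
  .fitted    = fitted(mdl_head),
  .resid     = residuals(mdl_head),
  .hat       = hatvalues(mdl_head),
  .sigma     = influence(mdl_head)$sigma,
  .cooksd    = cooks.distance(mdl_head),
  .std.resid = rstandard(mdl_head)
)
print(diagnostics)

# Standardized residual = residual / (sigma * sqrt(1 - leverage))
manual_std <- residuals(mdl_head) / (sigma(mdl_head) * sqrt(1 - hatvalues(mdl_head)))
print(all.equal(manual_std, rstandard(mdl_head)))

# Leverages sum to the number of estimated coefficients
print(sum(hatvalues(mdl_head)))
```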
glance(mdl_mass_vs_length)
# A tibble: 1 × 12
  r.squared adj.r.squared sigma statistic  p.value    df logLik   AIC   BIC
      <dbl>         <dbl> <dbl>     <dbl>    <dbl> <dbl>  <dbl> <dbl> <dbl>
1     0.878         0.874  74.2      238. 1.22e-16     1  -199.  405.  409.
# ... with 3 more variables: deviance <dbl>, df.residual <int>, nobs <int>
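Most `glance()` columns also have base R accessors. Adjusted R-squared penalizes R-squared for the number of coefficients: 1 - (1 - 0.878) × (35 - 1) / (35 - 2) ≈ 0.874, matching the output above. A sketch collecting the same statistics into a named vector for the hypothetical `bream_head` subset, using `summary()` plus the stats functions `logLik()`, `AIC()`, `BIC()`, `deviance()`, `df.residual()` and `nobs()`:

```r
# Hypothetical stand-in for `bream`: the first 10 rows of the augment() output
bream_head <- data.frame(
  mass_g    = c(242, 290, 340, 363, 430, 450, 500, 390, 450, 500),
  length_cm = c(23.2, 24, 23.9, 26.3, 26.5, 26.8, 26.8, 27.6, 27.6, 28.5)
)
mdl_head    <- lm(mass_g ~ length_cm, data = bream_head)
mdl_summary <- summary(mdl_head)
f_stat      <- mdl_summary$fstatistic  # named elements: value, numdf, dendf

# Model-level statistics, named like the glance() columns
model_stats <- c(
  r.squared     = mdl_summary$r.squared,
  adj.r.squared = mdl_summary$adj.r.squared,
  sigma         = mdl_summary$sigma,
  statistic     = unname(f_stat["value"]),
  p.value       = unname(pf(f_stat["value"], f_stat["numdf"], f_stat["dendf"],
                            lower.tail = FALSE)),
  logLik        = as.numeric(logLik(mdl_head)),
  AIC           = AIC(mdl_head),
  BIC           = BIC(mdl_head),
  deviance      = deviance(mdl_head),
  df.residual   = df.residual(mdl_head),
  nobs          = nobs(mdl_head)
)
print(model_stats)

# Adjusted R-squared from its definition: n observations, p coefficients
n <- nobs(mdl_head)
p <- length(coef(mdl_head))
adj_manual <- 1 - (1 - mdl_summary$r.squared) * (n - 1) / (n - p)
```

For a linear model, `deviance()` is the residual sum of squares, which is why it sits alongside the other goodness-of-fit measures.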