Introduction to Regression in R
Richie Cotton
Data Evangelist at DataCamp
bream <- fish %>%
  filter(species == "Bream")
species | length_cm | mass_g |
---|---|---|
Bream | 23.2 | 242 |
Bream | 24.0 | 290 |
Bream | 23.9 | 340 |
Bream | 26.3 | 363 |
Bream | 26.5 | 430 |
... | ... | ... |
ggplot(bream, aes(length_cm, mass_g)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE)
mdl_mass_vs_length <- lm(mass_g ~ length_cm, data = bream)
Call:
lm(formula = mass_g ~ length_cm, data = bream)

Coefficients:
(Intercept)    length_cm
   -1035.35        54.55
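Read as an equation, the printed coefficients describe the fitted straight line below. This only restates the model output above using the rounded values shown; the hat marks a predicted mass rather than an observed one.

```latex
\widehat{\text{mass\_g}} = -1035.35 + 54.55 \times \text{length\_cm}
```

On this reading, each additional centimeter of length adds about 54.55 g to the predicted mass, and a 20 cm bream is predicted to weigh about $-1035.35 + 54.55 \times 20 = 55.65$ g.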
If I set the explanatory variables to these values,
what value would the response variable have?
library(dplyr)
explanatory_data <- tibble(length_cm = 20:40)
predict(mdl_mass_vs_length, explanatory_data)
         1          2          3          4          5          6
  55.65205  110.20203  164.75202  219.30200  273.85198  328.40196
         7          8          9         10         11         12
 382.95194  437.50192  492.05190  546.60188  601.15186  655.70184
        13         14         15         16         17         18
 710.25182  764.80181  819.35179  873.90177  928.45175  983.00173
        19         20         21
1037.55171 1092.10169 1146.65167
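These predictions can be reproduced by hand from the printed coefficients, which may help show what `predict()` does for a simple linear model. This is a minimal sketch that assumes the rounded coefficients from the model output; `intercept`, `slope`, and `manual_mass_g` are illustrative names, and the results differ slightly from the `predict()` output in the later decimal places because the model stores unrounded coefficients.

```r
# Rounded coefficients copied from the printed model output (assumption:
# the fitted model itself holds more decimal places than are printed)
intercept <- -1035.35
slope <- 54.55

# The same explanatory values used in explanatory_data
length_cm <- 20:40

# Prediction for a straight-line model: intercept + slope * x
manual_mass_g <- intercept + slope * length_cm

# First three values, comparable to 55.65205, 110.20203, 164.75202
print(head(manual_mass_g, 3))
```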
library(dplyr)
explanatory_data <- tibble(length_cm = 20:40)
prediction_data <- explanatory_data %>%
  mutate(
    mass_g = predict(
      mdl_mass_vs_length, explanatory_data
    )
  )
# A tibble: 21 x 2
   length_cm mass_g
       <int>  <dbl>
 1        20   55.7
 2        21  110.
 3        22  165.
 4        23  219.
 5        24  274.
 6        25  328.
 7        26  383.
 8        27  438.
 9        28  492.
10        29  547.
# ... with 11 more rows
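The workflow above needs the `fish` dataset, so here is a self-contained sketch of the same fit-then-predict pattern in base R. It assumes only the five bream rows shown in the table excerpt, so its coefficients and predictions will not match the full-data model; `bream_sample` and `mdl_sample` are illustrative names.

```r
# Illustrative sample: only the five rows shown in the table excerpt,
# not the full bream data used to fit mdl_mass_vs_length
bream_sample <- data.frame(
  length_cm = c(23.2, 24.0, 23.9, 26.3, 26.5),
  mass_g = c(242, 290, 340, 363, 430)
)

# Fit mass against length, as in the slides
mdl_sample <- lm(mass_g ~ length_cm, data = bream_sample)

# Explanatory values to predict for
prediction_data <- data.frame(length_cm = 24:26)

# Store the predictions next to the explanatory values;
# unname() drops the row-number names that predict() attaches
prediction_data$mass_g <- unname(predict(mdl_sample, prediction_data))

print(prediction_data)
```

Keeping the explanatory values and predictions in one data frame is what makes the later plotting step straightforward.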
ggplot(bream, aes(length_cm, mass_g)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE) +
  geom_point(
    data = prediction_data,
    color = "blue"
  )
Extrapolating means making predictions outside the range of observed data.
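One way to spot extrapolation before predicting is to compare the new explanatory values against the observed range. This is a minimal sketch that assumes only the five bream lengths shown in the table excerpt (the full dataset has more rows, so its real range will differ); `observed_length_cm`, `new_length_cm`, and `is_extrapolated` are illustrative names.

```r
# Illustrative observed lengths: the five rows shown in the table excerpt
observed_length_cm <- c(23.2, 24.0, 23.9, 26.3, 26.5)

# New values we would like predictions for
new_length_cm <- c(10, 25)

# A value is extrapolated if it falls outside [min, max] of the observed data
is_extrapolated <- new_length_cm < min(observed_length_cm) |
  new_length_cm > max(observed_length_cm)

print(data.frame(new_length_cm, is_extrapolated))
```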
explanatory_little_bream <- tibble(length_cm = 10)
explanatory_little_bream %>%
  mutate(
    mass_g = predict(
      mdl_mass_vs_length, explanatory_little_bream
    )
  )
# A tibble: 1 x 2
  length_cm mass_g
      <dbl>  <dbl>
1        10  -490.
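The negative mass follows directly from the fitted line: below a certain length, the straight line drops under zero, which is physically impossible for a fish. A small sketch, assuming the rounded coefficients from the printed model output (`intercept`, `slope`, and `zero_mass_length_cm` are illustrative names):

```r
# Rounded coefficients copied from the printed model output
intercept <- -1035.35
slope <- 54.55

# Predicted mass for a 10 cm bream; prints as -490 to three significant figures
predicted_little_bream <- intercept + slope * 10

# Length at which the fitted line crosses zero mass (just under 19 cm);
# any shorter length gives a negative predicted mass
zero_mass_length_cm <- -intercept / slope

print(predicted_little_bream)
print(zero_mass_length_cm)
```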