Making predictions

Introduction to Regression in R

Richie Cotton

Data Evangelist at DataCamp

The fish dataset: bream

library(dplyr)

# fish: the course's fish dataset (assumed to be loaded already)
bream <- fish %>% 
  filter(species == "Bream")

species length_cm mass_g
Bream 23.2 242
Bream 24.0 290
Bream 23.9 340
Bream 26.3 363
Bream 26.5 430
... ... ...

Plotting mass vs. length

library(ggplot2)

ggplot(bream, aes(length_cm, mass_g)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE)

A scatter plot of bream masses versus their lengths, with a linear trend line. The points all lie close to the trend line.

Running the model

mdl_mass_vs_length <- lm(mass_g ~ length_cm, data = bream)
mdl_mass_vs_length

Call:
lm(formula = mass_g ~ length_cm, data = bream)

Coefficients:
(Intercept)    length_cm  
   -1035.35        54.55 
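
To reuse the fitted numbers in later calculations instead of reading them off the printout, coef() returns them as a named vector. The sketch below is only an illustration: bream_sample and mdl_sample are hypothetical names, and the toy data frame holds just the five rows shown in the data preview, so its coefficients will differ from the full-data values above.

```r
# Toy data: only the five rows shown in the bream preview (not the full dataset).
bream_sample <- data.frame(
  length_cm = c(23.2, 24.0, 23.9, 26.3, 26.5),
  mass_g = c(242, 290, 340, 363, 430)
)

# Same formula as the model above, fitted to the toy data.
mdl_sample <- lm(mass_g ~ length_cm, data = bream_sample)

# coef() returns a named numeric vector: "(Intercept)" and "length_cm".
coeffs <- coef(mdl_sample)
intercept <- coeffs[["(Intercept)"]]
slope <- coeffs[["length_cm"]]
```

Here the slope is the change in predicted mass, in grams, for each additional centimeter of length, and the intercept is the predicted mass at a length of zero.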

Data on explanatory values to predict

If I set the explanatory variables to these values,
what value would the response variable have?

library(dplyr)
explanatory_data <- tibble(length_cm = 20:40)

Call predict()

library(tibble)
explanatory_data <- tibble(length_cm = 20:40)
predict(mdl_mass_vs_length, explanatory_data)
         1          2          3          4          5          6 
  55.65205  110.20203  164.75202  219.30200  273.85198  328.40196 
         7          8          9         10         11         12 
 382.95194  437.50192  492.05190  546.60188  601.15186  655.70184 
        13         14         15         16         17         18 
 710.25182  764.80181  819.35179  873.90177  928.45175  983.00173 
        19         20         21 
1037.55171 1092.10169 1146.65167 
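
For a model with one explanatory variable, each prediction is the intercept plus the slope times the explanatory value. The sketch below checks this by hand, assuming the rounded coefficients printed earlier (-1035.35 and 54.55); predict_mass_g is a hypothetical helper, not part of the course code.

```r
# Coefficients as printed for mdl_mass_vs_length (rounded to two decimals).
intercept <- -1035.35
slope <- 54.55

# Hypothetical helper: evaluate the fitted straight line at given lengths.
predict_mass_g <- function(length_cm) {
  intercept + slope * length_cm
}

# Lengths 20, 30 and 40 correspond to entries 1, 11 and 21 of the output above.
manual_predictions <- predict_mass_g(c(20, 30, 40))
manual_predictions
```

This gives about 55.65, 601.15 and 1146.65, which agree with the corresponding predict() values to two decimal places. The small remaining differences come from using rounded coefficients.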

Predicting inside a data frame

library(dplyr)
explanatory_data <- tibble(length_cm = 20:40)
prediction_data <- explanatory_data %>% 
  mutate(
    mass_g = predict(
      mdl_mass_vs_length, explanatory_data
    )
  )
prediction_data

# A tibble: 21 x 2
   length_cm mass_g
       <int>  <dbl>
 1        20   55.7
 2        21  110. 
 3        22  165. 
 4        23  219. 
 5        24  274. 
 6        25  328. 
 7        26  383. 
 8        27  438. 
 9        28  492. 
10        29  547. 
# ... with 11 more rows

Showing predictions

ggplot(bream, aes(length_cm, mass_g)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE) +
  geom_point(
    data = prediction_data, 
    color = "blue"
  )

The scatter plot of bream masses versus their lengths, with a linear trend line. The plot has been annotated with the points calculated using predict(). These points all follow the trend line exactly.

Extrapolating

Extrapolating means making predictions outside the range of observed data.

explanatory_little_bream <- tibble(length_cm = 10)
explanatory_little_bream %>% 
  mutate(
    mass_g = predict(
      mdl_mass_vs_length, explanatory_little_bream
    )
  )
# A tibble: 1 x 2
  length_cm mass_g
      <dbl>  <dbl>
1        10  -490.
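
A mass of about -490 g is physically impossible, which is why extrapolated predictions need care: the straight line is only supported by data within the observed range. The sketch below reproduces the number by hand from the rounded coefficients and shows one way to flag values that fall outside the observed range. The helper is_extrapolating is hypothetical, and observed_length_cm is a toy sample of the five previewed rows rather than the full bream data.

```r
# Rounded coefficients as printed for mdl_mass_vs_length.
intercept <- -1035.35
slope <- 54.55

# Manual prediction for a 10 cm bream; the result is negative.
little_bream_mass_g <- intercept + slope * 10

# Toy sample: only the five bream lengths shown in the data preview.
observed_length_cm <- c(23.2, 24.0, 23.9, 26.3, 26.5)

# Hypothetical helper: TRUE where a new value lies outside the observed range.
is_extrapolating <- function(new_values, observed_values) {
  observed_range <- range(observed_values)
  new_values < observed_range[1] | new_values > observed_range[2]
}

outside <- is_extrapolating(c(10, 25, 40), observed_length_cm)
```

With the full dataset, the observed range would be wider than in this toy sample, so the same check should be run against the real bream lengths before trusting a prediction.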

A scatter plot of bream masses versus their lengths, showing the extrapolated prediction.

Let's practice!
