Linear regression with tidymodels

Modeling with tidymodels in R

David Svancer

Data Scientist

Model fitting with parsnip

Linear regression model

Predicting hwy (highway miles per gallon) using cty (city miles per gallon) as a predictor

$$hwy = \beta_{0} + \beta_{1} cty$$

Model parameters

  • $\beta_{0}$ is the intercept
  • $\beta_{1}$ is the slope

 

Highway versus city fuel efficiency

Estimated parameters from training data

$$\small hwy = 0.77 + 1.35(cty)$$
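
As a worked check of the estimated equation, a car with a city fuel efficiency of 18 mpg gets a predicted highway fuel efficiency of:

$$\small hwy = 0.77 + 1.35(18) = 0.77 + 24.30 = 25.07$$

Because the printed coefficients are rounded, this is close to, but not exactly, the .pred value of 25.0 that the trained model returns for test rows with cty = 18.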

 

Mpg data with linear regression line


Model formulas

Model formulas in parsnip

  • Used to assign column roles
    • Outcome variable
    • Predictor variables

General form

outcome ~ predictor_1 + predictor_2 + ...

Shorthand notation (all other columns are used as predictors)

outcome ~ .

Predicting hwy using cty as a predictor variable

hwy ~ cty
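
A minimal base R sketch of how a formula assigns column roles. The three-row data frame mpg_sample is hypothetical: its hwy and cty values are illustrative and its displ column is made up so that the `.` shorthand has more than one column to expand to.

```r
# Hypothetical three-row stand-in for the mpg data (values are illustrative)
mpg_sample <- data.frame(
  hwy   = c(29, 31, 27),
  cty   = c(18, 20, 18),
  displ = c(1.8, 2.0, 2.8)
)

f <- hwy ~ cty

# all.vars() lists the formula's variables; the outcome (left side) comes first
outcome    <- all.vars(f)[1]
predictors <- all.vars(f)[-1]
print(outcome)     # "hwy"
print(predictors)  # "cty"

# With the shorthand `.`, terms() expands the dot to every non-outcome column
dot_terms <- attr(terms(hwy ~ ., data = mpg_sample), "term.labels")
print(dot_terms)   # "cty" "displ"
```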

The parsnip package

Unified syntax for model specification in R

  1. Specify the model type

    • Linear regression or other model type
  2. Specify the engine

    • Different engines correspond to different underlying R packages
  3. Specify the mode

    • Either regression or classification
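
The three steps can be sketched as follows, assuming the parsnip package is installed (parsnip re-exports the %>% pipe). parsnip's translate() prints the model fit template, which shows the underlying package function the chosen engine will call; for the "lm" engine that is stats::lm().

```r
library(parsnip)

# 1. model type, 2. engine, 3. mode
lm_model <- linear_reg() %>%
  set_engine("lm") %>%
  set_mode("regression")

# Show the call parsnip will make to the engine's package
translate(lm_model)
```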

Parsnip package


Fitting a linear regression model

Define a model specification with parsnip

  • linear_reg()

Pass lm_model to the fit() function

  • Specify the model formula
  • Specify the data to use for model fitting

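library(tidymodels)

# Model specification: model type, engine, and mode
lm_model <- linear_reg() %>%
  set_engine('lm') %>%
  set_mode('regression')

# Train the model; mpg_training is the training portion of mpg, created earlier
lm_fit <- lm_model %>%
  fit(hwy ~ cty, data = mpg_training)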
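
This section does not show where mpg_training comes from. Below is a self-contained sketch, assuming the tidymodels packages are installed and using the mpg data from ggplot2. The rsample split (seed 42, 75/25 proportion) is a hypothetical stand-in, so its row counts and estimates will not match the slides exactly.

```r
library(tidymodels)  # attaches parsnip, rsample, dplyr, broom, and others

# Hypothetical data split: the seed and proportion are assumptions
set.seed(42)
mpg_split    <- initial_split(ggplot2::mpg, prop = 0.75)
mpg_training <- training(mpg_split)
mpg_test     <- testing(mpg_split)

# Model specification: type, engine, mode
lm_model <- linear_reg() %>%
  set_engine("lm") %>%
  set_mode("regression")

# fit() takes the model formula and the training data
lm_fit <- lm_model %>%
  fit(hwy ~ cty, data = mpg_training)

# The parsnip fit stores the engine's own model object in $fit
class(lm_fit$fit)  # "lm"
```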

Obtaining the estimated parameters

The tidy() function

  • Takes a trained parsnip model object
  • Creates a model summary tibble
  • The term and estimate columns provide the estimated parameters

tidy(lm_fit)
# A tibble: 2 x 5
  term        estimate std.error statistic  p.value
  <chr>          <dbl>     <dbl>     <dbl>    <dbl>
1 (Intercept)    0.769    0.528       1.46 1.47e- 1
2 cty            1.35     0.0305     44.2  6.32e-97
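
Because the "lm" engine wraps stats::lm(), the estimates from tidy() should match the coefficients of a direct lm() call on the same data. A sketch of that check, assuming tidymodels is installed; the data split (seed 42, 75/25) is hypothetical, so the numbers will differ from the 0.769 and 1.35 above.

```r
library(tidymodels)

# Hypothetical data split: the seed and proportion are assumptions
set.seed(42)
mpg_split    <- initial_split(ggplot2::mpg, prop = 0.75)
mpg_training <- training(mpg_split)

lm_fit <- linear_reg() %>%
  set_engine("lm") %>%
  set_mode("regression") %>%
  fit(hwy ~ cty, data = mpg_training)

# Keep only the columns holding the estimated parameters
estimates <- tidy(lm_fit) %>% select(term, estimate)
estimates

# Fit the same model directly with stats::lm() and compare coefficients
direct <- coef(lm(hwy ~ cty, data = mpg_training))
all.equal(unname(direct), estimates$estimate)
```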

Making predictions

Pass a trained parsnip model to the predict() function

  • new_data specifies the dataset on which to make predictions

Standardized output from predict()

  1. Returns a tibble
  2. Keeps rows in the same order as the new_data input
  3. Names the prediction column .pred

hwy_predictions <- lm_fit %>% 
  predict(new_data = mpg_test)

hwy_predictions
# A tibble: 57 x 1
   .pred
   <dbl>
 1  25.0
 2  27.7
 3  25.0
 4  25.0
 5  22.3
# ... with 47 more rows
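
For a linear regression, each .pred value is the intercept plus the slope times cty. The sketch below recomputes the predictions by hand from tidy() and checks the standardized output, assuming tidymodels is installed; the data split (seed 42, 75/25) is hypothetical.

```r
library(tidymodels)

# Hypothetical data split: the seed and proportion are assumptions
set.seed(42)
mpg_split    <- initial_split(ggplot2::mpg, prop = 0.75)
mpg_training <- training(mpg_split)
mpg_test     <- testing(mpg_split)

lm_fit <- linear_reg() %>%
  set_engine("lm") %>%
  set_mode("regression") %>%
  fit(hwy ~ cty, data = mpg_training)

hwy_predictions <- predict(lm_fit, new_data = mpg_test)

# Recompute predictions from the estimated intercept (b[1]) and slope (b[2])
b <- tidy(lm_fit)$estimate
manual_pred <- b[1] + b[2] * mpg_test$cty

# One row per row of new_data, in the same order, in a column named .pred
nrow(hwy_predictions) == nrow(mpg_test)
all.equal(hwy_predictions$.pred, manual_pred)
```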

Adding predictions to the test data

The bind_cols() function

  • Combines two or more tibbles along the column axis
  • Useful for creating a model results tibble

Steps

  • Select hwy and cty from mpg_test
  • Pass the result to bind_cols() to add the predictions column

mpg_test_results <- mpg_test %>%
  select(hwy, cty) %>%
  bind_cols(hwy_predictions)

mpg_test_results
# A tibble: 57 x 3
     hwy   cty .pred
   <int> <int> <dbl>
 1    29    18  25.0
 2    31    20  27.7
 3    27    18  25.0
 4    26    18  25.0
 5    25    16  22.3
# ... with 47 more rows
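
As an alternative to bind_cols(), recent versions of parsnip provide augment() for fitted models, which appends prediction columns to new_data in one step. A sketch comparing the two approaches, assuming tidymodels is installed; the data split (seed 42, 75/25) is hypothetical.

```r
library(tidymodels)

# Hypothetical data split: the seed and proportion are assumptions
set.seed(42)
mpg_split    <- initial_split(ggplot2::mpg, prop = 0.75)
mpg_training <- training(mpg_split)
mpg_test     <- testing(mpg_split)

lm_fit <- linear_reg() %>%
  set_engine("lm") %>%
  set_mode("regression") %>%
  fit(hwy ~ cty, data = mpg_training)

# Approach from the slide: select columns, then bind the predictions
via_bind_cols <- mpg_test %>%
  select(hwy, cty) %>%
  bind_cols(predict(lm_fit, new_data = mpg_test))

# augment() appends prediction columns to new_data; keep the same three columns
via_augment <- augment(lm_fit, new_data = mpg_test) %>%
  select(hwy, cty, .pred)

all.equal(via_bind_cols$.pred, via_augment$.pred)
```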

Let's model!
