The tidymodels ecosystem

Modeling with tidymodels in R

David Svancer

Data Scientist

Collection of machine learning packages

The tidymodels package

Data resampling with rsample

Feature engineering with recipes

Model fitting with parsnip

Model tuning with tune and dials

Model evaluation with yardstick
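
The packages named above can be checked after loading tidymodels. This is a minimal sketch, assuming tidymodels is installed; the vector name core_pkgs is only illustrative.

```r
library(tidymodels)

# Packages named on the slides above (illustrative vector name)
core_pkgs <- c("rsample", "recipes", "parsnip", "tune", "dials", "yardstick")

# search() lists attached packages as "package:<name>"
attached <- core_pkgs %in% sub("^package:", "", search())
names(attached) <- core_pkgs
attached
```

Each element should be TRUE when library(tidymodels) has attached the corresponding package.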

Supervised machine learning

Branch of machine learning that uses labeled data for model fitting

Regression

  • Predicting quantitative outcomes
    • Selling price of a home

Classification

  • Predicting categorical outcomes
    • Whether an employee will leave a company

left_company miles_from_home salary
no                         1  84500
yes                       10  64820
no                         5  76490
yes                       19  68540

tidymodels variable roles

  • left_company is an outcome variable
  • miles_from_home and salary are predictor variables
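
One way to see these roles in code is through a recipes formula, where the outcome sits on the left of ~ and the predictors on the right. The sketch below rebuilds the four example rows in a tibble named employees (a hypothetical name, not from the course) and asks recipes to report the role of each variable.

```r
library(tidymodels)

# Hypothetical tibble holding the four example rows from the table above
employees <- tibble(
  left_company    = factor(c("no", "yes", "no", "yes")),
  miles_from_home = c(1, 10, 5, 19),
  salary          = c(84500, 64820, 76490, 68540)
)

# Outcome on the left of ~, predictors on the right;
# summary() on a recipe returns one row per variable with its role
employee_roles <- recipe(left_company ~ miles_from_home + salary,
                         data = employees) %>%
  summary()

employee_roles
```

The role column of the result should read outcome for left_company and predictor for the other two variables.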

Data resampling

Create training and test sets

  • Guards against overfitting by holding out data the model never sees during fitting
  • Common ratio is 75% training, 25% test

Training data

  • Feature engineering
  • Model fitting and tuning

Test data

  • Estimate model performance on new data
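
The idea behind a 75% / 25% split can be sketched in base R before turning to any package. This is only an illustration under assumptions: the built-in mtcars data stands in for a real dataset, and the object names and seed are arbitrary.

```r
# Fix the random seed so the split is reproducible (arbitrary value)
set.seed(123)

n <- nrow(mtcars)                               # 32 rows
train_idx <- sample(n, size = floor(0.75 * n))  # row indices for training

train_data <- mtcars[train_idx, ]   # used for feature engineering and fitting
test_data  <- mtcars[-train_idx, ]  # held out to estimate performance

nrow(train_data)  # 24
nrow(test_data)   # 8
```

Every row lands in exactly one of the two sets, so the test rows remain unseen until the final performance estimate.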

Creating training and test datasets

Fuel efficiency data

Vehicle fuel efficiency data from the U.S. Environmental Protection Agency

  • Outcome variable is hwy - highway fuel efficiency in miles per gallon (mpg)
mpg
# A tibble: 234 x 11
     hwy   cty displ   cyl manufacturer model       year trans      drv   fl    class  
   <int> <int> <dbl> <int> <chr>        <chr>      <int> <chr>      <chr> <chr> <chr>  
 1    29    18   1.8     4 audi         a4          1999 auto(l5)   f     p     compact
 2    29    21   1.8     4 audi         a4          1999 manual(m5) f     p     compact
 3    31    20   2       4 audi         a4          2008 manual(m6) f     p     compact
 4    30    21   2       4 audi         a4          2008 auto(av)   f     p     compact
 5    26    16   2.8     6 audi         a4          1999 auto(l5)   f     p     compact
# ... with 224 more rows

Data resampling with tidymodels

  • initial_split()

    • Specifies instructions for creating training and test datasets
    • prop specifies the proportion to place into training
    • strata provides stratification by the outcome variable
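  • Pass split object to training() function
  • Pass split object to testing() function

library(tidymodels)

# Define a 75% / 25% split of the mpg data, stratified by hwy
mpg_split <- initial_split(mpg,
                           prop = 0.75,
                           strata = hwy)

# Extract the training and test datasets from the split object
mpg_training <- mpg_split %>%
  training()
mpg_test <- mpg_split %>%
  testing()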
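
A quick sanity check on a split like the one above is to compare row counts and the distribution of the stratification variable in both sets. This sketch assumes the mpg data from ggplot2 (attached by tidymodels), which has the same columns as the table shown earlier in a different order; the seed value is arbitrary.

```r
library(tidymodels)

# Arbitrary seed so the random split is reproducible
set.seed(42)

mpg_split <- initial_split(mpg, prop = 0.75, strata = hwy)
mpg_training <- training(mpg_split)
mpg_test <- testing(mpg_split)

# Share of rows in training; should be close to 0.75
nrow(mpg_training) / nrow(mpg)

# With stratification, hwy should be distributed similarly in both sets
summary(mpg_training$hwy)
summary(mpg_test$hwy)
```

Stratifying by hwy means the split is made within bins of the outcome, so neither set ends up with only high or only low fuel efficiency values.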

Home sales data

Home sales from the Seattle, Washington area between 2015 and 2016

home_sales
# A tibble: 1,492 x 8
   selling_price home_age bedrooms bathrooms sqft_living sqft_lot sqft_basement floors
           <dbl>    <dbl>    <dbl>     <dbl>       <dbl>    <dbl>         <dbl>  <dbl>
 1        487000       10        4      2.5         2540     5001             0      2
 2        465000       10        3      2.25        1530     1245           480      2
 3        411000       18        2      2           1130     1148           330      2
 4        635000        4        3      2.5         3350     4007           800      2
 5        380000       24        5      2.5         2130     8428             0      2
# ... with 1,482 more rows

Let's practice!
