Automating the modeling workflow

Modeling with tidymodels in R

David Svancer

Data Scientist

Streamlining the workflow

The last_fit() function

  • Also accepts classification models
  • Speeds up the modeling process
  • Fits the model to the training data and produces predictions on the test dataset

 

Similar to using fit(), the first steps include:

  • Creating a data split object with rsample
  • Specifying a model with parsnip
library(tidymodels)

leads_split <- initial_split(leads_df,
                             strata = purchased)

logistic_model <- logistic_reg() %>%
  set_engine('glm') %>%
  set_mode('classification')
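
To see what the stratified split does, here is a minimal sketch on simulated data. The toy_leads data frame and its values are hypothetical stand-ins for leads_df, which is not shown in these slides. The prop argument is written out explicitly, although 3/4 is also the default in initial_split().

```r
library(rsample)

set.seed(123)

# Hypothetical stand-in for leads_df: 150 purchasers, 250 non-purchasers
toy_leads <- data.frame(
  total_visits = rpois(400, 5),
  total_time   = runif(400, 0, 1500),
  purchased    = factor(rep(c("yes", "no"), times = c(150, 250)),
                        levels = c("yes", "no"))
)

# Stratifying on the outcome samples within each outcome category
toy_split <- initial_split(toy_leads, prop = 0.75, strata = purchased)
toy_train <- training(toy_split)
toy_test  <- testing(toy_split)

# The share of "yes" should be close in both sets
prop.table(table(toy_train$purchased))
prop.table(table(toy_test$purchased))
```

Without strata, the two proportions could drift further apart by chance, which matters most when one outcome category is rare.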

Fitting the model and collecting metrics

The last_fit() function takes three inputs:

  • parsnip model object
  • Model formula
  • Data split object

The collect_metrics() function calculates metrics using the test dataset

  • Accuracy and ROC AUC by default
logistic_last_fit <- logistic_model %>% 
  last_fit(purchased ~ total_visits + total_time,
           split = leads_split)

logistic_last_fit %>% collect_metrics()
# A tibble: 2 x 3
  .metric  .estimator .estimate
  <chr>    <chr>          <dbl>
1 accuracy binary         0.759
2 roc_auc  binary         0.763
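
As a rough check of what last_fit() automates, the sketch below compares it with the manual route of fit(), predict(), and accuracy(). The toy_leads data and its coefficients are simulated purely for illustration and are not the course's leads_df.

```r
library(tidymodels)

set.seed(42)
n_rows <- 500

# Hypothetical data: purchase probability rises with visits and time
toy_leads <- tibble(
  total_visits = rpois(n_rows, 5),
  total_time   = runif(n_rows, 0, 1500)
) %>%
  mutate(
    prob      = plogis(-3 + 0.2 * total_visits + 0.002 * total_time),
    purchased = factor(ifelse(runif(n_rows) < prob, "yes", "no"),
                       levels = c("yes", "no"))
  ) %>%
  select(-prob)

toy_split <- initial_split(toy_leads, strata = purchased)

logistic_model <- logistic_reg() %>%
  set_engine("glm") %>%
  set_mode("classification")

# One-step route: train on the training set, evaluate on the test set
auto_metrics <- logistic_model %>%
  last_fit(purchased ~ total_visits + total_time, split = toy_split) %>%
  collect_metrics()
auto_acc <- auto_metrics$.estimate[auto_metrics$.metric == "accuracy"]

# Manual route: fit, predict on the test set, then score
manual_fit <- logistic_model %>%
  fit(purchased ~ total_visits + total_time, data = training(toy_split))
manual_preds <- predict(manual_fit, new_data = testing(toy_split)) %>%
  bind_cols(testing(toy_split))
manual_acc <- accuracy(manual_preds, truth = purchased, estimate = .pred_class)

# Both routes should report the same test-set accuracy
all.equal(auto_acc, manual_acc$.estimate)
```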

Collecting predictions

collect_predictions()

  • Creates a tibble with all the columns needed by yardstick functions
  • Actual and predicted outcomes for the test data
  • Estimated probability columns for all outcome categories
last_fit_results <- logistic_last_fit %>% 
  collect_predictions()
last_fit_results
# A tibble: 332 x 6
   id               .pred_yes .pred_no  .row .pred_class purchased
   <chr>                <dbl>    <dbl> <int> <fct>       <fct>
 1 train/test split    0.134     0.866     2 no          no
 2 train/test split    0.729     0.271    17 yes         yes
 3 train/test split    0.133     0.867    21 no          no
 4 train/test split    0.0916    0.908    22 no          no
 5 train/test split    0.598     0.402    24 yes         yes
# ... with 327 more rows
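
The columns in this output are related to one another. The base R sketch below uses only the five rows printed above and assumes the usual two-class behaviour, where the two probabilities sum to one and .pred_class is the category with the higher estimated probability.

```r
# .pred_yes values copied from the five rows shown above
pred_yes <- c(0.134, 0.729, 0.133, 0.0916, 0.598)

# With two outcome categories, the probabilities sum to one
pred_no <- 1 - pred_yes

# The predicted class is the more probable category
pred_class <- factor(ifelse(pred_yes > 0.5, "yes", "no"),
                     levels = c("yes", "no"))

data.frame(pred_yes, pred_no = round(pred_no, 3), pred_class)
```

The reconstructed pred_no and pred_class values agree with the .pred_no and .pred_class columns printed in the tibble.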

Custom metric sets

The metric_set() function

  • accuracy(), sens(), and spec()
    • Require truth and estimate arguments
  • roc_auc()
    • Requires truth and a column of estimated probabilities

The custom_metrics() function combines both kinds of metric, so it needs all three: truth, estimate, and the probability column .pred_yes as the last argument

custom_metrics <- metric_set(accuracy, sens,
                             spec, roc_auc)
custom_metrics(last_fit_results,
               truth = purchased,
               estimate = .pred_class,
               .pred_yes)
# A tibble: 4 x 3
  .metric  .estimator .estimate
  <chr>    <chr>          <dbl>
1 accuracy binary         0.759
2 sens     binary         0.617
3 spec     binary         0.840
4 roc_auc  binary         0.763
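
To make the class-based metrics concrete, here is a small base R sketch that computes accuracy, sensitivity, and specificity by hand from a confusion matrix. The ten truth and estimate values are made up for illustration. "yes" is treated as the positive class because it is the first factor level, which is also the yardstick default.

```r
# Hypothetical outcomes: 4 actual "yes", 6 actual "no"
truth <- factor(c("yes", "yes", "yes", "yes", "no", "no", "no", "no", "no", "no"),
                levels = c("yes", "no"))
estimate <- factor(c("yes", "yes", "yes", "no", "no", "no", "no", "no", "no", "yes"),
                   levels = c("yes", "no"))

# Rows are predictions, columns are actual outcomes
conf_mat <- table(estimate = estimate, truth = truth)

tp <- conf_mat["yes", "yes"]  # predicted yes, actually yes
fn <- conf_mat["no", "yes"]   # predicted no, actually yes
tn <- conf_mat["no", "no"]    # predicted no, actually no
fp <- conf_mat["yes", "no"]   # predicted yes, actually no

acc_manual  <- (tp + tn) / sum(conf_mat)  # share of correct predictions
sens_manual <- tp / (tp + fn)             # share of actual "yes" found
spec_manual <- tn / (tn + fp)             # share of actual "no" found

c(accuracy = acc_manual, sens = sens_manual, spec = spec_manual)
```

ROC AUC is not computed here because it depends on the estimated probabilities across all thresholds, not on a single set of predicted classes.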

Let's practice!
