Evaluating model performance

Modeling with tidymodels in R

David Svancer

Data Scientist

Input to yardstick functions

All yardstick functions require a tibble with model results

  • Column with the true outcome variable values
    • hwy for mpg data
  • Column with model predictions
    • .pred
mpg_test_results
# A tibble: 57 x 3
     hwy   cty .pred
   <int> <int> <dbl>
 1    29    18  25.0
 2    31    20  27.7
 3    27    18  25.0
 4    26    18  25.0
 5    25    16  22.3
# ... with 47 more rows

Root mean squared error (RMSE)

RMSE estimates the typical size of the prediction error, in the units of the outcome variable

  • Calculated with the rmse() function from yardstick
    • Takes a tibble with model results
    • truth is the column with true outcome values
    • estimate is the column with predicted outcome values
mpg_test_results %>% 
  rmse(truth = hwy, estimate = .pred)
# A tibble: 1 x 3
  .metric .estimator .estimate
  <chr>   <chr>          <dbl>
1 rmse    standard        1.93
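
As a rough illustration of what rmse() computes, the sketch below applies the usual RMSE formula (square root of the mean squared difference between truth and estimate) by hand in base R. It uses only the five test rows printed earlier, so the result will not match the 1.93 reported for the full test set; the variable names errors and rmse_manual are made up for this example.

```r
# First five rows of mpg_test_results as printed on the earlier slide
truth    <- c(29, 31, 27, 26, 25)            # hwy
estimate <- c(25.0, 27.7, 25.0, 25.0, 22.3)  # .pred

# RMSE by hand: square the errors, average them, take the square root
errors <- truth - estimate
rmse_manual <- sqrt(mean(errors^2))
rmse_manual
```

Because the errors are squared before averaging, a few large misses raise RMSE more than many small ones.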

R squared metric

Measures the squared correlation between actual and predicted values

  • Also called the coefficient of determination
  • Ranges from 0 to 1
    • When all predictions equal the true outcome values, R squared is 1
  • Calculated with the rsq() function from yardstick
mpg_test_results %>% 
  rsq(truth = hwy, estimate = .pred)
# A tibble: 1 x 3
  .metric .estimator .estimate
  <chr>   <chr>          <dbl>
1 rsq     standard       0.904
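
Since the metric is described as the squared correlation between actual and predicted values, it can be sketched in base R with cor(). The example below again uses only the five printed test rows, and the names rsq_manual, shifted, and rsq_shifted are hypothetical. It also shows, under that definition, that predictions which are consistently off by a constant still reach an R squared of 1, which is one reason to look at RMSE and a plot as well.

```r
# First five rows of mpg_test_results as printed on the earlier slide
truth    <- c(29, 31, 27, 26, 25)            # hwy
estimate <- c(25.0, 27.7, 25.0, 25.0, 22.3)  # .pred

# R squared as the squared correlation between truth and estimate
rsq_manual <- cor(truth, estimate)^2
rsq_manual

# Hypothetical predictions that are always 5 mpg too high:
# the correlation is still perfect even though every prediction is wrong
shifted <- truth + 5
rsq_shifted <- cor(truth, shifted)^2
rmse_shifted <- sqrt(mean((truth - shifted)^2))
```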

R squared plots

Visualization of the R squared metric

  • Model predictions versus the true outcome
  • The line y = x
    • Represents R squared of 1
  • Used to find potential problems with model performance
    • Non-linear patterns
    • Regions where the model predicts poorly

Mpg model R squared plot


Plotting R squared plots

Making R squared plots with ggplot2

  • Tibble of model results
  • geom_point()
  • geom_abline()
  • coord_obs_pred()
ggplot(mpg_test_results, aes(x = hwy, y = .pred)) +
  geom_point() +
  geom_abline(color = 'blue', linetype = 2) +
  coord_obs_pred() +
  labs(title = 'R-Squared Plot',
       y = 'Predicted Highway MPG',
       x = 'Actual Highway MPG')

Mpg model R squared plot


Streamlining model fitting

The last_fit() function

  • Takes a model specification, model formula, and data split object
  • Performs the following:
    1. Creates training and test datasets
    2. Fits the model to the training data
    3. Calculates metrics and predictions on the test data
    4. Returns an object with all results
lm_last_fit <- lm_model %>% 
  last_fit(hwy ~ cty, 
           split = mpg_split)
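
For comparison, the four steps listed above can be sketched by hand. This is an approximation of the workflow, not the internals of last_fit(). The slides do not show how mpg_split and lm_model were created, so the seed, the prop and strata settings, and the names lm_fit, regression_metrics, and manual_metrics are assumptions made for illustration.

```r
library(tidymodels)  # attaches rsample, parsnip, yardstick, dplyr, ggplot2 (for mpg)

set.seed(123)  # assumed seed, only to make the sketch reproducible
mpg_split <- initial_split(mpg, prop = 0.75, strata = hwy)  # assumed settings
lm_model <- linear_reg() %>%
  set_engine("lm") %>%
  set_mode("regression")

# 1. Create training and test datasets from the split
mpg_training <- training(mpg_split)
mpg_test <- testing(mpg_split)

# 2. Fit the model to the training data
lm_fit <- lm_model %>%
  fit(hwy ~ cty, data = mpg_training)

# 3. Generate predictions and calculate metrics on the test data
mpg_test_results <- mpg_test %>%
  select(hwy, cty) %>%
  bind_cols(predict(lm_fit, new_data = mpg_test))

regression_metrics <- metric_set(rmse, rsq)
manual_metrics <- mpg_test_results %>%
  regression_metrics(truth = hwy, estimate = .pred)

# 4. Inspect the results
manual_metrics
```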

Collecting metrics

The collect_metrics() function

  • Takes the results of last_fit()
    • Returns a tibble with performance metrics obtained on the test dataset
  • Default regression model metrics
    • RMSE
    • R squared
lm_last_fit %>% 
  collect_metrics()
# A tibble: 2 x 3
  .metric .estimator .estimate
  <chr>   <chr>          <dbl>
1 rmse    standard       1.93 
2 rsq     standard       0.904

Collecting predictions

The collect_predictions() function

  • Takes the results of last_fit()
    • Returns a tibble with test dataset predictions
    • Predictions column is named .pred
    • Also includes the outcome variable (hwy) and row identifier columns (id, .row)
lm_last_fit %>% 
  collect_predictions()
# A tibble: 57 x 4
   id               .pred  .row   hwy
   <chr>            <dbl> <int> <int>
 1 train/test split  25.0     1    29
 2 train/test split  27.7     3    31
 3 train/test split  25.0     7    27
 4 train/test split  25.0     8    26
 5 train/test split  22.3     9    25
# ... with 47 more rows
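
One way to see how the two collector functions relate is to recompute a metric from the collected predictions and compare it with the value from collect_metrics(). The sketch below does this for RMSE. The split settings and seed are assumptions, since the slides do not show how mpg_split and lm_model were built, and rmse_from_metrics and rmse_from_predictions are hypothetical names.

```r
library(tidymodels)  # attaches rsample, parsnip, tune, yardstick, dplyr, ggplot2 (for mpg)

set.seed(123)  # assumed seed, only to make the sketch reproducible
mpg_split <- initial_split(mpg, prop = 0.75, strata = hwy)  # assumed settings
lm_model <- linear_reg() %>%
  set_engine("lm") %>%
  set_mode("regression")

lm_last_fit <- lm_model %>%
  last_fit(hwy ~ cty, split = mpg_split)

# RMSE as reported by collect_metrics()
rmse_from_metrics <- lm_last_fit %>%
  collect_metrics() %>%
  filter(.metric == "rmse") %>%
  pull(.estimate)

# RMSE recomputed from the test set predictions
rmse_from_predictions <- lm_last_fit %>%
  collect_predictions() %>%
  rmse(truth = hwy, estimate = .pred) %>%
  pull(.estimate)

# The two values should agree up to floating point tolerance
all.equal(rmse_from_metrics, rmse_from_predictions)
```

The same tibble from collect_predictions() can be passed to ggplot() to draw the R squared plot, since it contains both hwy and .pred.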

Let's evaluate some models!
