Modeling with tidymodels in R
David Svancer
Data Scientist
All yardstick functions require a tibble with model results
Column with true outcome values: hwy for mpg data
Column with predicted outcome values: .pred
mpg_test_results
# A tibble: 57 x 3
hwy cty .pred
<int> <int> <dbl>
1 29 18 25.0
2 31 20 27.7
3 27 18 25.0
4 26 18 25.0
5 25 16 22.3
# ... with 47 more rows
RMSE estimates the average prediction error
rmse() function from yardstick
truth is the column with true outcome values
estimate is the column with predicted outcome values
mpg_test_results %>%
  rmse(truth = hwy, estimate = .pred)
# A tibble: 1 x 3
.metric .estimator .estimate
<chr> <chr> <dbl>
1 rmse standard 1.93
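To make the RMSE definition concrete, the following is a minimal base R sketch of the calculation: the square root of the mean squared difference between true and predicted values. It uses only the five rows of mpg_test_results printed earlier (with rounded predictions), so the value it returns will not match the 1.93 reported for the full test set. The helper name rmse_by_hand is hypothetical and is not part of yardstick.

```r
# First five rows printed from mpg_test_results (predictions are rounded)
results <- data.frame(
  hwy   = c(29, 31, 27, 26, 25),
  .pred = c(25.0, 27.7, 25.0, 25.0, 22.3)
)

# Hypothetical helper, not a yardstick function:
# square root of the mean squared prediction error
rmse_by_hand <- function(truth, estimate) {
  sqrt(mean((truth - estimate)^2))
}

rmse_by_hand(results$hwy, results$.pred)
```

Because the errors are squared before averaging, large misses count more heavily than small ones, and the result is in the same units as hwy.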
R squared measures the squared correlation between actual and predicted values
rsq() function from yardstick
mpg_test_results %>%
  rsq(truth = hwy, estimate = .pred)
# A tibble: 1 x 3
.metric .estimator .estimate
<chr> <chr> <dbl>
1 rsq standard 0.904
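The squared-correlation definition can be sketched in base R as follows. This is an illustration only: it uses the five rows of mpg_test_results printed earlier, so it will not reproduce the 0.904 computed on the full test set, and the helper name rsq_by_hand is hypothetical rather than part of yardstick.

```r
# First five rows printed from mpg_test_results (predictions are rounded)
results <- data.frame(
  hwy   = c(29, 31, 27, 26, 25),
  .pred = c(25.0, 27.7, 25.0, 25.0, 22.3)
)

# Hypothetical helper, not a yardstick function:
# squared Pearson correlation between true and predicted values
rsq_by_hand <- function(truth, estimate) {
  cor(truth, estimate)^2
}

rsq_by_hand(results$hwy, results$.pred)
```

A value near 1 means the predictions move in step with the actual values. Because it is a correlation, it does not penalize predictions that are consistently too high or too low, which is one reason to report it alongside RMSE.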
Visualization of the R squared metric
Making R squared plots with ggplot2
geom_point() plots actual against predicted values
geom_abline() with default arguments draws the line y = x
coord_obs_pred() puts both axes on the same range
ggplot(mpg_test_results, aes(x = hwy, y = .pred)) +
  geom_point() +
  geom_abline(color = 'blue', linetype = 2) +
  coord_obs_pred() +
  labs(title = 'R-Squared Plot',
       y = 'Predicted Highway MPG',
       x = 'Actual Highway MPG')
The last_fit() function
lm_last_fit <- lm_model %>%
  last_fit(hwy ~ cty,
           split = mpg_split)
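As a rough picture of what this step bundles together, the base R sketch below fits a linear model on the training portion of a split, predicts on the held-out portion, and computes RMSE and R squared on those predictions. It is not how last_fit() is implemented. It assumes the built-in mtcars data and the formula mpg ~ wt as stand-ins for mpg_split and hwy ~ cty, and a 75/25 split chosen for illustration.

```r
# Stand-in data: built-in mtcars, used here in place of mpg_split (assumption)
set.seed(42)
n <- nrow(mtcars)
train_idx <- sample(n, size = floor(0.75 * n))
train <- mtcars[train_idx, ]
test  <- mtcars[-train_idx, ]

# 1. Fit the model on the training data only
fit <- lm(mpg ~ wt, data = train)

# 2. Predict on the held-out test data
test_results <- data.frame(
  mpg   = test$mpg,
  .pred = predict(fit, newdata = test)
)

# 3. Compute test set metrics from true and predicted values
metrics <- data.frame(
  .metric   = c("rmse", "rsq"),
  .estimate = c(
    sqrt(mean((test_results$mpg - test_results$.pred)^2)),
    cor(test_results$mpg, test_results$.pred)^2
  )
)

metrics
```

Keeping the test rows out of the fitting step is the point of the exercise: the metrics then describe how the model performs on data it has not seen.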
The collect_metrics() function
Takes the results of last_fit() and returns a tibble of test set metrics
lm_last_fit %>%
  collect_metrics()
# A tibble: 2 x 3
.metric .estimator .estimate
<chr> <chr> <dbl>
1 rmse standard 1.93
2 rsq standard 0.904
The collect_predictions() function
Takes the results of last_fit() and returns a tibble with test set predictions in the .pred column
lm_last_fit %>%
  collect_predictions()
# A tibble: 57 x 4
id .pred .row hwy
<chr> <dbl> <int> <int>
1 train/test split 25.0 1 29
2 train/test split 27.7 3 31
3 train/test split 25.0 7 27
4 train/test split 25.0 8 26
5 train/test split 22.3 9 25
# ... with 47 more rows