Modeling with tidymodels in R
David Svancer
Data Scientist
All yardstick functions require a tibble with model results
Column with the true outcome values: hwy for mpg data
Column with model predictions: .pred
mpg_test_results
# A tibble: 57 x 3
hwy cty .pred
<int> <int> <dbl>
1 29 18 25.0
2 31 20 27.7
3 27 18 25.0
4 26 18 25.0
5 25 16 22.3
# ... with 47 more rows
RMSE estimates the average prediction error
The rmse() function from yardstick calculates it
truth is the column with true outcome values
estimate is the column with predicted outcome values
mpg_test_results %>%
  rmse(truth = hwy, estimate = .pred)
# A tibble: 1 x 3
.metric .estimator .estimate
<chr> <chr> <dbl>
1 rmse standard 1.93
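As a rough check on what rmse() reports, the calculation can be sketched in base R: square the differences between true and predicted values, average them, and take the square root. The vectors below are only the five rows of mpg_test_results printed above, and the names pred, residuals and rmse_manual are illustrative, so the result will differ from the full test-set value of 1.93.

```r
# First five rows of mpg_test_results as printed above (not the full test set)
hwy  <- c(29, 31, 27, 26, 25)            # true outcome values
pred <- c(25.0, 27.7, 25.0, 25.0, 22.3)  # model predictions (.pred)

# RMSE: square root of the mean squared difference
residuals   <- hwy - pred
rmse_manual <- sqrt(mean(residuals^2))
rmse_manual
```

Because the errors are squared before averaging, large misses weigh more heavily than small ones, and the result is in the same units as hwy.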
R squared measures the squared correlation between actual and predicted values
The rsq() function from yardstick calculates it
mpg_test_results %>%
  rsq(truth = hwy, estimate = .pred)
# A tibble: 1 x 3
.metric .estimator .estimate
<chr> <chr> <dbl>
1 rsq standard 0.904
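The squared-correlation definition can be sketched directly with base R's cor(). As before, the vectors hold only the five rows of mpg_test_results printed earlier, and pred and rsq_manual are illustrative names, so the value will not match the full test-set result of 0.904.

```r
# First five rows of mpg_test_results as printed earlier (not the full test set)
hwy  <- c(29, 31, 27, 26, 25)            # true outcome values
pred <- c(25.0, 27.7, 25.0, 25.0, 22.3)  # model predictions (.pred)

# R squared: Pearson correlation between actual and predicted values, squared
rsq_manual <- cor(hwy, pred)^2
rsq_manual
```

Values close to 1 indicate that the predictions track the actual values closely. Values near 0 indicate little linear relationship.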
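Visualization of the R squared metric
Making R squared plots with ggplot2
geom_point() draws a scatter plot of actual versus predicted values
geom_abline() draws the line y = x, where predictions equal the actual values
coord_obs_pred() from tune puts the x and y axes on the same scale
ggplot(mpg_test_results, aes(x = hwy, y = .pred)) +
  geom_point() +
  geom_abline(color = 'blue', linetype = 2) +
  coord_obs_pred() +
  labs(title = 'R-Squared Plot',
       y = 'Predicted Highway MPG',
       x = 'Actual Highway MPG')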
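The last_fit() function
Takes a model specification, a model formula, and a data split object
Fits the model on the training data and evaluates it on the test data
lm_last_fit <- lm_model %>%
  last_fit(hwy ~ cty,
           split = mpg_split)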
The collect_metrics() function
Takes the results of last_fit() and returns a tibble of metrics calculated on the test data
lm_last_fit %>%
  collect_metrics()
# A tibble: 2 x 3
.metric .estimator .estimate
<chr> <chr> <dbl>
1 rmse standard 1.93
2 rsq standard 0.904
The collect_predictions() function
Takes the results of last_fit() and returns a tibble of test data predictions in the .pred column
lm_last_fit %>%
  collect_predictions()
# A tibble: 57 x 4
id .pred .row hwy
<chr> <dbl> <int> <int>
1 train/test split 25.0 1 29
2 train/test split 27.7 3 31
3 train/test split 25.0 7 27
4 train/test split 25.0 8 26
5 train/test split 22.3 9 25
# ... with 47 more rows
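The last_fit() workflow can be sketched end to end as follows. The definitions of mpg_split and lm_model are not shown in this section, so the ones below are assumptions: a 75/25 split of ggplot2's mpg data stratified by hwy, an arbitrary seed, and a linear regression specification with the 'lm' engine. The names test_metrics and test_predictions are illustrative, and the numbers will vary with the split.

```r
library(tidymodels)

# Assumption: split proportion, stratification and seed are illustrative choices
set.seed(123)
mpg_split <- initial_split(ggplot2::mpg, prop = 0.75, strata = hwy)

# Assumption: a plain linear regression specification
lm_model <- linear_reg() %>%
  set_engine('lm') %>%
  set_mode('regression')

# Fit on the training data and evaluate on the test data in one step
lm_last_fit <- lm_model %>%
  last_fit(hwy ~ cty, split = mpg_split)

# Test-set metrics (rmse and rsq by default for regression)
test_metrics <- lm_last_fit %>%
  collect_metrics()

# Test-set predictions, with .pred next to the true hwy values
test_predictions <- lm_last_fit %>%
  collect_predictions()
```

Using last_fit() keeps the test data out of model fitting, since the split object controls which rows are used for training and which for evaluation.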