Out-of-sample error measures

Machine Learning with caret in R

Zach Mayer

Data Scientist at DataRobot and co-author of caret

Out-of-sample error

  • Want models that don't overfit and generalize well
  • Do the models perform well on new data?
  • Test models on new data, or a test set
    • Key insight of machine learning
    • Relying on in-sample validation almost guarantees overfitting
  • Primary goal of caret and this course: don't overfit
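The example below holds out the last 12 rows of mtcars as its test set. A random split avoids depending on the row order of the data. Here is a minimal base-R sketch, assuming an 80/20 split and an arbitrary seed (both illustrative choices, not taken from the course):

```r
# Sketch: random 80/20 train/test split in base R.
# The 80/20 ratio and seed 42 are illustrative assumptions.
data(mtcars)
set.seed(42)
n <- nrow(mtcars)
train_idx <- sample(n, size = round(0.8 * n))
train <- mtcars[train_idx, ]
test  <- mtcars[-train_idx, ]

# Fit only on the training rows
model <- lm(mpg ~ hp, data = train)

# Score only on rows the model never saw
predicted <- predict(model, newdata = test)
test_rmse <- sqrt(mean((predicted - test$mpg) ^ 2))
test_rmse
```

Because the test rows are excluded from `lm()`, `test_rmse` estimates error on new data rather than on data the model has already fit.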

Example: out-of-sample RMSE

# Load the data and fit a model to the first 20 rows only
data(mtcars)
model <- lm(mpg ~ hp, mtcars[1:20, ])
# Predict out-of-sample, on the 12 held-out rows
predicted <- predict(
  model, mtcars[21:32, ], type = "response"
)
# Evaluate error
actual <- mtcars[21:32, "mpg"]
sqrt(mean((predicted - actual) ^ 2))
5.507236
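Wrapping the RMSE formula in a function keeps later comparisons consistent. A sketch; `rmse()` is a hypothetical helper defined here, not a caret function:

```r
# Hypothetical helper: root mean squared error between
# predictions and observed values (not part of caret's API).
rmse <- function(predicted, actual) {
  stopifnot(length(predicted) == length(actual))
  sqrt(mean((predicted - actual) ^ 2))
}

data(mtcars)
model <- lm(mpg ~ hp, mtcars[1:20, ])
predicted <- predict(model, mtcars[21:32, ])
actual <- mtcars[21:32, "mpg"]
out_rmse <- rmse(predicted, actual)
out_rmse
```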

Compare to in-sample RMSE

# Fit a model to the full dataset
model2 <- lm(mpg ~ hp, mtcars)
# Predict in-sample, on the same rows model2 was fit on
predicted2 <- predict(
  model2, mtcars, type = "response"
)
# Evaluate error
actual2 <- mtcars[, "mpg"]
sqrt(mean((predicted2 - actual2) ^ 2))
3.74

Compare to the out-of-sample RMSE of 5.5: the in-sample error is overly optimistic.
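The in-sample figure above comes from model2, fit on all 32 rows, so the two numbers come from different fits. A sketch that scores one model, fit on rows 1 to 20, on both its own training rows and the held-out rows; `rmse()` is again a hypothetical helper:

```r
data(mtcars)
train <- mtcars[1:20, ]
test  <- mtcars[21:32, ]
model <- lm(mpg ~ hp, data = train)

# Hypothetical helper, not a caret function
rmse <- function(predicted, actual) sqrt(mean((predicted - actual) ^ 2))

# Same model, scored on data it saw vs. data it did not
in_sample  <- rmse(predict(model, train), train$mpg)
out_sample <- rmse(predict(model, test),  test$mpg)
c(in_sample = in_sample, out_of_sample = out_sample)
```

For `lm()` the in-sample RMSE is just `sqrt(mean(residuals(model)^2))`, so it needs no separate prediction step.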

Let's practice!
