Out-of-sample error measures

Machine Learning with caret in R

Zach Mayer

Data Scientist at DataRobot and co-author of caret

Out-of-sample error

We want models that don't overfit and that generalize well
Do the models perform well on new data?
Test models on new data, or on a held-out test set
- Key insight of machine learning
- In-sample validation almost guarantees overfitting
Primary goal of caret and of this course: avoid overfitting

# Fit a model to the mtcars data
data(mtcars)
model <- lm(mpg ~ hp, mtcars[1:20, ])

# Predict out-of-sample
predicted <- predict(
  model, mtcars[21:32, ], type = "response"
)

# Evaluate error
actual <- mtcars[21:32, "mpg"]
sqrt(mean((predicted - actual) ^ 2))

5.507236
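
The value printed above is the root mean squared error (RMSE) of the predictions on the held-out rows. In standard notation (the symbols n, y_i, and \hat{y}_i are introduced here for illustration and do not appear in the slides), it can be written as:

```latex
\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left(\hat{y}_i - y_i\right)^2}
```

Here n = 12 is the number of held-out rows (21 to 32), \hat{y}_i corresponds to `predicted`, and y_i corresponds to `actual`.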

# Fit a model to the full dataset
model2 <- lm(mpg ~ hp, mtcars)

# Predict in-sample (use model2, the fit on the full dataset)
predicted2 <- predict(
  model2, mtcars, type = "response"
)

# Evaluate error
actual2 <- mtcars[, "mpg"]
sqrt(mean((predicted2 - actual2) ^ 2))

3.74

Compare this with the out-of-sample RMSE of 5.5: the in-sample error is lower, so it understates how the model performs on new data.
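
The two numbers above come from two different fits (20 rows versus all 32). One way to isolate the effect is to score a single model on both the rows it was fitted to and the rows it never saw. The following is a minimal sketch under that assumption; the helper `rmse` and the names `train`, `test`, and `fit` are illustrative and are not part of caret or of the original slides.

```r
# Hypothetical helper (not a caret function): root mean squared error
rmse <- function(predicted, actual) {
  sqrt(mean((predicted - actual) ^ 2))
}

data(mtcars)

# Same split as in the slides: first 20 rows to fit, last 12 held out
train <- mtcars[1:20, ]
test <- mtcars[21:32, ]

# Fit on the training rows only
fit <- lm(mpg ~ hp, data = train)

# Error on the rows the model was fitted to (in-sample)
rmse_in <- rmse(predict(fit, train), train$mpg)

# Error on the rows the model has never seen (out-of-sample)
rmse_out <- rmse(predict(fit, test), test$mpg)

c(in_sample = rmse_in, out_of_sample = rmse_out)
```

Because the fit and the held-out rows are the same as in the first example, the out-of-sample value should reproduce the 5.5 reported there. Note that the split is by row order rather than random, so it is only a rough illustration.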
