Reusing a trainControl

Machine Learning with caret in R

Max Kuhn

Software Engineer at RStudio and creator of caret

A real-world example

  • The data: customer churn at a telecom company
  • Fit different models and choose the best
  • Models must use the same training/test splits
  • Create a shared trainControl object

Example: customer churn data

# Summarize the target variable (class proportions)
library(caret)
library(C50)
data(churn)
table(churnTrain$churn) / nrow(churnTrain)
      yes        no 
0.1449145 0.8550855 

Example: customer churn data

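# Create 5 stratified folds (each list element holds the row indexes of one fold)
set.seed(42)
myFolds <- createFolds(churnTrain$churn, k = 5)
# Compare the class distribution in the first fold with the full training set
i <- myFolds$Fold1
table(churnTrain$churn[i]) / length(i)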
      yes        no 
0.1441441 0.8558559
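The check above looks at Fold1 only. A minimal sketch of the same comparison across all five folds follows. It uses a simulated outcome `y` as a hypothetical stand-in for `churnTrain$churn`, so the proportions it prints will not match the slide output.

```r
library(caret)

# Hypothetical stand-in for churnTrain$churn: a two-class factor, roughly 14% "yes"
set.seed(42)
y <- factor(
  sample(c("yes", "no"), 1000, replace = TRUE, prob = c(0.14, 0.86)),
  levels = c("yes", "no")
)

# createFolds() stratifies on a factor outcome, so each fold should
# have about the same share of "yes" as the full vector
folds <- createFolds(y, k = 5)

# Proportion of "yes" overall and within each fold
overall <- mean(y == "yes")
per_fold <- sapply(folds, function(i) mean(y[i] == "yes"))
print(overall)
print(per_fold)
```

If the per-fold proportions stay close to the overall proportion, no fold is starved of the minority class.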

Example: customer churn data

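# Note: index lists the rows used for fitting in each resample, while
# createFolds() returns the held-out rows unless returnTrain = TRUE is set
myControl <- trainControl(
  summaryFunction = twoClassSummary,  # report ROC, Sens, Spec
  classProbs = TRUE,                  # class probabilities, needed for ROC
  verboseIter = TRUE,                 # print progress while fitting
  savePredictions = TRUE,             # keep the held-out predictions
  index = myFolds                     # reuse the folds created above
)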
  • Pass the folds to trainControl() through the index argument
  • Every model trained with myControl uses exactly the same cross-validation folds
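As a rough sketch of what reusing the control object looks like, the example below fits two models with one shared `trainControl` and confirms that both carry the same resampling indexes. Everything in it is an assumption made for illustration: the data come from caret's `twoClassSim()` rather than the churn set, `glm` and `rpart` stand in for whichever models are being compared, and the folds are created with `returnTrain = TRUE` so that each model is fit on four folds and evaluated on the fifth.

```r
library(caret)

set.seed(42)
dat <- twoClassSim(300)  # simulated two-class data; the outcome column is `Class`

# returnTrain = TRUE: each list element holds the rows used for fitting
folds <- createFolds(dat$Class, k = 5, returnTrain = TRUE)

# One control object, shared by every model below
ctrl <- trainControl(
  summaryFunction = twoClassSummary,
  classProbs = TRUE,
  savePredictions = TRUE,
  index = folds
)

model_glm <- train(Class ~ ., data = dat, method = "glm",
                   metric = "ROC", trControl = ctrl)
model_rpart <- train(Class ~ ., data = dat, method = "rpart",
                     metric = "ROC", trControl = ctrl)

# Both fits were resampled on identical rows
same_folds <- identical(model_glm$control$index, model_rpart$control$index)
print(same_folds)

# Because the folds match, the per-fold metrics are directly comparable
print(summary(resamples(list(glm = model_glm, rpart = model_rpart))))
```

Sharing the folds means any difference in ROC between the two models comes from the models themselves, not from one of them drawing an easier split.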

Let’s practice!
