Random forests and wine

Machine Learning with caret in R

Max Kuhn

Software Engineer at RStudio and creator of caret

Random forests

  • Popular type of machine learning model
  • Good for beginners
  • Fairly robust to overfitting
  • Often yield very accurate, non-linear models

Random forests

  • Unlike linear models, they have hyperparameters
  • Hyperparameters are not learned from the data; you must specify them manually
  • Can impact model fit, and the best values vary from dataset to dataset
  • Default values are often OK, but occasionally need adjustment
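
One way to see which hyperparameters caret exposes for a method is modelLookup(). The snippet below is a minimal sketch, assuming the caret package is installed; it uses the "ranger" method that appears later in this deck. The mtry parameter is the number of columns sampled at each split.

```r
library(caret)

# List the tuning parameters caret exposes for the "ranger" method
params <- modelLookup("ranger")
print(params[, c("parameter", "label")])
```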

Random forests

  • Start with a simple decision tree
  • A single decision tree is fast to fit, but usually not very accurate
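
As a rough illustration of the starting point, a single tree can be fit and scored on held-out rows. This is a sketch under assumptions: the rpart package and the 70/30 split are not part of the original slides, and the Sonar data from mlbench is borrowed from the example later in the deck.

```r
library(rpart)    # assumption: rpart is not used in the original slides
library(mlbench)
data(Sonar)

set.seed(42)
# Hold out 30% of the rows to estimate accuracy
test_idx  <- sample(nrow(Sonar), size = round(0.3 * nrow(Sonar)))
train_set <- Sonar[-test_idx, ]
test_set  <- Sonar[test_idx, ]

# Fit a single classification tree
tree <- rpart(Class ~ ., data = train_set, method = "class")

# Predict classes on the held-out rows and compute accuracy
pred <- predict(tree, newdata = test_set, type = "class")
tree_accuracy <- mean(pred == test_set$Class)
print(tree_accuracy)
```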

Random forests

  • Improve accuracy by fitting many trees
  • Fit each one to a bootstrap sample of your data
  • This is called bootstrap aggregation, or bagging
  • Random forests go one step further: randomly sample columns at each split
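
The steps above can be sketched by hand. This is an illustration, not how ranger is implemented: it assumes the rpart package for the individual trees, picks a hypothetical 50 trees, and samples columns once per tree rather than at each split, which is a simplification of what a real random forest does.

```r
library(rpart)    # assumption: rpart is not used in the original slides
library(mlbench)
data(Sonar)

set.seed(42)
test_idx  <- sample(nrow(Sonar), size = round(0.3 * nrow(Sonar)))
train_set <- Sonar[-test_idx, ]
test_set  <- Sonar[test_idx, ]

n_trees    <- 50                                   # hypothetical choice
predictors <- setdiff(names(train_set), "Class")
n_cols     <- floor(sqrt(length(predictors)))      # columns sampled per tree

# Each column of `votes` holds one tree's predictions for the test rows
votes <- sapply(seq_len(n_trees), function(i) {
  # Bootstrap sample: draw rows with replacement
  boot_rows <- sample(nrow(train_set), replace = TRUE)
  # Simplification: sample columns once per tree, not at each split
  cols <- sample(predictors, n_cols)
  fit <- rpart(Class ~ ., data = train_set[boot_rows, c(cols, "Class")],
               method = "class")
  as.character(predict(fit, newdata = test_set, type = "class"))
})

# Aggregate: majority vote across trees for each test row
bagged_pred <- apply(votes, 1, function(v) names(which.max(table(v))))
bagged_accuracy <- mean(bagged_pred == test_set$Class)
print(bagged_accuracy)
```

Comparing bagged_accuracy with the accuracy of a single tree on the same held-out rows gives a feel for what aggregation buys; the exact numbers depend on the seed and the split.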

Running a random forest

# Load some data
library(caret)
library(mlbench)
data(Sonar)

# Set seed so the resampling is reproducible
set.seed(42)

# Fit a random forest via caret (requires the ranger package)
model <- train(
  Class ~ .,
  data = Sonar,
  method = "ranger"
)
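
To look at what train() tried, the fitted object can be inspected directly. The following is a self-contained sketch; as an assumption to keep it quick, it swaps caret's default bootstrap resampling for 5-fold cross-validation via trainControl(), and the name model_cv is only for illustration.

```r
library(caret)
library(mlbench)
data(Sonar)

set.seed(42)
model_cv <- train(
  Class ~ .,
  data = Sonar,
  method = "ranger",
  # Assumption: 5-fold CV instead of caret's default bootstrap resampling
  trControl = trainControl(method = "cv", number = 5)
)

# One row per hyperparameter combination, with resampled accuracy
print(model_cv$results[, c("mtry", "splitrule", "min.node.size", "Accuracy")])

# The combination caret selected for the final model
print(model_cv$bestTune)
```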

Plotting the results

# Plot resampled accuracy against the tuning parameters
plot(model)

Let's practice!
