Introducing glmnet

Machine Learning with caret in R

Zach Mayer

Data Scientist at DataRobot and co-author of caret

Introducing glmnet

  • Extension of glm models with built-in variable selection
  • Helps deal with collinearity and small sample sizes
  • Two primary forms
    • Lasso regression: penalizes the absolute magnitude of the coefficients (L1 penalty), driving some of them to exactly zero
    • Ridge regression: penalizes the squared magnitude of the coefficients (L2 penalty), shrinking them without zeroing any out
  • Attempts to find a parsimonious (i.e. simple) model
  • Pairs well with random forest models

Tuning glmnet models

  • Combination of lasso and ridge regression
  • Can fit a mix of the two models
  • alpha [0, 1]: 0 is pure ridge, 1 is pure lasso
  • lambda (0, infinity): size of the penalty
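The mix of the two penalties can be written as a single elastic-net penalty. A minimal base-R sketch of glmnet's parameterization (the function name enet_penalty is ours, for illustration only):

```r
# Elastic-net penalty as parameterized by glmnet:
# lambda * ((1 - alpha)/2 * sum(beta^2) + alpha * sum(abs(beta)))
enet_penalty <- function(beta, alpha, lambda) {
  lambda * ((1 - alpha) / 2 * sum(beta^2) + alpha * sum(abs(beta)))
}

beta <- c(2, -1, 0.5)

enet_penalty(beta, alpha = 0, lambda = 1)  # pure ridge: sum(beta^2)/2 = 2.625
enet_penalty(beta, alpha = 1, lambda = 1)  # pure lasso: sum(abs(beta)) = 3.5
```

Intermediate alpha values blend the two penalties, which is why glmnet can fit a mix of lasso and ridge in one model.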

Example: "don't overfit"

# Load data
overfit <- read.csv("overfit.csv")

# Make a custom trainControl
myControl <- trainControl(
  method = "cv", 
  number = 10,
  summaryFunction = twoClassSummary,
  classProbs = TRUE, # <- Super important!
  verboseIter = TRUE
)

Try the defaults

# Fit a model
set.seed(42)
model <- train(
  y ~ ., 
  overfit, 
  method = "glmnet", 
  trControl = myControl
)

# Plot results
plot(model)
  • 3 values of alpha
  • 3 values of lambda
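The default grid is small. A custom grid built with expand.grid can cover alpha and lambda more finely; the value ranges below are illustrative, not taken from the course:

```r
# Custom tuning grid: every combination of alpha and lambda
myGrid <- expand.grid(
  alpha  = c(0, 0.5, 1),                       # ridge, 50/50 mix, lasso
  lambda = seq(0.0001, 1, length.out = 20)     # 20 penalty sizes
)
nrow(myGrid)  # 60 candidate models

# Passed to train() via the tuneGrid argument:
# model <- train(y ~ ., overfit, method = "glmnet",
#                tuneGrid = myGrid, trControl = myControl)
```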

Plot the results

[Figure: line plot of cross-validated ROC by mixing percentage (alpha) for three values of the regularization parameter (lambda), with a peak at 0.55 for the middle lambda value.]


Let’s practice!

