Principal components analysis (PCA)

Machine Learning with caret in R

Zach Mayer

Data Scientist at DataRobot and co-author of caret

Principal components analysis

  • Combines low-variance and correlated variables
  • Produces a single set of high-variance, perpendicular (uncorrelated) predictors
  • Prevents collinearity (i.e., correlation among predictors)
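
A minimal sketch of these points, assuming simulated data and base R's prcomp() rather than caret (the variable names are made up for illustration): two strongly correlated predictors go in, and the resulting component scores come out uncorrelated.

```r
# Simulated data (an assumption for illustration): x2 is nearly a copy of x1
set.seed(42)
x1 <- rnorm(200)
x2 <- x1 + rnorm(200, sd = 0.1)
predictors <- data.frame(x1 = x1, x2 = x2)

# Correlation between the original predictors (close to 1)
cor_original <- cor(predictors$x1, predictors$x2)

# PCA on centered and scaled predictors
pca <- prcomp(predictors, center = TRUE, scale. = TRUE)

# Correlation between the component scores (zero up to rounding error)
cor_components <- cor(pca$x[, "PC1"], pca$x[, "PC2"])

print(cor_original)
print(cor_components)
```

Because the components are perpendicular, a model fitted on them does not face the collinearity present in the raw predictors.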

PCA: a visual representation

  • First component has highest variance
  • Second component has second highest variance
  • And so on ...
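
The ordering described above can be checked with a short sketch. It assumes the built-in iris measurements as a stand-in dataset and base R's prcomp() rather than caret; prcomp() returns components sorted by decreasing standard deviation.

```r
# Built-in iris measurements stand in for any numeric predictor set
pca <- prcomp(iris[, 1:4], center = TRUE, scale. = TRUE)

# Variance of each component, in the order prcomp() returns them
component_variance <- pca$sdev^2

# Share of the total variance captured by each component
variance_share <- component_variance / sum(component_variance)

print(round(variance_share, 3))
```

The printed shares decrease from the first component to the last and sum to 1.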

Example: blood-brain data

  • Lots of predictors
  • Many of them low-variance
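# Load caret and the blood-brain dataset (provides bbbDescr and logBBB)
library(caret)
data(BloodBrain)

# Names of the near-zero-variance predictors
names(bbbDescr)[nearZeroVar(bbbDescr)]
[1] "negative"     "peoe_vsa.2.1" "peoe_vsa.3.1"
[4] "a_acid"       "vsa_acid"     "frac.anion7."
[7] "alert"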
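
To make "low-variance" concrete, here is a rough base R sketch of the kind of rule nearZeroVar() documents: a predictor is flagged when one value dominates and there are few distinct values. The helper is_near_zero_var() is hypothetical and not part of caret; its cutoffs are assumed to mirror the documented defaults (freqCut = 95/5, uniqueCut = 10), and the data are simulated.

```r
# Hypothetical helper, not part of caret: flags a vector as near-zero variance
# when one value dominates (frequency ratio above freq_cut) and there are few
# distinct values (percent unique at or below unique_cut)
is_near_zero_var <- function(x, freq_cut = 95 / 5, unique_cut = 10) {
  counts <- sort(table(x), decreasing = TRUE)
  if (length(counts) <= 1) {
    return(TRUE)  # constant column: zero variance
  }
  freq_ratio <- as.numeric(counts[1]) / as.numeric(counts[2])
  percent_unique <- 100 * length(counts) / length(x)
  freq_ratio > freq_cut && percent_unique <= unique_cut
}

# Simulated predictors for illustration
set.seed(42)
mostly_zero <- c(rep(0, 98), 1, 1)  # 98 zeros and 2 ones
well_spread <- rnorm(100)           # continuous, all values distinct

print(is_near_zero_var(mostly_zero))
print(is_near_zero_var(well_spread))
```

For real work, nearZeroVar() itself is the better choice; the sketch only illustrates why a column that is almost always the same value gets flagged.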

Example: blood-brain data

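# Basic model: drop zero-variance predictors, then center and scale
set.seed(42)
data(BloodBrain)
model <- train(
  bbbDescr, 
  logBBB, 
  method = "glm",
  trControl = trainControl(
    # verboseIter is the full argument name; "verbose" only works by partial matching
    method = "cv", number = 10, verboseIter = TRUE
  ),
  preProcess = c("zv", "center", "scale")
)

# Best cross-validated RMSE
min(model$results$RMSE)
[1] 1.107702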

Example: blood-brain data

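# Remove low-variance predictors: "nzv" drops near-zero-variance columns
set.seed(42)
data(BloodBrain)
model <- train(
  bbbDescr, 
  logBBB, 
  method = "glm",
  trControl = trainControl(
    # verboseIter is the full argument name; "verbose" only works by partial matching
    method = "cv", number = 10, verboseIter = TRUE
  ),
  preProcess = c("nzv", "center", "scale")
)

# Best cross-validated RMSE
min(model$results$RMSE)
[1] 0.9796199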

Example: blood-brain data

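# Add PCA: keep the low-variance columns ("zv" only) and let PCA combine them
set.seed(42)
data(BloodBrain)
model <- train(
  bbbDescr, 
  logBBB, 
  method = "glm",
  trControl = trainControl(
    # verboseIter is the full argument name; "verbose" only works by partial matching
    method = "cv", number = 10, verboseIter = TRUE
  ),
  preProcess = c("zv", "center", "scale", "pca")
)

# Best cross-validated RMSE
min(model$results$RMSE)
[1] 0.9796199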
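
As a rough illustration of what a PCA preprocessing step does before the model is fitted, the base R sketch below centers, scales, and rotates the predictors, keeps the leading components, and fits a glm on their scores. It makes several assumptions: the built-in mtcars data stands in for bbbDescr and logBBB, prcomp() stands in for caret's internals, and the 95% cumulative-variance cutoff is chosen to mirror the default thresh = 0.95 documented for caret's preProcess(). There is no cross-validation here, so it produces no RMSE comparable to the figures above.

```r
# mtcars stands in for bbbDescr/logBBB: predict mpg from the other columns
predictors <- mtcars[, setdiff(names(mtcars), "mpg")]
outcome <- mtcars$mpg

# Center, scale, and rotate the predictors
pca <- prcomp(predictors, center = TRUE, scale. = TRUE)

# Keep the fewest components whose cumulative variance share reaches 95%
cumulative_share <- cumsum(pca$sdev^2) / sum(pca$sdev^2)
n_components <- which(cumulative_share >= 0.95)[1]

# Fit a linear model (gaussian glm) on the retained component scores
model_data <- data.frame(
  outcome = outcome,
  pca$x[, seq_len(n_components), drop = FALSE]
)
fit <- glm(outcome ~ ., data = model_data)

print(n_components)
print(ncol(predictors))
```

Comparing the two printed numbers shows how many components replace the original predictors in this stand-in dataset.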

Let’s practice!
