Logistic regression on Sonar

Machine Learning with caret in R

Max Kuhn

Software Engineer at RStudio and creator of caret

Classification models

  • Categorical (i.e., qualitative) target variable
  • Example: will a loan default?
  • Still a form of supervised learning
  • Use a train/test split to evaluate performance
  • Use the Sonar dataset
  • Goal: distinguish rocks from mines
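A logistic regression classifier turns a linear score into a probability and then a class label. The minimal R sketch below shows that last step. The helper `to_class()`, the 0.5 threshold, and the scores are illustrative assumptions, not the course's code. The `"M"`/`"R"` labels follow Sonar's `Class` factor. `glm(family = binomial)` treats the first factor level as failure, so its predicted probabilities refer to the second level, `"R"`.

```r
# Logistic (inverse-logit) function: maps any real-valued score to (0, 1).
# Base R's stats::plogis() computes the same thing.
logistic <- function(z) 1 / (1 + exp(-z))

# Hypothetical helper: convert probabilities into class labels.
# Assumes p is the probability of the second factor level ("R"), which is
# the level glm(family = binomial) models when the response is a factor.
to_class <- function(p, threshold = 0.5, labels = c("M", "R")) {
  factor(ifelse(p > threshold, labels[2], labels[1]), levels = labels)
}

scores <- c(-2, 0, 3)      # made-up linear-predictor values
probs <- logistic(scores)  # roughly 0.12, 0.50, 0.95
to_class(probs)            # M M R (0.5 is not above the threshold)
```

Choosing a different threshold trades off the two kinds of error (calling a mine a rock versus a rock a mine).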

Example: Sonar data

# Load the Sonar dataset
library(mlbench)
data(Sonar)
# Look at the first six rows: predictors V1 to V5 and the Class label (column 61)
Sonar[1:6, c(1:5, 61)]
      V1     V2     V3     V4     V5 Class
1 0.0200 0.0371 0.0428 0.0207 0.0954     R
2 0.0453 0.0523 0.0843 0.0689 0.1183     R
3 0.0262 0.0582 0.1099 0.1083 0.0974     R
4 0.0100 0.0171 0.0623 0.0205 0.0205     R
5 0.0762 0.0666 0.0481 0.0394 0.0590     R
6 0.0286 0.0453 0.0277 0.0174 0.0384     R
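Before fitting a classifier, it helps to check how the target is distributed. Sonar's `Class` is a factor with levels `M` (mine) and `R` (rock), with 111 mines and 97 rocks among its 208 rows. The helper `class_balance()` below is hypothetical. It runs here on a toy data frame that mirrors those counts, so it works without mlbench installed.

```r
# Hypothetical helper: count and proportion of each class in a target column.
class_balance <- function(df, target) {
  counts <- table(df[[target]])
  data.frame(class = names(counts),
             n     = as.integer(counts),
             prop  = as.numeric(prop.table(counts)),
             stringsAsFactors = FALSE)
}

# Toy data frame that mirrors Sonar's label counts (111 "M", 97 "R");
# with mlbench loaded, class_balance(Sonar, "Class") can be called directly.
toy <- data.frame(Class = factor(rep(c("M", "R"), times = c(111, 97))))
class_balance(toy, "Class")
```

The two classes are close to balanced (roughly 53% versus 47%), so plain accuracy is a reasonable first metric on this data.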

Splitting the data

  • Randomly split data into training and test sets
  • Use a 60/40 split instead of 80/20
  • The Sonar dataset is small (208 rows), so a 60/40 split gives a larger, more reliable test set
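The arithmetic behind the 60/40 choice can be checked directly. This sketch assumes Sonar's 208 rows and a `round()` rule for the training size (train = `round(n * p)`, test = the rest). Under that rule, a 60/40 split leaves 83 test rows versus 42 under 80/20, roughly twice as many. The function name `split_sizes()` is hypothetical.

```r
# Training/test sizes when train = round(n * p) and test = the remaining rows.
split_sizes <- function(n, p) {
  n_train <- round(n * p)
  c(train = n_train, test = n - n_train)
}

split_sizes(208, 0.6)  # train 125, test 83
split_sizes(208, 0.8)  # train 166, test 42
```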

Splitting the data

# Set a seed so the shuffle is reproducible (the value 42 is arbitrary)
set.seed(42)
# Randomly order the dataset
rows <- sample(nrow(Sonar))
Sonar <- Sonar[rows, ]
# Find row to split on
split <- round(nrow(Sonar) * 0.60)
train <- Sonar[1:split, ]
test <- Sonar[(split + 1):nrow(Sonar), ]
# Confirm the training set proportion (125 of 208 rows)
nrow(train) / nrow(Sonar)
0.6009615
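A shuffle-and-cut split does not guarantee that both sets keep Sonar's mix of mines and rocks. caret's `createDataPartition()` addresses this: for a factor outcome it samples within each class. The base-R sketch below illustrates that idea without needing caret. The helper `stratified_split()` and the toy data (mirroring Sonar's 111 mines and 97 rocks) are hypothetical.

```r
# Hypothetical helper: stratified train/test split in base R.
# Samples round(p * class size) rows from each class for training.
stratified_split <- function(df, target, p = 0.6) {
  # Row numbers grouped by class label
  rows_by_class <- split(seq_len(nrow(df)), df[[target]])
  train_rows <- unlist(lapply(rows_by_class, function(rows) {
    # Index into rows; sample(rows, k) misbehaves when length(rows) == 1
    rows[sample.int(length(rows), round(length(rows) * p))]
  }), use.names = FALSE)
  # Every row not chosen for training goes to the test set
  test_rows <- setdiff(seq_len(nrow(df)), train_rows)
  list(train = df[train_rows, , drop = FALSE],
       test  = df[test_rows, , drop = FALSE])
}

set.seed(42)
# Toy data mirroring Sonar's 111 mines and 97 rocks
toy <- data.frame(id = 1:208,
                  Class = factor(rep(c("M", "R"), times = c(111, 97))))
parts <- stratified_split(toy, "Class", p = 0.6)
table(parts$train$Class)  # M 67, R 58
table(parts$test$Class)   # M 44, R 39
```

With caret installed, a comparable set of training indices comes from `createDataPartition(Sonar$Class, p = 0.6, list = FALSE)`; its per-class rounding may differ slightly from this sketch.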

Let's practice!
