Tuning linear SVMs

Support Vector Machines in R

Kailash Awati

Instructor

Linear SVM, default cost

library(e1071)
svm_model <- svm(y ~ ., 
                data = trainset, 
                type = "C-classification", 
                kernel = "linear", 
                scale = FALSE)
# Print model summary
svm_model
Call:
svm(formula = y ~ .,
    data = trainset,
    type = "C-classification", 
    kernel = "linear",
    scale = FALSE)

Parameters:
SVM-Type:  C-classification 
 SVM-Kernel:  linear 
       cost:  1 
      gamma:  0.5 
Number of Support Vectors:  55

Figure: linearly separable dataset, linear kernel with default cost, showing support vectors, decision boundary, and margin boundaries

Linear SVM with cost = 100

library(e1071)
svm_model <- svm(y ~ ., 
                data = trainset, 
                type = "C-classification", 
                kernel = "linear", 
                cost = 100,
                scale = FALSE)
# Print model summary
svm_model
Call:
svm(formula = y ~ .,
    data = trainset,
    type = "C-classification", 
    kernel = "linear",
    cost = 100,
    scale = FALSE)

Parameters:
SVM-Type:  C-classification 
 SVM-Kernel:  linear 
       cost:  100 
      gamma:  0.5 
Number of Support Vectors:  6
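
The effect of cost on the margin can also be checked numerically. The sketch below is illustrative only: it uses a hypothetical synthetic dataset (not the course data) and a helper named fit_summary that is not part of e1071. It assumes the usual linear SVM relation that the margin width is 2 / ||w||, where w is built from the model's coefs and SV components.

```r
library(e1071)

# Hypothetical linearly separable data (an assumption, not the course dataset):
# two features in [0, 1], classes split by the line x2 = x1, with points
# closer than 0.05 (in x1 - x2) to that line removed.
set.seed(42)
n <- 300
df <- data.frame(x1 = runif(n), x2 = runif(n))
df <- df[abs(df$x1 - df$x2) > 0.05, ]
df$y <- factor(ifelse(df$x1 - df$x2 > 0, 1, -1))

# Hypothetical helper: fit a linear SVM at a given cost and report the
# number of support vectors and the margin width 2 / ||w||.
fit_summary <- function(cost) {
  model <- svm(y ~ ., data = df, type = "C-classification",
               kernel = "linear", cost = cost, scale = FALSE)
  w <- t(model$coefs) %*% model$SV   # weight vector, 1 x 2
  data.frame(cost = cost,
             n_sv = model$tot.nSV,
             margin_width = 2 / sqrt(sum(w^2)))
}

results <- do.call(rbind, lapply(c(1, 100), fit_summary))
print(results)
```

On data like this, the cost = 100 row should show a margin no wider than the cost = 1 row. The exact counts depend on the data and will not match the 55 and 6 reported for the course dataset.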

Figure: linearly separable dataset, linear kernel with cost = 100, showing support vectors, decision boundary, and margin boundaries

Implication

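  • Increasing cost narrows the margin and reduces the number of support vectors (55 at default cost, 6 at cost = 100)
  • A narrow margin can be useful if the decision boundary is known to be linear
  • ...but this is rarely the case in real life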

Figure: nonlinearly separable dataset

Nonlinear dataset, linear SVM (cost = 100)

  • Build a model with cost = 100 on a training set containing 80% of the data
# Build model
library(e1071)
svm_model <- svm(y ~ ., 
                data = trainset, 
                type = "C-classification", 
                kernel = "linear", 
                cost = 100,
                scale = FALSE)
  • Calculate accuracy
# Train and test accuracy
pred_train <- predict(svm_model, trainset)
mean(pred_train == trainset$y)
0.8208333
pred_test <- predict(svm_model, testset)
mean(pred_test == testset$y)
0.85
  • Average test accuracy over 50 random train/test splits: 82.9%
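
The slides do not show how trainset and testset were created. A minimal sketch of one way to build an 80/20 split and compute both accuracies is given below. The data frame df and its parabolic class boundary are hypothetical stand-ins for the course dataset, so the accuracies it prints will differ from those above.

```r
library(e1071)

# Hypothetical nonlinearly separable data (an assumption, not the course
# dataset): 300 points whose classes are split by a parabola.
set.seed(1)
n <- 300
df <- data.frame(x1 = runif(n, -1, 1), x2 = runif(n, -1, 1))
df$y <- factor(ifelse(df$x2 > df$x1^2 - 0.3, 1, -1))

# Random 80/20 train/test split
train_idx <- sample(nrow(df), size = round(0.8 * nrow(df)))
trainset <- df[train_idx, ]
testset  <- df[-train_idx, ]

# Linear SVM with cost = 100, as on the slide
svm_model <- svm(y ~ ., data = trainset, type = "C-classification",
                 kernel = "linear", cost = 100, scale = FALSE)

# Training and test accuracy
train_acc <- mean(predict(svm_model, trainset) == trainset$y)
test_acc  <- mean(predict(svm_model, testset) == testset$y)
cat("train accuracy:", train_acc, "\n")
cat("test accuracy: ", test_acc, "\n")
```

A single split gives a noisy estimate, which is why the slide also reports an average over 50 random splits.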

Figure: nonlinearly separable dataset, linear kernel with cost = 100, showing support vectors, decision boundary, and margin boundaries

Nonlinear dataset, linear SVM (cost = 1)

  • Rebuild the model with cost = 1
# Trainset contains 80% of data
# Same train/test split as before.
# Build model
svm_model <- svm(y ~ ., 
                data = trainset, 
                type = "C-classification", 
                kernel = "linear", 
                cost = 1,
                scale = FALSE)
  • Calculate test accuracy
# Test accuracy
pred_test <- predict(svm_model, testset)
mean(pred_test == testset$y)
0.8666667
  • Average test accuracy over 50 random train/test splits: 83.7%
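
The averages over 50 random splits can be reproduced in spirit with a loop like the one below. This is a sketch under stated assumptions: df is a hypothetical synthetic dataset, and both cost values are evaluated on the same split in each iteration so the comparison is paired. The slides do not say how the original averages were computed.

```r
library(e1071)

# Hypothetical nonlinearly separable data (an assumption, not the course
# dataset): 300 points whose classes are split by a parabola.
set.seed(1)
n <- 300
df <- data.frame(x1 = runif(n, -1, 1), x2 = runif(n, -1, 1))
df$y <- factor(ifelse(df$x2 > df$x1^2 - 0.3, 1, -1))

costs <- c(1, 100)
n_splits <- 50

# One row per split, one column per cost value
acc <- matrix(NA_real_, nrow = n_splits, ncol = length(costs),
              dimnames = list(NULL, paste0("cost_", costs)))

for (i in seq_len(n_splits)) {
  # New random 80/20 split, shared by both cost values
  train_idx <- sample(nrow(df), size = round(0.8 * nrow(df)))
  trainset <- df[train_idx, ]
  testset  <- df[-train_idx, ]
  for (j in seq_along(costs)) {
    model <- svm(y ~ ., data = trainset, type = "C-classification",
                 kernel = "linear", cost = costs[j], scale = FALSE)
    acc[i, j] <- mean(predict(model, testset) == testset$y)
  }
}

# Average test accuracy per cost value
mean_acc <- colMeans(acc)
print(mean_acc)
```

Comparing the two column means shows whether a lower cost helps on a given dataset. On the course data the difference was small (83.7% at cost = 1 against 82.9% at cost = 100).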

Figure: nonlinearly separable dataset, linear kernel with cost = 1, showing support vectors, decision boundary, and margin boundaries

Time to practice!
