Tuning linear SVMs

Support Vector Machines in R

Kailash Awati

Instructor

Linear SVM, default cost

library(e1071)
svm_model <- svm(y ~ ., 
                data = trainset, 
                type = "C-classification", 
                kernel = "linear", 
                scale = FALSE)
# Print model summary
svm_model

Call:
svm(formula = y ~ .,
    data = trainset,
    type = "C-classification", 
    kernel = "linear",
    scale = FALSE)

Parameters:
SVM-Type:  C-classification 
 SVM-Kernel:  linear 
       cost:  1 
      gamma:  0.5 
Number of Support Vectors:  55

Chapter 2.3 - linearly separable dataset, default cost linear kernel with support vectors, decision and margin boundaries

Linear SVM with cost = 100

library(e1071)
svm_model <- svm(y ~ ., 
                data = trainset, 
                type = "C-classification", 
                kernel = "linear", 
                cost = 100,
                scale = FALSE)
# Print model summary
svm_model

Call:
svm(formula = y ~ .,
    data = trainset,
    type = "C-classification", 
    kernel = "linear",
    cost = 100,
    scale = FALSE)

Parameters:
SVM-Type:  C-classification 
 SVM-Kernel:  linear 
       cost:  100 
      gamma:  0.5 
Number of Support Vectors:  6

Chapter 2.3 - linearly separable data, cost = 100 linear kernel with support vectors, decision and margin boundaries
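
The effect of cost on the margin can be checked numerically. The sketch below is illustrative only: it assumes a hypothetical, synthetically generated linearly separable dataset (`df`, not the course's `trainset`) and two hypothetical helpers, `fit_linear()` and `margin_width()`. It fits the same linear SVM at cost = 1 and cost = 100, then compares the number of support vectors and the margin width, computed as 2 / ||w|| from the fitted weight vector.

```r
library(e1071)

set.seed(42)
n <- 200
# Hypothetical data (an assumption, not the course dataset):
# two classes separated by the line x1 = x2, with a small gap around it
df <- data.frame(x1 = runif(n), x2 = runif(n))
df$y <- factor(ifelse(df$x1 - df$x2 > 0, -1, 1))
df <- df[abs(df$x1 - df$x2) > 0.05, ]

# Hypothetical helper: linear SVM with the same settings as the slides
fit_linear <- function(cost) {
  svm(y ~ ., data = df, type = "C-classification",
      kernel = "linear", cost = cost, scale = FALSE)
}

# Hypothetical helper: margin width 2 / ||w||, where w = t(coefs) %*% SV
margin_width <- function(model) {
  w <- t(model$coefs) %*% model$SV
  2 / sqrt(sum(w^2))
}

model_c1 <- fit_linear(1)
model_c100 <- fit_linear(100)

n_sv <- c(cost_1 = model_c1$tot.nSV, cost_100 = model_c100$tot.nSV)
widths <- c(cost_1 = margin_width(model_c1),
            cost_100 = margin_width(model_c100))
print(n_sv)
print(widths)
```

Because `scale = FALSE`, the support vectors in `model$SV` are in the original coordinates, so the computed width can be read directly against the plotted data.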

Implication

Increasing cost narrows the margin, which can be useful if the decision boundary is known to be linear
...but this is rarely the case in real life

Chapter 2.3 - nonlinearly separable dataset

Nonlinear dataset, linear SVM (cost = 100)

Build a cost = 100 model using a training set composed of 80% of the data

# Build model
library(e1071)
svm_model <- svm(y ~ ., 
                data = trainset, 
                type = "C-classification", 
                kernel = "linear", 
                cost = 100,
                scale = FALSE)

Calculate accuracy

# Train and test accuracy
pred_train <- predict(svm_model, trainset)
mean(pred_train == trainset$y)

0.8208333

pred_test <- predict(svm_model, testset)
mean(pred_test == testset$y)

0.85

Average test accuracy over 50 random train/test splits: 82.9%
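
An average over random splits like the one quoted above could be computed along the following lines. This is a minimal sketch under stated assumptions: the dataset `df` is synthetic and hypothetical (classes split by a parabola, so no straight line separates them), and `avg_test_accuracy()` is a hypothetical helper, not a function from e1071 or from the course. It will not reproduce the figure on the slide.

```r
library(e1071)

set.seed(1)
n <- 300
# Hypothetical nonlinear data (an assumption): class depends on x2 > x1^2
df <- data.frame(x1 = runif(n, -1, 1), x2 = runif(n, -1, 1))
df$y <- factor(ifelse(df$x2 > df$x1^2, 1, -1))

# Hypothetical helper: mean test accuracy over repeated random 80/20 splits
avg_test_accuracy <- function(data, cost, n_splits = 50, train_frac = 0.8) {
  acc <- numeric(n_splits)
  for (i in seq_len(n_splits)) {
    # Draw a fresh random training sample for each split
    train_idx <- sample(nrow(data), size = floor(train_frac * nrow(data)))
    trainset <- data[train_idx, ]
    testset <- data[-train_idx, ]
    model <- svm(y ~ ., data = trainset, type = "C-classification",
                 kernel = "linear", cost = cost, scale = FALSE)
    pred_test <- predict(model, testset)
    acc[i] <- mean(pred_test == testset$y)
  }
  mean(acc)
}

acc_100 <- avg_test_accuracy(df, cost = 100)
print(acc_100)
```

Averaging over many splits smooths out the luck of a single partition, which is why a single-split accuracy can differ noticeably from the averaged value.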

Chapter 2.3 - nonlinearly separable dataset, cost=100 linear kernel with support vectors, decision and margin boundaries

Nonlinear dataset, linear SVM (cost = 1)

Rebuild the model, setting cost = 1

# Trainset contains 80% of data
# Same train/test split as before.
# Build model
svm_model <- svm(y ~ ., 
                data = trainset, 
                type = "C-classification", 
                kernel = "linear", 
                cost = 1,
                scale = FALSE)

Calculate test accuracy

# Test accuracy
pred_test <- predict(svm_model, testset)
mean(pred_test == testset$y)

0.8666667

Average test accuracy over 50 random train/test splits: 83.7%
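
Rather than comparing cost values by hand, one option is to let e1071 search a small grid with cross-validation via `tune.svm()`. The sketch below is only an illustration: the dataset `df` and the grid `cost_grid` are hypothetical choices, and `tune.svm()` uses 10-fold cross-validation by default instead of the repeated 80/20 splits used on the slides.

```r
library(e1071)

set.seed(7)
n <- 300
# Hypothetical nonlinear data (an assumption): class depends on x2 > x1^2
df <- data.frame(x1 = runif(n, -1, 1), x2 = runif(n, -1, 1))
df$y <- factor(ifelse(df$x2 > df$x1^2, 1, -1))

# Hypothetical grid of cost values to compare
cost_grid <- c(0.1, 1, 10, 100)

# Cross-validated search over cost for a linear kernel
tuned <- tune.svm(y ~ ., data = df, type = "C-classification",
                  kernel = "linear", scale = FALSE, cost = cost_grid)

print(tuned$best.parameters)  # cost value with the lowest CV error
print(tuned$performances)     # CV error and dispersion for each cost
```

If the errors in `tuned$performances` are close across the grid, the choice of cost matters little for this kernel, which is consistent with the small difference between the two averages reported above.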

Chapter 2.3 - nonlinearly separable dataset, cost=1 linear kernel with support vectors, decision and margin boundaries

Time to practice!
