Linear Support Vector Machines

Support Vector Machines in R

Kailash Awati

Instructor

Split into training and test sets

The dataset generated in the previous chapter is in the dataframe df.
Split the dataset into training and test sets.

Random 80/20 split

# Set seed for reproducibility
set.seed(1)
# Number of rows in the training set: 80% of all rows, rounded down
sample_size <- floor(0.8 * nrow(df))
# Assign rows to training/test sets randomly in 80/20 proportion
train <- sample(seq_len(nrow(df)), size = sample_size)
# Separate training and test sets
trainset <- df[train, ]
testset <- df[-train, ]
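The same split can be wrapped in a small reusable function and sanity-checked. This is an illustration, not course code: split_train_test() is a hypothetical helper, and toy is a hypothetical stand-in for df, since the previous chapter's data-generation code is not reproduced here. The checks confirm that the two sets have the expected sizes and share no rows.

```r
# Hypothetical helper (not from the course) wrapping the 80/20 split above
split_train_test <- function(data, frac = 0.8, seed = 1) {
  set.seed(seed)
  train_rows <- sample(seq_len(nrow(data)), size = floor(frac * nrow(data)))
  list(train = data[train_rows, ], test = data[-train_rows, ])
}

# Hypothetical stand-in for df: 200 random points in the unit square,
# labelled by which side of the line x1 = x2 they fall on
set.seed(42)
toy <- data.frame(x1 = runif(200), x2 = runif(200))
toy$y <- factor(ifelse(toy$x1 - toy$x2 > 0, -1, 1))

parts <- split_train_test(toy)

# Sizes of the two sets: train 160, test 40
sapply(parts, nrow)
# Rows shared by both sets: 0
length(intersect(rownames(parts$train), rownames(parts$test)))
```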

Decision boundaries and kernels

Decision boundaries can have different shapes - lines, polynomials or more complex functions.
The shape of the decision boundary is determined by the kernel.
The kernel must be specified upfront.
This chapter focuses on linear kernels.
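To make the idea of a kernel concrete, here is a minimal base-R sketch of the kernel formulas that the e1071 documentation gives for svm()'s linear, polynomial and radial options. The function names are hypothetical helpers, not e1071 exports; gamma, coef0 and degree mirror the names of svm()'s arguments. With the linear kernel the decision function is linear in the inputs, so with two predictors the boundary is a straight line.

```r
# Hypothetical helpers implementing the kernel formulas from the e1071 docs
linear_kernel <- function(u, v) sum(u * v)
polynomial_kernel <- function(u, v, gamma, coef0, degree) {
  (gamma * sum(u * v) + coef0)^degree
}
radial_kernel <- function(u, v, gamma) exp(-gamma * sum((u - v)^2))

u <- c(1, 2)
v <- c(3, 4)
linear_kernel(u, v)               # 1*3 + 2*4 = 11
polynomial_kernel(u, v, 1, 1, 2)  # (1*11 + 1)^2 = 144
radial_kernel(u, v, gamma = 0.5)  # exp(-0.5 * 8) = exp(-4), about 0.0183
```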

SVM with linear kernel

We'll use the svm() function from the e1071 package.
The function has a number of parameters. The ones that matter here are:
- formula - specifies the dependent variable (y in our case) and the predictors; y ~ . uses all other columns.
- data - the dataframe containing the data, i.e. trainset.
- type - set to C-classification, since this is a classification problem.
- kernel - determines the form of the decision boundary; linear in this case.
- cost and gamma - tuning parameters. They are left at their defaults here (cost = 1, gamma = 1/number of predictors); gamma is not used by the linear kernel.
- scale - logical indicating whether to scale the data to zero mean and unit variance (default TRUE); set to FALSE here.
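What scale = TRUE does can be previewed with base R's scale(), which centers each column to mean 0 and divides it by its standard deviation; according to the e1071 documentation, svm() applies this standardization internally by default. The data below are hypothetical. With scale = FALSE, as in this chapter, the predictors are used as-is.

```r
# Hypothetical predictors measured on very different scales
x <- data.frame(x1 = c(1, 2, 3, 4), x2 = c(100, 200, 300, 400))

# Center each column to mean 0 and divide by its standard deviation
x_scaled <- scale(x)

colMeans(x_scaled)      # both 0, up to floating-point rounding
apply(x_scaled, 2, sd)  # x1 = 1, x2 = 1
```

After scaling, both columns carry the same values, so neither predictor dominates just because of its units.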

Building a linear SVM

Load the e1071 library and invoke the svm() function.

library(e1071)

svm_model <- svm(y ~ .,
                 data = trainset,
                 type = "C-classification",
                 kernel = "linear",
                 scale = FALSE)
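Because df itself is not reproduced here, the sketch below fits the same kind of model to a hypothetical stand-in dataset: random points in the unit square, labeled by which side of the line x1 = x2 they fall on, with points where |x1 - x2| <= 0.05 dropped so the classes are separable. It also writes out cost = 1 (the default) to show where a tuning parameter goes in the call.

```r
library(e1071)

# Hypothetical stand-in for the course data (see lead-in)
set.seed(42)
toy <- data.frame(x1 = runif(300), x2 = runif(300))
toy <- toy[abs(toy$x1 - toy$x2) > 0.05, ]
toy$y <- factor(ifelse(toy$x1 - toy$x2 > 0, -1, 1))

# Same call as for svm_model, with the default cost = 1 written out
toy_model <- svm(y ~ .,
                 data = toy,
                 type = "C-classification",
                 kernel = "linear",
                 cost = 1,
                 scale = FALSE)

# Printing the model shows the settings that were used
toy_model
```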

Overview of model

Printing svm_model (typing its name at the console) gives:
- an overview of the model, including the classification type and kernel type
- tuning parameter values
- the number of support vectors

svm_model

Call:
svm(formula = y ~ .,
    data = trainset,
    type = "C-classification",
    kernel = "linear", 
    scale = FALSE)

Parameters:
   SVM-Type:  C-classification 
 SVM-Kernel:  linear 
       cost:  1 
      gamma:  0.5

Number of Support Vectors:  55
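The printed count is stored in the fitted object. The sketch below uses a hypothetical stand-in dataset (random points in the unit square labeled by the side of x1 = x2 they fall on, with a gap around that line) so that it runs on its own. tot.nSV holds the total number of support vectors and nSV the number from each class; the index, SV and coefs components each have one entry per support vector.

```r
library(e1071)

# Hypothetical stand-in data (see lead-in)
set.seed(42)
toy <- data.frame(x1 = runif(300), x2 = runif(300))
toy <- toy[abs(toy$x1 - toy$x2) > 0.05, ]
toy$y <- factor(ifelse(toy$x1 - toy$x2 > 0, -1, 1))
toy_model <- svm(y ~ ., data = toy, type = "C-classification",
                 kernel = "linear", scale = FALSE)

# Total number of support vectors (the value the print method shows)
toy_model$tot.nSV
# Support vectors from each class; these sum to the total
toy_model$nSV
# index, SV and coefs each have one entry (or row) per support vector
c(index = length(toy_model$index),
  SV = nrow(toy_model$SV),
  coefs = nrow(toy_model$coefs))
```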

# Index of support vectors in the training dataset
svm_model$index
4   8  10  11  18  37  38  39  47  59  60  74  76  77  78  80  83 ...

# Support vectors
svm_model$SV
             x1         x2
5   0.519095949 0.44232464

# Negative intercept of the decision function
svm_model$rho
-0.1087075

# Coefficients of the support vectors (alpha times class label)
svm_model$coefs
            [,1]
 [1,]  1.0000000
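These components are enough to recover the decision boundary by hand. For a linear kernel, the decision value of a point x is the sum over support vectors of coefs times the inner product with x, minus rho, i.e. w . x - rho with w = t(coefs) %*% SV. The sketch below, on a hypothetical stand-in dataset (random points in the unit square labeled by the side of x1 = x2, with a gap around that line), computes w, rewrites the boundary w1*x1 + w2*x2 - rho = 0 as the line x2 = slope * x1 + intercept, and checks the manual decision values against those returned by predict(..., decision.values = TRUE). It relies on scale = FALSE; with scaling, w would be expressed in scaled units.

```r
library(e1071)

# Hypothetical stand-in data (see lead-in)
set.seed(42)
toy <- data.frame(x1 = runif(300), x2 = runif(300))
toy <- toy[abs(toy$x1 - toy$x2) > 0.05, ]
toy$y <- factor(ifelse(toy$x1 - toy$x2 > 0, -1, 1))
toy_model <- svm(y ~ ., data = toy, type = "C-classification",
                 kernel = "linear", scale = FALSE)

# Weight vector: coefficient-weighted sum of the support vectors (1 x 2)
w <- t(toy_model$coefs) %*% toy_model$SV
# Intercept of the decision function
b <- -toy_model$rho

# Boundary w1*x1 + w2*x2 + b = 0, rearranged as x2 = slope*x1 + intercept
slope <- -w[1] / w[2]
intercept <- -b / w[2]  # equals rho / w2

# Check: manual decision values match those computed by e1071
X <- as.matrix(toy[, c("x1", "x2")])
f_manual <- drop(X %*% t(w)) + b
f_e1071 <- attr(predict(toy_model, toy, decision.values = TRUE),
                "decision.values")
all.equal(f_manual, drop(f_e1071), check.attributes = FALSE)
```

The slope and intercept can then be passed to a plotting function such as abline() to draw the boundary over the data.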

Obtain class predictions for the training and test sets.
Evaluate the training and test set accuracy of the model.

# Training accuracy
pred_train <- predict(svm_model, trainset)
mean(pred_train == trainset$y)

# Test accuracy
pred_test <- predict(svm_model, testset)
mean(pred_test == testset$y)

1
# Perfect!!
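An accuracy of 1 leaves nothing to diagnose, but when a model does make mistakes, a confusion matrix shows which classes they fall in. The sketch uses base R's table() on small hypothetical label vectors standing in for testset$y and pred_test; the accuracy is the diagonal of the table divided by its total, the same value mean() gives.

```r
# Hypothetical true labels and predictions (stand-ins for testset$y and pred_test)
actual <- factor(c(-1, -1, 1, 1, 1), levels = c(-1, 1))
predicted <- factor(c(-1, 1, 1, 1, -1), levels = c(-1, 1))

# Accuracy: share of matching labels, 3 of 5 correct = 0.6
mean(predicted == actual)

# Confusion matrix: rows are predictions, columns are true classes
conf <- table(predicted = predicted, actual = actual)
conf

# Correct predictions sit on the diagonal: 3 / 5 = 0.6 again
sum(diag(conf)) / sum(conf)
```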

