Linear Support Vector Machines

Support Vector Machines in R

Kailash Awati

Instructor

Split into training and test sets

  • The dataset generated in the previous chapter is stored in the data frame df.
  • Split the dataset into training and test sets.
  • Use a random 80/20 split.
    # Set seed for reproducibility
    set.seed(1)
    # Number of rows in the training set: 80% of all rows, rounded down
    sample_size <- floor(0.8 * nrow(df))
    # Randomly pick the row indices that go into the training set
    train <- sample(seq_len(nrow(df)), size = sample_size)
    # Separate training and test sets
    trainset <- df[train, ]
    testset <- df[-train, ]
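df comes from the previous chapter, so this sketch builds a hypothetical stand-in with the same column names (x1, x2, y). It repeats the split above and then checks three things: the sizes add up, no row lands in both sets, and both classes appear in each set. The data-generating rule here is an assumption and is not necessarily the previous chapter's recipe.

```r
# Hypothetical stand-in for df: 200 points in the unit square,
# labelled -1 or 1 according to which side of the line x2 = x1 they fall on
set.seed(42)
df <- data.frame(x1 = runif(200), x2 = runif(200))
df$y <- factor(ifelse(df$x1 - df$x2 < 0, -1, 1), levels = c(-1, 1))

# Same 80/20 split as above
set.seed(1)
sample_size <- floor(0.8 * nrow(df))
train <- sample(seq_len(nrow(df)), size = sample_size)
trainset <- df[train, ]
testset <- df[-train, ]

# Sanity checks on the split
c(train = nrow(trainset), test = nrow(testset))          # 160 and 40 rows
length(intersect(rownames(trainset), rownames(testset)))  # 0: no row is in both sets

# Class proportions in each set (a random split usually keeps them similar)
prop.table(table(trainset$y))
prop.table(table(testset$y))
```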
    

Decision boundaries and kernels

  • Decision boundaries can have different shapes: lines, polynomials or more complex functions.
  • The kernel determines the shape of the decision boundary.
  • The kernel must be specified up front.
  • This chapter focuses on linear kernels.
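A kernel is a function that scores how similar two points are. The e1071 svm() documentation lists the linear, polynomial and radial kernels (among others) using the formulas below. This base R sketch writes them out by hand. The function names and default argument values are illustrative choices only and are not part of e1071's API.

```r
# Kernels written out by hand for two numeric vectors u and v
# (formulas as given in the e1071 svm() documentation; defaults here are arbitrary)
linear_kernel <- function(u, v) sum(u * v)                      # u'v
polynomial_kernel <- function(u, v, gamma = 1, coef0 = 0, degree = 3) {
  (gamma * sum(u * v) + coef0)^degree                           # (gamma u'v + coef0)^degree
}
radial_kernel <- function(u, v, gamma = 1) {
  exp(-gamma * sum((u - v)^2))                                  # exp(-gamma |u - v|^2)
}

u <- c(1, 2)
v <- c(3, -1)
linear_kernel(u, v)                                 # 1*3 + 2*(-1) = 1
polynomial_kernel(u, v, coef0 = 1, degree = 2)      # (1*1 + 1)^2 = 4
radial_kernel(u, v, gamma = 0.5)                    # exp(-0.5 * (4 + 9)) = exp(-6.5)
```

The linear kernel is just the dot product. That is why it produces a straight-line boundary (a flat hyperplane in higher dimensions), while the other kernels can bend the boundary.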

SVM with linear kernel

  • We'll use the svm() function from the e1071 package.
  • The function has a number of parameters. We'll set the following explicitly:
    • formula - a formula specifying the dependent variable (y in our case) and the predictors; y ~ . uses all other columns as predictors.
    • data - the data frame containing the data, i.e. trainset.
    • type - set to C-classification (this is a classification problem).
    • kernel - the form of the decision boundary, linear in this case.
    • scale - a logical value indicating whether to scale the data (TRUE by default); we set it to FALSE.
  • cost and gamma are tuning parameters. We leave them at their defaults here, and gamma is not used by the linear kernel.

Building a linear SVM

  • Load the e1071 package and call the svm() function
library(e1071)
svm_model <- svm(y ~ .,
                 data = trainset,
                 type = "C-classification",
                 kernel = "linear",
                 scale = FALSE)

Overview of model

  • Printing svm_model gives:
    • an overview of the model, including the classification and kernel type
    • the tuning parameter values
    • the number of support vectors
svm_model
Call:
svm(formula = y ~ .,
    data = trainset,
    type = "C-classification",
    kernel = "linear", 
    scale = FALSE)

Parameters:
   SVM-Type:  C-classification 
 SVM-Kernel:  linear 
       cost:  1 
      gamma:  0.5

Number of Support Vectors: 55
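The summary lists gamma: 0.5 even though no gamma was passed. In e1071 the default gamma is 1/(data dimension), which is 1/2 for the two predictors x1 and x2. The linear kernel does not use gamma at all. This sketch checks that on a hypothetical stand-in data set by refitting with a different gamma and comparing the two fits. The stand-in data is an assumption; substitute trainset to try it on the course data.

```r
library(e1071)

# Hypothetical stand-in data with two predictors (an assumption, not the course's df)
set.seed(42)
df <- data.frame(x1 = runif(200), x2 = runif(200))
df$y <- factor(ifelse(df$x1 - df$x2 < 0, -1, 1), levels = c(-1, 1))

# Same linear SVM, once with the default gamma and once with an arbitrary gamma
m_default <- svm(y ~ ., data = df, type = "C-classification",
                 kernel = "linear", scale = FALSE)
m_gamma10 <- svm(y ~ ., data = df, type = "C-classification",
                 kernel = "linear", scale = FALSE, gamma = 10)

m_default$gamma                                # 1 / (number of predictors) = 0.5
identical(m_default$index, m_gamma10$index)    # same support vectors
all.equal(m_default$coefs, m_gamma10$coefs)    # same coefficients
all.equal(m_default$rho, m_gamma10$rho)        # same intercept
```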
# Indices (rows of trainset) of the support vectors
svm_model$index
4   8  10  11  18  37  38  39  47  59  60  74  76  77  78  80  83 ...

# Support vectors
svm_model$SV
           x1         x2
5 0.519095949 0.44232464
...

# Negative intercept
svm_model$rho
-0.1087075

# Coefficients of the support vectors (multipliers times class labels)
svm_model$coefs
          [,1]
[1,] 1.0000000
...
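For a linear kernel these components are enough to recover the decision boundary explicitly. The weight vector w is the sum over support vectors of coefs_i times SV_i, and the intercept is b = -rho. The decision value of a point x is therefore w·x + b. This sketch checks that formula against the decision values returned by predict(), using hypothetical stand-in data. It relies on scale = FALSE, which keeps SV in the original units of x1 and x2.

```r
library(e1071)

# Hypothetical stand-in data (an assumption, not the course's df)
set.seed(42)
df <- data.frame(x1 = runif(200), x2 = runif(200))
df$y <- factor(ifelse(df$x1 - df$x2 < 0, -1, 1), levels = c(-1, 1))

svm_model <- svm(y ~ ., data = df, type = "C-classification",
                 kernel = "linear", scale = FALSE)

# Weight vector: coefs already include the class labels, so a weighted sum of SVs suffices
w <- drop(t(svm_model$coefs) %*% svm_model$SV)
# rho is the negative intercept, so flip its sign
b <- -svm_model$rho

# Decision values computed by hand: w . x + b for every row
manual_dv <- drop(as.matrix(df[, c("x1", "x2")]) %*% w + b)

# Decision values reported by e1071
pred <- predict(svm_model, df, decision.values = TRUE)
e1071_dv <- drop(attr(pred, "decision.values"))

all.equal(manual_dv, e1071_dv, check.attributes = FALSE)  # expected: TRUE
```

With two predictors, the boundary w1*x1 + w2*x2 + b = 0 is the line x2 = -(w1*x1 + b)/w2. It could be drawn on a scatter plot of x2 against x1 with abline(a = -b / w[2], b = -w[1] / w[2]).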
  • Obtain class predictions for training and test sets.
  • Evaluate the training and test set accuracy of the model.
# Training accuracy
pred_train <- predict(svm_model, trainset)
mean(pred_train == trainset$y)
1
# Test accuracy
pred_test <- predict(svm_model, testset)
mean(pred_test == testset$y)
1
# Perfect accuracy on both sets
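Accuracy is a single number and does not show which classes get confused. A confusion matrix built with base R's table() does. The labels below are made-up values chosen to include a couple of errors, since the model above makes none. With the course objects, table(predicted = pred_test, actual = testset$y) would give the test-set matrix.

```r
# Hypothetical predicted and actual labels for 8 test points
actual <- factor(c(1, 1, -1, -1, 1, -1, 1, -1), levels = c(-1, 1))
pred   <- factor(c(1, -1, -1, -1, 1, -1, 1, 1), levels = c(-1, 1))

# Accuracy: the logical vector of matches is averaged as 0/1
mean(pred == actual)    # 6 of 8 match: 0.75

# Confusion matrix: rows are predictions, columns are true classes;
# correct predictions lie on the diagonal
conf_mat <- table(predicted = pred, actual = actual)
conf_mat
sum(diag(conf_mat)) / sum(conf_mat)   # same accuracy, 0.75
```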

Time to practice!
