Multiclass problems

Support Vector Machines in R

Kailash Awati

Instructor

The iris dataset - an introduction

  • 150 observations of 5 attributes
    • Petal width and length - numeric, in cm (predictor variables)
    • Sepal width and length - numeric, in cm (predictor variables)
    • Species - categorical: setosa, versicolor or virginica (predicted variable)
  • Dataset available from the UCI ML repository; also bundled with R as iris
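Because iris ships with base R (in the datasets package), its shape can be checked directly before any modelling. A quick sketch that confirms the 150 x 5 layout, the four numeric predictors and the balanced classes:

```r
# iris is part of base R's datasets package, so no download is needed
dim(iris)            # rows, columns
str(iris)            # 4 numeric measurements plus the Species factor
table(iris$Species)  # number of observations per species
```

The balanced class counts (50 per species) mean that accuracy is a reasonable first summary metric for this dataset.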

Visualizing the iris dataset

  • Plot petal length vs petal width.
library(ggplot2)

# Plot petal length vs width for dataset, distinguish species by color
p <- ggplot(data = iris,
            aes(x = Petal.Width,
                y = Petal.Length,
                color = Species)) +
     geom_point()

# Display plot
p

Chapter 2.4 - iris dataset: petal length vs petal width, species distinguished by color
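The plot itself is not reproduced here. As a rough numeric counterpart, the per-species range of petal length shows that setosa does not overlap the other two species, while versicolor and virginica do overlap. This is the kind of structure a linear classifier has to cope with:

```r
# Per-species (min, max) of petal length, in cm
tapply(iris$Petal.Length, iris$Species, range)

# setosa's longest petal is shorter than the shortest petal of the others
setosa_max <- max(iris$Petal.Length[iris$Species == "setosa"])
others_min <- min(iris$Petal.Length[iris$Species != "setosa"])
setosa_max < others_min

# versicolor and virginica overlap in petal length
versicolor_max <- max(iris$Petal.Length[iris$Species == "versicolor"])
virginica_min  <- min(iris$Petal.Length[iris$Species == "virginica"])
versicolor_max > virginica_min
```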

How does the SVM algorithm deal with multiclass problems?

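  • SVMs are inherently binary classifiers: each one separates two classes.
  • Can be applied to multiclass problems using the following voting strategy:
    • For each pair of classes, form a subset containing only the data points of those two classes (k(k-1)/2 subsets for k classes).
    • Train a binary classifier on each subset.
    • Classify each data point with every binary classifier and assign the class that receives the most votes.
  • This is called the one-against-one classification strategy; e1071's svm() uses it for multiclass problems.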
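To make the voting steps concrete, here is a hand-rolled sketch of one-against-one classification built from binary e1071 SVMs. It is for illustration only: svm() already performs one-against-one internally when the response has more than two levels, and the helper name ovo_predict is hypothetical. Ties are broken in favour of the first class in level order, which may differ from libsvm's own tie-breaking.

```r
library(e1071)

# Hypothetical helper: one-against-one voting with binary linear SVMs
ovo_predict <- function(train, newdata, response = "Species") {
  classes <- levels(train[[response]])
  pairs <- combn(classes, 2, simplify = FALSE)  # k(k-1)/2 class pairs
  votes <- matrix(0, nrow = nrow(newdata), ncol = length(classes),
                  dimnames = list(NULL, classes))
  for (pr in pairs) {
    # Subset containing only the two classes of this pair
    sub <- train[train[[response]] %in% pr, ]
    sub[[response]] <- droplevels(sub[[response]])
    fit <- svm(as.formula(paste(response, "~ .")), data = sub,
               type = "C-classification", kernel = "linear")
    # Each binary classifier casts one vote per data point
    pred <- as.character(predict(fit, newdata))
    idx <- cbind(seq_len(nrow(newdata)), match(pred, classes))
    votes[idx] <- votes[idx] + 1
  }
  # Majority vote; ties go to the first class in level order
  factor(classes[max.col(votes, ties.method = "first")], levels = classes)
}

pred <- ovo_predict(iris, iris)
mean(pred == iris$Species)  # training accuracy of the voted classifier
```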

Building a multiclass linear SVM

  • Build a linear SVM for the iris dataset
    • 80/20 training/test split (seed 10), default cost (cost = 1)
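library(e1071)

# Approximate 80/20 training/test split (seed 10); the split method is
# assumed, since the slide states only the ratio and the seed
set.seed(10)
in_train <- runif(nrow(iris)) < 0.8
trainset <- iris[in_train, ]
testset <- iris[!in_train, ]

# Build model (default cost = 1)
svm_model <- svm(Species ~ .,
                 data = trainset,
                 type = "C-classification",
                 kernel = "linear")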
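The one-against-one structure is visible inside a fitted e1071 model. A sketch, refit on the full iris data so it runs on its own: the svm object's rho component holds one intercept per pairwise classifier, and predict() with decision.values = TRUE returns one decision value per class pair.

```r
library(e1071)

# Refit on the full iris data so this block runs on its own
fit <- svm(Species ~ ., data = iris,
           type = "C-classification", kernel = "linear")

k <- nlevels(iris$Species)
choose(k, 2)     # number of pairwise binary classifiers for k classes
length(fit$rho)  # one intercept (rho) per pairwise classifier

# Decision values: one column per class pair, e.g. "setosa/versicolor"
p <- predict(fit, iris[c(1, 51, 101), ], decision.values = TRUE)
attr(p, "decision.values")
```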
  • Calculate training and test accuracy
pred_train <- predict(svm_model, trainset)
mean(pred_train == trainset$Species)
0.9756098
pred_test <- predict(svm_model, testset)
mean(pred_test == testset$Species)
0.962963
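A single accuracy figure hides which species get confused with each other. A sketch of a confusion matrix for the test set; it repeats the assumed runif-based split so it is self-contained, and its counts are not taken from the slide:

```r
library(e1071)

# Assumed split method (the slide states only 80/20 and seed 10)
set.seed(10)
in_train <- runif(nrow(iris)) < 0.8
trainset <- iris[in_train, ]
testset <- iris[!in_train, ]

svm_model <- svm(Species ~ ., data = trainset,
                 type = "C-classification", kernel = "linear")
pred_test <- predict(svm_model, testset)

# Rows: predicted species, columns: true species
conf_mat <- table(predicted = pred_test, actual = testset$Species)
conf_mat

# Overall accuracy is the share of counts on the diagonal
sum(diag(conf_mat)) / sum(conf_mat)
```

Given the petal-length overlap between versicolor and virginica, any off-diagonal counts would be expected between those two species rather than involving setosa.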

Time to practice!