Comparing logistic regression and SVM

Linear Classifiers in Python

Michael (Mike) Gelbart

Instructor, The University of British Columbia

Logistic regression:

  • Is a linear classifier
  • Can be used with kernels, but training is slow
  • Outputs meaningful probabilities
  • Can be extended to multi-class
  • All data points affect fit
  • L2 or L1 regularization

Support vector machine (SVM):

  • Is a linear classifier
  • Can be used with kernels, and training is fast
  • Does not naturally output probabilities
  • Can be extended to multi-class
  • Only "support vectors" affect fit
  • Conventionally just L2 regularization
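
The contrast between the two lists can be sketched in a few lines. This is a minimal illustration on synthetic data, not part of the original slides; the dataset from make_classification and the variable names are assumptions chosen for the example.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

# Synthetic binary classification data (an assumption for illustration)
X, y = make_classification(n_samples=200, n_features=4, random_state=0)

# Logistic regression: every training point contributes to the fit,
# and predict_proba returns class probabilities
logreg = LogisticRegression().fit(X, y)
probs = logreg.predict_proba(X[:3])

# SVM with a linear kernel: only the support vectors determine the boundary,
# and by default the model exposes raw scores rather than probabilities
svm = SVC(kernel="linear").fit(X, y)
scores = svm.decision_function(X[:3])
n_support = svm.support_.shape[0]

print("Probabilities from logistic regression:\n", probs)
print("Decision scores from the SVM:", scores)
print("Support vectors:", n_support, "of", X.shape[0], "training points")
```

The number of support vectors is typically smaller than the training set, which is what "only support vectors affect fit" refers to.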

Use in scikit-learn

Logistic regression in sklearn:

  • linear_model.LogisticRegression

Key hyperparameters in sklearn:

  • C (inverse regularization strength)
  • penalty (type of regularization)
  • multi_class (type of multi-class; deprecated since scikit-learn 1.5)
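
A rough sketch of how C and penalty interact, assuming synthetic data and the liblinear solver (one of the solvers that supports L1). The values of C are arbitrary, and parameter names can change between scikit-learn releases, so check the version you have installed.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic data with only a few informative features (illustrative assumption)
X, y = make_classification(
    n_samples=300, n_features=20, n_informative=5, random_state=0
)

# Small C = strong regularization; large C = weak regularization
strong = LogisticRegression(C=0.01, penalty="l1", solver="liblinear").fit(X, y)
weak = LogisticRegression(C=100.0, penalty="l1", solver="liblinear").fit(X, y)

# Total coefficient magnitude shrinks as regularization gets stronger
strong_l1 = np.abs(strong.coef_).sum()
weak_l1 = np.abs(weak.coef_).sum()

print("Sum of |coef| with C=0.01:", strong_l1)
print("Sum of |coef| with C=100: ", weak_l1)
print("Nonzero coefficients with C=0.01:", np.count_nonzero(strong.coef_))
```

With L1 regularization, stronger regularization also tends to set more coefficients exactly to zero, which acts as a form of feature selection.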

SVM in sklearn:

  • svm.LinearSVC and svm.SVC

Use in scikit-learn (cont.)

Key hyperparameters in sklearn (svm.SVC):

  • C (inverse regularization strength)
  • kernel (type of kernel)
  • gamma (inverse RBF smoothness)
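
To make the role of gamma concrete, here is a small sketch using svm.SVC with an RBF kernel. The make_moons dataset and the two gamma values are assumptions chosen to exaggerate the effect.

```python
from sklearn.datasets import make_moons
from sklearn.svm import SVC

# Noisy two-class data that is not linearly separable (illustrative assumption)
X, y = make_moons(n_samples=200, noise=0.3, random_state=0)

# Small gamma: very smooth decision boundary
smooth = SVC(kernel="rbf", C=1.0, gamma=0.01).fit(X, y)

# Large gamma: very wiggly boundary that follows individual training points
wiggly = SVC(kernel="rbf", C=1.0, gamma=1000.0).fit(X, y)

smooth_train_acc = smooth.score(X, y)
wiggly_train_acc = wiggly.score(X, y)

print("Training accuracy, gamma=0.01:", smooth_train_acc)
print("Training accuracy, gamma=1000:", wiggly_train_acc)
```

High training accuracy at large gamma usually signals overfitting rather than a better model, so in practice gamma and C are tuned together on held-out data.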

SGDClassifier

SGDClassifier: scales well to large datasets

from sklearn.linear_model import SGDClassifier

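# Logistic regression trained with stochastic gradient descent
# (the loss was named 'log' before scikit-learn 1.1)
logreg = SGDClassifier(loss='log_loss')

# Linear SVM trained with stochastic gradient descent
linsvm = SGDClassifier(loss='hinge')

  • SGDClassifier hyperparameter alpha is like 1/C: larger alpha means stronger regularization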

Let's practice!
