Support Vector Machine (SVM) Classifiers


Definition

  • Support Vector Machine (SVM): A system for efficiently training linear learning machines in kernel-induced feature spaces, leveraging generalisation theory and optimisation theory.

Scalar Product

  • Measures the similarity between two vectors.
  • Defined as: $\langle \mathbf{x}, \mathbf{z} \rangle = \mathbf{x} \cdot \mathbf{z} = \sum_{i=1}^{n} x_i z_i$
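
As a quick illustration, here is the scalar product computed both by hand and with NumPy (the vector values are arbitrary examples):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])   # example vector
z = np.array([4.0, 0.0, -1.0])  # example vector

# Scalar (dot) product: sum of element-wise products
manual = sum(xi * zi for xi, zi in zip(x, z))
print(manual, np.dot(x, z))  # both print 1.0
```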

Decision Function (Binary Classification)

  • Decision function: $f(\mathbf{x}) = \operatorname{sign}(\mathbf{w} \cdot \mathbf{x} + b)$, assigning class $+1$ or $-1$.
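
A minimal NumPy sketch of this decision function; the weight vector w and bias b below are placeholder values, not learned parameters:

```python
import numpy as np

def decide(x, w, b):
    """Binary decision: sign(w . x + b), mapped to the labels {-1, +1}."""
    return 1 if np.dot(w, x) + b >= 0 else -1

w = np.array([2.0, -1.0])  # placeholder weights
b = 0.5                    # placeholder bias
print(decide(np.array([1.0, 1.0]), w, b))   # w.x + b = 1.5  -> +1
print(decide(np.array([-1.0, 2.0]), w, b))  # w.x + b = -3.5 -> -1
```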

Linear Discrimination

Linear Separators

  • Binary classification separates the classes with a linear hyperplane: $\mathbf{w} \cdot \mathbf{x} + b = 0$.

Optimal Linear Separator

  • Maximum Margin Criterion: the SVM selects the hyperplane with the maximum margin between the classes.
  • Margin ($\rho$): distance between the closest data points (support vectors) of the two classes.

Classification Margin

  • Distance from a data point to the hyperplane: $r = \dfrac{y(\mathbf{w} \cdot \mathbf{x} + b)}{\|\mathbf{w}\|}$
  • Margin width: $\rho = \dfrac{2}{\|\mathbf{w}\|}$ (for a canonical hyperplane, where $|\mathbf{w} \cdot \mathbf{x} + b| = 1$ at the closest points)
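
A small sketch of both quantities, assuming an already-canonical hyperplane (w and b are placeholder values):

```python
import numpy as np

def distance_to_hyperplane(x, y, w, b):
    """Signed distance r = y (w . x + b) / ||w||; positive iff x is correctly classified."""
    return y * (np.dot(w, x) + b) / np.linalg.norm(w)

def margin_width(w):
    """Margin of a canonical hyperplane: rho = 2 / ||w||."""
    return 2.0 / np.linalg.norm(w)

w, b = np.array([3.0, 4.0]), -2.0  # placeholder hyperplane, ||w|| = 5
print(distance_to_hyperplane(np.array([2.0, 1.0]), +1, w, b))  # (6 + 4 - 2)/5 = 1.6
print(margin_width(w))                                          # 2/5 = 0.4
```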

Statistical Learning Theory

  • Maximizing margin reduces complexity and prevents overfitting.
  • The generalisation error is bounded by the training misclassification error plus a term that grows with the complexity of the function class.
  • Solution depends only on support vectors.

Linear SVM Mathematically

Constraints:

For training set $\{(\mathbf{x}_i, y_i)\}_{i=1}^{N}$ with $y_i \in \{-1, +1\}$:  $y_i(\mathbf{w} \cdot \mathbf{x}_i + b) \ge 1$ for all $i$.

Optimization Problem (Primal Formulation):

Minimize: $\dfrac{1}{2}\|\mathbf{w}\|^2$ (equivalently, maximize the margin $\rho = 2/\|\mathbf{w}\|$)

subject to constraints above.

Dual Formulation:

Maximize: $\displaystyle \sum_{i=1}^{N} \alpha_i - \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_i \alpha_j y_i y_j \, \mathbf{x}_i \cdot \mathbf{x}_j$

subject to: $\alpha_i \ge 0$ for all $i$, and $\displaystyle \sum_{i=1}^{N} \alpha_i y_i = 0$.

Solution:

  • Weight vector and bias: $\mathbf{w} = \sum_i \alpha_i y_i \mathbf{x}_i$ and $b = y_k - \mathbf{w} \cdot \mathbf{x}_k$ for any support vector $\mathbf{x}_k$ (i.e. any point with $\alpha_k > 0$).
  • Decision function: $f(\mathbf{x}) = \operatorname{sign}\left(\sum_i \alpha_i y_i \, \mathbf{x}_i \cdot \mathbf{x} + b\right)$
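
The sketch below fits a linear SVM with scikit-learn on a toy separable problem and reads back the quantities above; a large C approximates the hard-margin case (the data values are arbitrary examples):

```python
import numpy as np
from sklearn.svm import SVC

# Toy linearly separable data (arbitrary example values)
X = np.array([[2.0, 2.0], [3.0, 3.0], [2.5, 3.5],
              [-2.0, -2.0], [-3.0, -1.0], [-2.5, -3.0]])
y = np.array([1, 1, 1, -1, -1, -1])

clf = SVC(kernel="linear", C=1e6).fit(X, y)  # large C ~ hard margin

w = clf.coef_[0]       # w = sum_i alpha_i y_i x_i
b = clf.intercept_[0]  # bias term
print("support vectors:", clf.support_vectors_)
print("margin width 2/||w||:", 2 / np.linalg.norm(w))
print("prediction:", clf.predict([[1.0, 1.0]]))  # -> [1]
```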

Soft Margin Classification

Motivation:

  • Handles non-linearly separable data by allowing misclassifications.

Slack Variables ($\xi_i$):

  • Allow margin violations ($\xi_i > 0$) for difficult or noisy examples.

Optimization Problem with Slack Variables:

Minimize: $\dfrac{1}{2}\|\mathbf{w}\|^2 + C \sum_{i=1}^{N} \xi_i$

subject to: $y_i(\mathbf{w} \cdot \mathbf{x}_i + b) \ge 1 - \xi_i$ and $\xi_i \ge 0$ for all $i$; the parameter $C$ trades margin width against training errors.

Dual Formulation (Soft Margin):

Maximize: $\displaystyle \sum_{i=1}^{N} \alpha_i - \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_i \alpha_j y_i y_j \, \mathbf{x}_i \cdot \mathbf{x}_j$ (same objective as the hard-margin dual)

subject to: $0 \le \alpha_i \le C$ for all $i$, and $\displaystyle \sum_{i=1}^{N} \alpha_i y_i = 0$.
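
A short experiment illustrating the role of C, on hypothetical overlapping data generated with scikit-learn's make_blobs: a smaller C tolerates more margin violations, so more points end up as support vectors:

```python
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Two overlapping, noisy classes (synthetic example data)
X, y = make_blobs(n_samples=200, centers=2, cluster_std=2.5, random_state=0)

for C in (0.01, 1.0, 100.0):
    clf = SVC(kernel="linear", C=C).fit(X, y)
    # Smaller C -> wider margin, more violations, more support vectors
    print(f"C={C:>6}: {clf.n_support_.sum()} support vectors")
```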


Nonlinear Discrimination

Feature Spaces

  • Data not linearly separable in original space can become separable in higher-dimensional feature spaces.
  • Explicit construction of feature spaces unnecessary due to kernels.

Kernel Trick

  • Kernel functions implicitly map data into higher-dimensional spaces.
  • A kernel function is defined as an inner product in feature space: $K(\mathbf{x}_i, \mathbf{x}_j) = \varphi(\mathbf{x}_i) \cdot \varphi(\mathbf{x}_j)$, where $\varphi$ is the (possibly implicit) feature map; see the concrete check below.
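
To make the identity concrete, this sketch checks that the degree-2 polynomial kernel $(\mathbf{x} \cdot \mathbf{z} + 1)^2$ equals an explicit inner product $\varphi(\mathbf{x}) \cdot \varphi(\mathbf{z})$ in a 6-dimensional feature space; the feature map shown is one standard choice for 2-D inputs:

```python
import numpy as np

def phi(x):
    """Explicit degree-2 feature map for 2-D input, matching (x.z + 1)^2."""
    x1, x2 = x
    return np.array([x1**2, x2**2,
                     np.sqrt(2) * x1 * x2,
                     np.sqrt(2) * x1,
                     np.sqrt(2) * x2,
                     1.0])

x = np.array([1.0, 2.0])
z = np.array([3.0, -1.0])

kernel_value   = (np.dot(x, z) + 1) ** 2  # kernel in input space
explicit_value = np.dot(phi(x), phi(z))   # inner product in feature space
print(kernel_value, explicit_value)       # both 4.0
```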

Common Kernel Examples:

| Kernel Type          | Formula                                                                  |
|----------------------|--------------------------------------------------------------------------|
| Linear               | $K(\mathbf{x}_i, \mathbf{x}_j) = \mathbf{x}_i \cdot \mathbf{x}_j$        |
| Polynomial           | $K(\mathbf{x}_i, \mathbf{x}_j) = (\mathbf{x}_i \cdot \mathbf{x}_j + 1)^p$ |
| Gaussian (RBF)       | $K(\mathbf{x}_i, \mathbf{x}_j) = \exp\left(-\|\mathbf{x}_i - \mathbf{x}_j\|^2 / (2\sigma^2)\right)$ |
| Two-layer Perceptron | $K(\mathbf{x}_i, \mathbf{x}_j) = \tanh(\beta_0 \, \mathbf{x}_i \cdot \mathbf{x}_j + \beta_1)$ |
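
The same four kernels as plain Python functions; the parameter names p, sigma, beta0, and beta1 mirror the table, and the default values are illustrative, not canonical:

```python
import numpy as np

def linear_kernel(x, z):
    return np.dot(x, z)

def polynomial_kernel(x, z, p=2):
    return (np.dot(x, z) + 1) ** p

def rbf_kernel(x, z, sigma=1.0):
    return np.exp(-np.linalg.norm(x - z) ** 2 / (2 * sigma**2))

def perceptron_kernel(x, z, beta0=1.0, beta1=-1.0):
    return np.tanh(beta0 * np.dot(x, z) + beta1)

x, z = np.array([1.0, 2.0]), np.array([3.0, -1.0])
for k in (linear_kernel, polynomial_kernel, rbf_kernel, perceptron_kernel):
    print(k.__name__, k(x, z))
```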

Non-linear SVM Mathematically

Dual Problem with Kernels:

Maximize: $\displaystyle \sum_{i=1}^{N} \alpha_i - \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_i \alpha_j y_i y_j \, K(\mathbf{x}_i, \mathbf{x}_j)$

subject to the same constraints as in the linear case ($0 \le \alpha_i \le C$, $\sum_i \alpha_i y_i = 0$).

Decision Function (Kernelized):

$f(\mathbf{x}) = \operatorname{sign}\left(\sum_i \alpha_i y_i \, K(\mathbf{x}_i, \mathbf{x}) + b\right)$
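As a sanity check, the sketch below fits an RBF-kernel SVC on scikit-learn's make_circles data and rebuilds its decision values from the support vectors, the dual coefficients ($\alpha_i y_i$), and the bias, exactly per the formula above:

```python
import numpy as np
from sklearn.datasets import make_circles
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.svm import SVC

X, y = make_circles(n_samples=200, noise=0.1, factor=0.3, random_state=0)
clf = SVC(kernel="rbf", gamma=0.5, C=1.0).fit(X, y)

# Manual decision values: sum_i (alpha_i y_i) K(x_i, x) + b over support vectors
K = rbf_kernel(clf.support_vectors_, X[:5], gamma=0.5)
manual = clf.dual_coef_ @ K + clf.intercept_

print(np.allclose(manual.ravel(), clf.decision_function(X[:5])))  # True
```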

Theoretical Justification for Maximum Margins

  • Vapnik’s theorem bounds the VC dimension ($h$) of margin hyperplanes:
      • $h \le \min\left(\left\lceil D^2 / \rho^2 \right\rceil, n\right) + 1$,
      • where $D$ is the diameter of the smallest sphere enclosing the training data, $\rho$ is the margin, and $n$ is the dimensionality of the input space.
  • Maximizing the margin therefore minimizes the classifier’s complexity irrespective of the dimensionality.
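
A worked instance of the bound, with example numbers chosen purely for illustration:

$h \le \min\left(\left\lceil D^2/\rho^2 \right\rceil, n\right) + 1 = \min\left(\left\lceil 10^2/2^2 \right\rceil, 1000\right) + 1 = 26 \quad (D = 10,\ \rho = 2,\ n = 1000)$

The bound is driven by the margin, not by the 1000 input dimensions.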

SVM Applications & Extensions

Applications:

  • Text classification, genomic data analysis, regression, principal component analysis, etc.

Extensions of SVMs include:

  • Regression, Variable Selection, Boosting, Density Estimation, Unsupervised Learning, Novelty/Outlier Detection, Feature Detection, Clustering

Important Concepts Recap:

| Concept                | Description                                             |
|------------------------|---------------------------------------------------------|
| Support Vectors        | Critical data points defining the hyperplane            |
| Margin                 | Distance between support vectors of different classes   |
| Kernel Trick           | Implicitly maps data into a higher-dimensional space    |
| Soft Margin            | Allows misclassifications for noisy/non-separable data  |
| Quadratic Optimization | Method for solving the SVM optimization problem         |