Support Vector Machine (SVM) Classifiers
Definition
- Support Vector Machine (SVM): A system for efficiently training linear learning machines in kernel-induced feature spaces, leveraging generalisation theory and optimisation theory.
Scalar Product
- Measures similarity between vectors.
- Defined as: $\langle \mathbf{x}, \mathbf{z} \rangle = \sum_{i=1}^{n} x_i z_i$
Decision Function (Binary Classification)
- Decision function: $f(\mathbf{x}) = \operatorname{sign}(\langle \mathbf{w}, \mathbf{x} \rangle + b)$; positive arguments map to one class, negative to the other (sketched in code below).
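As a concrete illustration (not from the source), here is a minimal NumPy sketch of both definitions; the weight vector `w` and bias `b` are made-up values:

```python
# Minimal sketch (illustrative values, not from the source): the scalar
# product as a similarity measure and the sign-based decision function.
import numpy as np

def scalar_product(x, z):
    """<x, z> = sum_i x_i * z_i: large when the vectors point the same way."""
    return float(np.dot(x, z))

def decision(x, w, b):
    """f(x) = sign(<w, x> + b): +1 on one side of the hyperplane, -1 on the other."""
    return int(np.sign(scalar_product(w, x) + b))

w, b = np.array([2.0, -1.0]), 0.5             # hypothetical hyperplane parameters
print(decision(np.array([1.0, 1.0]), w, b))   # 1, since 2 - 1 + 0.5 > 0
print(decision(np.array([-1.0, 1.0]), w, b))  # -1, since -2 - 1 + 0.5 < 0
```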
Linear Discrimination
Linear Separators
- Binary classification separates classes using a linear hyperplane: $\langle \mathbf{w}, \mathbf{x} \rangle + b = 0$
Optimal Linear Separator
- Maximum Margin Criterion: SVM selects hyperplane with maximum margin between classes.
- Margin ($\rho$): Distance between the closest data points (support vectors) of the two classes.
Classification Margin
- Distance from a data point to the hyperplane: $r = \frac{|\langle \mathbf{w}, \mathbf{x} \rangle + b|}{\|\mathbf{w}\|}$
- Margin width: $\rho = \frac{2}{\|\mathbf{w}\|}$ (both computed in the sketch below).
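A small NumPy sketch of the two formulas above; `w` and `b` are assumed example values, chosen so that $\|\mathbf{w}\| = 5$:

```python
# Sketch of the margin formulas (parameters are illustrative assumptions).
import numpy as np

w = np.array([3.0, 4.0])   # ||w|| = 5
b = -5.0

def distance_to_hyperplane(x, w, b):
    """r = |<w, x> + b| / ||w||"""
    return abs(np.dot(w, x) + b) / np.linalg.norm(w)

print(distance_to_hyperplane(np.array([3.0, 4.0]), w, b))  # |25 - 5| / 5 = 4.0
print(2.0 / np.linalg.norm(w))                             # margin width rho = 0.4
```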
Statistical Learning Theory
- Maximizing the margin reduces the complexity of the hypothesis class and so guards against overfitting.
- The generalisation error is bounded by the sum of the training (misclassification) error and a term measuring the complexity of the function class.
- The solution depends only on the support vectors.
Linear SVM Mathematically
Constraints:
For a training set $\{(\mathbf{x}_i, y_i)\}_{i=1}^{N}$ with $y_i \in \{-1, +1\}$, every point must lie on the correct side of the margin:
$y_i(\langle \mathbf{w}, \mathbf{x}_i \rangle + b) \ge 1 \quad \text{for all } i$
Optimization Problem (Primal Formulation):
Minimize: $\frac{1}{2}\|\mathbf{w}\|^2$
subject to the constraints above.
Dual Formulation:
Maximize: $\sum_{i=1}^{N} \alpha_i - \frac{1}{2}\sum_{i=1}^{N}\sum_{j=1}^{N} \alpha_i \alpha_j y_i y_j \langle \mathbf{x}_i, \mathbf{x}_j \rangle$
subject to: $\alpha_i \ge 0$ for all $i$, and $\sum_{i=1}^{N} \alpha_i y_i = 0$
Solution:
- Weight vector and bias: $\mathbf{w} = \sum_{i} \alpha_i y_i \mathbf{x}_i$; $b = y_k - \langle \mathbf{w}, \mathbf{x}_k \rangle$ for any support vector $\mathbf{x}_k$.
- Decision function: $f(\mathbf{x}) = \operatorname{sign}\left(\sum_{i} \alpha_i y_i \langle \mathbf{x}_i, \mathbf{x} \rangle + b\right)$ (reconstructed in code below).
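The following is a minimal sketch of this solution using scikit-learn; the library choice and the toy data are assumptions, not from the source. `SVC` stores the products $\alpha_i y_i$ for the support vectors in `dual_coef_`, so $\mathbf{w}$ and the decision function can be rebuilt by hand:

```python
# Sketch: recover w, b, and the decision function from the dual solution.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2, 0.5, (20, 2)), rng.normal(2, 0.5, (20, 2))])
y = np.array([-1] * 20 + [1] * 20)            # two well-separated clusters

clf = SVC(kernel="linear", C=1e6).fit(X, y)   # very large C approximates a hard margin

w = clf.dual_coef_ @ clf.support_vectors_     # w = sum_i alpha_i y_i x_i
b = clf.intercept_

x_new = np.array([[0.5, 0.3]])
manual = np.sign(x_new @ w.ravel() + b)       # sign(<w, x> + b)
print(manual == clf.predict(x_new))           # [ True]: both decisions agree
print("margin width:", 2 / np.linalg.norm(w)) # rho = 2 / ||w||
```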
Soft Margin Classification
Motivation:
- Handles non-linearly separable data by allowing misclassifications.
Slack Variables ($\xi_i$):
- Allow margin violations for difficult/noisy examples.
Optimization Problem with Slack Variables:
Minimize: $\frac{1}{2}\|\mathbf{w}\|^2 + C\sum_{i=1}^{N} \xi_i$
subject to: $y_i(\langle \mathbf{w}, \mathbf{x}_i \rangle + b) \ge 1 - \xi_i$ and $\xi_i \ge 0$ for all $i$
Dual Formulation (Soft Margin):
Maximize: the same objective as the hard-margin dual, $\sum_{i} \alpha_i - \frac{1}{2}\sum_{i,j} \alpha_i \alpha_j y_i y_j \langle \mathbf{x}_i, \mathbf{x}_j \rangle$
subject to: $0 \le \alpha_i \le C$ and $\sum_{i} \alpha_i y_i = 0$ (the slack penalty only caps each $\alpha_i$ at $C$); see the sketch below for the effect of $C$.
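A short sketch of how $C$ trades margin width against slack violations, again using scikit-learn on made-up overlapping data (both choices are assumptions):

```python
# Sketch: the soft-margin parameter C controls the margin/slack trade-off.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-1, 1.0, (50, 2)), rng.normal(1, 1.0, (50, 2))])
y = np.array([-1] * 50 + [1] * 50)            # overlapping, not separable

for C in (0.01, 1.0, 100.0):
    clf = SVC(kernel="linear", C=C).fit(X, y)
    w = clf.dual_coef_ @ clf.support_vectors_
    print(f"C={C:<6} support vectors={len(clf.support_)} "
          f"margin width={2 / np.linalg.norm(w):.3f}")
# Small C tolerates violations and keeps the margin wide; large C penalises
# violations heavily and narrows the margin.
```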
Nonlinear Discrimination
Feature Spaces
- Data not linearly separable in original space can become separable in higher-dimensional feature spaces.
- Explicit construction of the feature space is unnecessary thanks to kernels.
Kernel Trick
- Kernel functions implicitly map data into higher-dimensional spaces.
- Kernel function, defined as an inner product in the feature space: $K(\mathbf{x}, \mathbf{z}) = \langle \phi(\mathbf{x}), \phi(\mathbf{z}) \rangle$
Common Kernel Examples:
| Kernel Type | Formula |
|---|---|
| Linear | $K(\mathbf{x}, \mathbf{z}) = \langle \mathbf{x}, \mathbf{z} \rangle$ |
| Polynomial | $K(\mathbf{x}, \mathbf{z}) = (\langle \mathbf{x}, \mathbf{z} \rangle + 1)^d$ |
| Gaussian (RBF) | $K(\mathbf{x}, \mathbf{z}) = \exp\left(-\frac{\lVert \mathbf{x} - \mathbf{z} \rVert^2}{2\sigma^2}\right)$ |
| Two-layer Perceptron | $K(\mathbf{x}, \mathbf{z}) = \tanh(\kappa \langle \mathbf{x}, \mathbf{z} \rangle + \theta)$ |
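For concreteness, here is a NumPy sketch of the kernels in the table as plain functions; the hyperparameter values ($d$, $\sigma$, $\kappa$, $\theta$) are arbitrary illustrative choices:

```python
# Sketch of the common kernels as functions of two vectors x and z.
import numpy as np

def linear(x, z):
    return np.dot(x, z)

def polynomial(x, z, d=3):
    return (1.0 + np.dot(x, z)) ** d

def gaussian_rbf(x, z, sigma=1.0):
    return np.exp(-np.linalg.norm(x - z) ** 2 / (2 * sigma ** 2))

def two_layer_perceptron(x, z, kappa=1.0, theta=-1.0):
    return np.tanh(kappa * np.dot(x, z) + theta)

x, z = np.array([1.0, 2.0]), np.array([2.0, 0.5])
for k in (linear, polynomial, gaussian_rbf, two_layer_perceptron):
    print(k.__name__, k(x, z))   # each returns a scalar similarity score
```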
Non-linear SVM Mathematically
Dual Problem with Kernels:
Maximize: $\sum_{i} \alpha_i - \frac{1}{2}\sum_{i,j} \alpha_i \alpha_j y_i y_j K(\mathbf{x}_i, \mathbf{x}_j)$
Subject to the same constraints as the linear case ($0 \le \alpha_i \le C$, $\sum_i \alpha_i y_i = 0$).
Decision Function (Kernelized): $f(\mathbf{x}) = \operatorname{sign}\left(\sum_{i} \alpha_i y_i K(\mathbf{x}_i, \mathbf{x}) + b\right)$ (verified in the sketch below).
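As a sanity check (an illustrative sketch; scikit-learn and the toy data are assumptions), the kernelized decision function can be evaluated directly from the stored support vectors and dual coefficients and compared with the library's own prediction:

```python
# Sketch: evaluate f(x) = sign(sum_i alpha_i y_i K(x_i, x) + b) by hand.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(2)
X = rng.normal(0, 1, (60, 2))
y = np.where(np.linalg.norm(X, axis=1) < 1.0, 1, -1)   # circular boundary

gamma = 0.5                                            # RBF width parameter
clf = SVC(kernel="rbf", gamma=gamma, C=10.0).fit(X, y)

def decide(x_new):
    """Kernelized decision function over the support vectors only."""
    K = np.exp(-gamma * np.sum((clf.support_vectors_ - x_new) ** 2, axis=1))
    return np.sign(clf.dual_coef_ @ K + clf.intercept_)

x_new = np.array([0.2, -0.1])
print(decide(x_new), clf.predict([x_new]))   # both yield the same label
```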
Theoretical Justification for Maximum Margins
- Vapnik's theorem bounds the VC dimension ($h$) of maximum-margin separators:
- $h \le \min\left(\left\lceil \frac{D^2}{\rho^2} \right\rceil, m_0\right) + 1$, where
- $\rho$ is the margin,
- $D$ is the diameter of the smallest sphere enclosing all training examples,
- $m_0$ is the dimensionality of the input space.
- Maximizing the margin therefore minimizes classifier complexity irrespective of the dimensionality of the space (see the worked instance below).
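A worked instance of the bound with illustrative numbers (all values assumed): a margin of $\rho = 2$ and an enclosing-sphere diameter of $D = 10$ cap the VC dimension at 26 even in a 1000-dimensional input space:

```python
# Worked instance of Vapnik's VC-dimension bound (numbers are assumptions).
import math

D, rho, m0 = 10.0, 2.0, 1000   # sphere diameter, margin, input dimensionality
h_bound = min(math.ceil(D**2 / rho**2), m0) + 1
print(h_bound)                 # min(ceil(25.0), 1000) + 1 = 26
```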
SVM Applications & Extensions
Applications:
- Text classification, genomic data analysis, regression, principal component analysis, etc.
Extensions of SVMs include:
- Regression, Variable Selection, Boosting, Density Estimation, Unsupervised Learning, Novelty/Outlier Detection, Feature Detection, Clustering
Important Concepts Recap:
| Concept | Description |
|---|---|
| Support Vectors | Critical data points defining the hyperplane |
| Margin | Distance between support vectors of different classes |
| Kernel Trick | Implicitly maps data into higher-dimensional space |
| Soft Margin | Allows misclassifications for noisy/non-separable data |
| Quadratic Optimization | Method for solving SVM optimization problems |