Machine Learning
6. Support Vector Machine (SVM)
Support Vector Machine (SVM) is a supervised learning algorithm used mainly for classification. It finds the hyperplane that separates the classes with the largest possible margin, that is, the greatest distance to the nearest training points of each class.
Key Terminology
Hyperplane: A decision boundary that separates different classes. In 2D it's a line, in 3D it's a plane. Equation: w·x + b = 0, where w is the weight vector normal to the hyperplane and b is the bias.
Support Vectors: Training data points closest to the hyperplane. Removing one can change the hyperplane, while removing any other training point leaves it unchanged. The margin is entirely determined by them.
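Margin: The width of the band around the hyperplane that contains no training points (in the separable case). With w and b scaled so that the nearest points satisfy |w·x + b| = 1, each of those points lies at perpendicular distance 1 / ||w|| from the hyperplane, so the full margin between the two classes is 2 / ||w||. SVM maximizes this.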
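The distance and margin formulas can be checked numerically. The sketch below uses only the Python standard library. The hyperplane (w, b) and the sample point are hypothetical values chosen so the arithmetic is easy to follow.

```python
import math

def norm(w):
    """Euclidean length ||w|| of the weight vector."""
    return math.sqrt(sum(wi * wi for wi in w))

def signed_distance(w, b, x):
    """Signed perpendicular distance from x to the hyperplane w·x + b = 0."""
    return (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm(w)

def margin_width(w):
    """Full margin width 2 / ||w|| between the planes w·x + b = +1 and -1."""
    return 2.0 / norm(w)

# Hypothetical hyperplane 3*x1 + 4*x2 - 5 = 0, so ||w|| = 5.
w, b = [3.0, 4.0], -5.0

# The point (2, 0) satisfies w·x + b = 1, so it sits on the positive margin boundary.
on_margin = [2.0, 0.0]

half_margin = signed_distance(w, b, on_margin)  # 1 / ||w||
full_margin = margin_width(w)                   # 2 / ||w||
print(half_margin, full_margin)
```

The point on the margin boundary is 1 / ||w|| away from the hyperplane, and the full margin is twice that distance.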
SVM Scenarios
1. Hard Margin SVM (Linearly Separable)
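Works only when the data can be perfectly separated by a hyperplane. It maximizes the margin with no misclassifications allowed. Because the margin is 2 / ||w||, maximizing it is equivalent to minimizing ||w||².
Minimize: ½ ||w||²
Subject to: yᵢ(w·xᵢ + b) ≥ 1 for all i, with labels yᵢ ∈ {−1, +1}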
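To make the hard-margin constraint concrete, the following sketch checks it for a candidate hyperplane and lists the points that meet it with equality (the support vectors). It does not solve the optimization problem. The toy data set and the candidate (w, b) are hypothetical, and the helper names are illustrative only.

```python
def functional_margins(w, b, X, y):
    """Return y_i * (w·x_i + b) for every training point."""
    return [yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            for xi, yi in zip(X, y)]

def is_hard_margin_feasible(w, b, X, y, tol=1e-9):
    """True when every point satisfies y_i (w·x_i + b) >= 1."""
    return all(m >= 1 - tol for m in functional_margins(w, b, X, y))

def support_vector_indices(w, b, X, y, tol=1e-9):
    """Indices of points whose constraint holds with equality."""
    return [i for i, m in enumerate(functional_margins(w, b, X, y))
            if abs(m - 1) <= tol]

# Hypothetical linearly separable toy data with labels in {-1, +1}.
X = [[1.0, 0.0], [2.0, 1.0], [-1.0, 0.0], [-3.0, 2.0]]
y = [1, 1, -1, -1]

# Hypothetical candidate hyperplane x1 = 0.
w, b = [1.0, 0.0], 0.0

feasible = is_hard_margin_feasible(w, b, X, y)
support = support_vector_indices(w, b, X, y)
print(feasible, support)  # True [0, 2]

# Shrinking w would widen the margin 2 / ||w||, but the constraints then fail.
too_wide = is_hard_margin_feasible([0.5, 0.0], 0.0, X, y)
print(too_wide)  # False
```

The second check shows why the constraints matter: a smaller ||w|| would lower the objective, but the nearest points would then fall inside the margin.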
2. Soft Margin / C-SVM (Non-Separable/Noisy)
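Allows some margin violations and misclassifications to handle overlapping or noisy classes. Introduces a slack variable ξᵢ ≥ 0 for each point, measuring how far it falls on the wrong side of its margin boundary.
Minimize: ½ ||w||² + C Σ ξᵢ
Subject to: yᵢ(w·xᵢ + b) ≥ 1 − ξᵢ, ξᵢ ≥ 0
The regularization parameter C > 0 sets the penalty on violations. High C = fewer violations but a narrower margin (risk of overfitting). Low C = more violations tolerated and a wider margin (risk of underfitting).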
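The effect of C can be illustrated by scoring two candidate hyperplanes with the soft-margin objective ½||w||² + C Σ ξᵢ, where each slack is ξᵢ = max(0, 1 − yᵢ(w·xᵢ + b)). This is a minimal sketch that compares two hand-picked candidates and does not run an optimizer. The 1-D data, the candidates, and the two C values are all hypothetical.

```python
def soft_margin_objective(w, b, X, y, C):
    """Return 1/2 ||w||^2 + C * sum of slacks max(0, 1 - y_i (w·x_i + b))."""
    slacks = [max(0.0, 1.0 - yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b))
              for xi, yi in zip(X, y)]
    return 0.5 * sum(wj * wj for wj in w) + C * sum(slacks)

# Hypothetical 1-D data: the positive point at 0 sits close to the negatives.
X = [[3.0], [2.0], [0.0], [-2.0], [-3.0]]
y = [1, 1, 1, -1, -1]

# Two hand-picked candidates (w, b), not the output of a solver.
candidates = {
    "wide": ([0.5], 0.0),    # margin width 4, the point at 0 has slack 1
    "narrow": ([1.0], 1.0),  # margin width 2, no slack needed
}

def preferred(C):
    """Name of the candidate with the lower objective for this C."""
    return min(candidates,
               key=lambda name: soft_margin_objective(*candidates[name], X, y, C))

low_c_choice = preferred(0.1)    # violations are cheap
high_c_choice = preferred(10.0)  # violations are expensive
print(low_c_choice, high_c_choice)  # wide narrow
```

With a low C the wider margin wins despite one violation. With a high C the violation costs more than the wider margin saves, so the narrower, violation-free candidate wins.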
3. Kernel SVM (Non-Linearly Separable)
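Uses the kernel trick: a kernel function K(x, y) returns the inner product of x and y in a higher-dimensional feature space without ever computing the mapping explicitly, so a linear separator in that space gives a non-linear boundary in the original one. Common kernels (d is the polynomial degree, c ≥ 0 a constant offset, γ > 0 the RBF width parameter):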
- Polynomial: K(x,y) = (xᵀy + c)ᵈ
- RBF (Gaussian): K(x,y) = exp(−γ||x−y||²)
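The two kernel formulas translate directly into code. The sketch below also checks one special case by hand: for 2-D inputs, the polynomial kernel with c = 0 and d = 2 equals an ordinary dot product after the explicit map φ(x) = (x₁², √2·x₁x₂, x₂²). The default parameter values and the sample vectors are hypothetical.

```python
import math

def dot(x, y):
    """Plain inner product x·y."""
    return sum(xi * yi for xi, yi in zip(x, y))

def polynomial_kernel(x, y, c=1.0, d=2):
    """K(x, y) = (x·y + c)^d. The defaults are illustrative only."""
    return (dot(x, y) + c) ** d

def rbf_kernel(x, y, gamma=0.5):
    """K(x, y) = exp(-gamma * ||x - y||^2). The default gamma is illustrative."""
    sq_dist = sum((xi - yi) ** 2 for xi, yi in zip(x, y))
    return math.exp(-gamma * sq_dist)

def phi_degree2(x):
    """Explicit feature map matching the 2-D polynomial kernel with c = 0, d = 2."""
    x1, x2 = x
    return [x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2]

x, y = [1.0, 2.0], [3.0, 4.0]

implicit = polynomial_kernel(x, y, c=0.0, d=2)  # (1*3 + 2*4)^2 = 121
explicit = dot(phi_degree2(x), phi_degree2(y))  # same value via the 3-D map
print(math.isclose(implicit, explicit))         # True

same_point = rbf_kernel(x, x)  # identical points give exp(0) = 1
far_apart = rbf_kernel(x, y)   # ||x - y||^2 = 8, so exp(-4)
print(same_point, far_apart)
```

The kernel reaches the same number as the explicit map while only ever touching the original 2-D vectors, which is the saving the kernel trick provides.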
Multi-class Classification
- One-vs-One (OvO): For K classes, train a binary SVM for every pair of classes, K(K−1)/2 classifiers in total. Predict by majority vote.
- One-vs-Rest (OvR): Train one SVM per class against all others, K classifiers in total. Predict the class whose classifier gives the highest decision score.
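The two prediction rules can be sketched without training any classifier. In the example below, the class names, the pairwise winners, and the decision scores are hypothetical stand-ins for the outputs of already-trained binary SVMs.

```python
from collections import Counter
from itertools import combinations

def ovo_pairs(classes):
    """All unordered class pairs, one binary classifier per pair."""
    return list(combinations(classes, 2))

def ovo_predict(pair_winners):
    """Majority vote over the winners reported by the pairwise classifiers."""
    return Counter(pair_winners).most_common(1)[0][0]

def ovr_predict(scores):
    """Pick the class whose one-vs-rest classifier gives the highest score."""
    return max(scores, key=scores.get)

classes = ["cat", "dog", "bird", "fish"]
pairs = ovo_pairs(classes)
print(len(pairs))  # 4 * 3 / 2 = 6

# Hypothetical winner of each pairwise classifier, in the same order as `pairs`.
winners = ["cat", "cat", "cat", "dog", "dog", "bird"]
ovo_label = ovo_predict(winners)

# Hypothetical decision scores from four one-vs-rest classifiers.
scores = {"cat": -0.3, "dog": 1.2, "bird": 0.4, "fish": -1.1}
ovr_label = ovr_predict(scores)
print(ovo_label, ovr_label)  # cat dog
```

In this sketch a tied vote falls back to whichever class was counted first. A real implementation would need an explicit tie-breaking rule.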