Machine Learning

6. Support Vector Machine (SVM)

Support Vector Machine (SVM) is a supervised learning algorithm used mainly for classification. It finds the hyperplane that separates the classes with the largest possible margin, that is, the greatest distance to the nearest training points of each class.

Key Terminology

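Hyperplane: A decision boundary that separates different classes. In 2D it's a line, in 3D it's a plane, and in n dimensions it's a flat (n−1)-dimensional surface. Equation: w·x + b = 0, where w is the weight vector (normal to the hyperplane) and b is the bias.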

Support Vectors: The training data points closest to the hyperplane. Removing one of them would change the hyperplane, while removing any other training point would not. The margin is entirely determined by them.

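Margin: The width of the gap between the two classes, measured perpendicular to the hyperplane. When w and b are scaled so that yᵢ(w·xᵢ + b) = 1 at the support vectors, the nearest points of each class lie at distance 1 / ||w|| from the hyperplane, so the full margin is 2 / ||w||. SVM maximizes this.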
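To make the three terms concrete, here is a minimal sketch in plain Python that evaluates the signed distance of a point to a hyperplane and the margin 2 / ||w||. The hyperplane values (w = [3, 4], b = −5) and the helper names are illustrative assumptions, not taken from a trained model.

```python
import math

def norm(w):
    """Euclidean length ||w|| of the weight vector."""
    return math.sqrt(sum(wi * wi for wi in w))

def signed_distance(w, b, x):
    """Signed perpendicular distance from x to the hyperplane w·x + b = 0."""
    return (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm(w)

def margin_width(w):
    """Margin 2 / ||w||, assuming the scaling y(w·x + b) = 1 at the support vectors."""
    return 2.0 / norm(w)

# Hypothetical 2D hyperplane 3*x1 + 4*x2 - 5 = 0, so ||w|| = 5.
w, b = [3.0, 4.0], -5.0

print(signed_distance(w, b, [3.0, 4.0]))  # (9 + 16 - 5) / 5 = 4.0
print(margin_width(w))                    # 2 / 5 = 0.4
```

The sign of the distance tells which side of the hyperplane a point falls on, which is what the classifier uses to assign a class.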

SVM Scenarios

1. Hard Margin SVM (Linearly Separable)

Applies only when the classes can be perfectly separated by a hyperplane. It maximizes the margin while allowing no training point to be misclassified or to fall inside the margin.

Maximize: 2 / ||w||   (equivalently, minimize (1/2)||w||²)
Subject to: yᵢ(w·xᵢ + b) ≥ 1 for every training point i, with labels yᵢ ∈ {−1, +1}
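The constraint can be checked directly for any candidate (w, b). The sketch below uses a hand-made toy dataset and hand-picked candidates (all assumptions for illustration; no optimizer is run) to show that several hyperplanes may satisfy the constraint, and that among them the one with the smallest ||w|| has the widest margin.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def functional_margins(w, b, X, y):
    """y_i * (w·x_i + b) for every training point."""
    return [yi * (dot(w, xi) + b) for xi, yi in zip(X, y)]

def is_feasible(w, b, X, y, tol=1e-9):
    """True if every point satisfies the hard-margin constraint y_i(w·x_i + b) >= 1."""
    return all(m >= 1 - tol for m in functional_margins(w, b, X, y))

def support_vector_indices(w, b, X, y, tol=1e-9):
    """Indices of points lying exactly on the margin boundary."""
    return [i for i, m in enumerate(functional_margins(w, b, X, y)) if abs(m - 1) <= tol]

# Toy, linearly separable data (illustrative assumption).
X = [[2.0, 2.0], [3.0, 3.0], [0.0, 0.0], [-1.0, 0.0]]
y = [1, 1, -1, -1]

# Hand-picked candidate hyperplanes (w, b).
candidates = {
    "wide":      ([0.5, 0.5], -1.0),
    "narrow":    ([1.0, 1.0], -2.0),
    "too_small": ([0.25, 0.25], -0.5),  # violates the constraint
}

for name, (w, b) in candidates.items():
    ok = is_feasible(w, b, X, y)
    width = 2.0 / math.sqrt(dot(w, w))
    print(name, "feasible" if ok else "infeasible", round(width, 3))
```

For the "wide" candidate, the points at indices 0 and 2 sit exactly on the margin boundaries, so they play the role of support vectors in this toy setup.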

2. Soft Margin / C-SVM (Non-Separable/Noisy)

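Allows some points to violate the margin, or even be misclassified, so that overlapping or noisy classes can be handled. Each training point gets a slack variable ξᵢ that measures how far it violates the margin.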

Minimize: (1/2)||w||² + C · Σᵢ ξᵢ
Subject to: yᵢ(w·xᵢ + b) ≥ 1 − ξᵢ and ξᵢ ≥ 0 for every training point i

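C controls the trade-off. A high C penalizes violations heavily, giving fewer violations but a narrower margin and a higher risk of overfitting. A low C tolerates more violations in exchange for a wider margin.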
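The effect of C can be seen by evaluating the soft-margin objective for two fixed hyperplanes. This is a sketch under stated assumptions: the data (with one noisy negative point) and both candidate hyperplanes are hand-picked, and the smallest feasible slack ξᵢ = max(0, 1 − yᵢ(w·xᵢ + b)) is used for each point.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def slacks(w, b, X, y):
    """Smallest slack satisfying the constraints: max(0, 1 - y_i(w·x_i + b))."""
    return [max(0.0, 1.0 - yi * (dot(w, xi) + b)) for xi, yi in zip(X, y)]

def objective(w, b, X, y, C):
    """Soft-margin objective (1/2)||w||^2 + C * sum of slacks."""
    return 0.5 * dot(w, w) + C * sum(slacks(w, b, X, y))

# Toy data; the last point is a noisy negative close to the positives (assumption).
X = [[2.0, 2.0], [3.0, 3.0], [0.0, 0.0], [-1.0, 0.0], [1.5, 1.5]]
y = [1, 1, -1, -1, -1]

wide = ([0.5, 0.5], -1.0)    # wide margin, but the noisy point violates it
narrow = ([2.0, 2.0], -7.0)  # narrow margin, no violations

for C in (1.0, 100.0):
    cost_wide = objective(*wide, X, y, C)
    cost_narrow = objective(*narrow, X, y, C)
    better = "wide" if cost_wide < cost_narrow else "narrow"
    print(f"C={C}: wide={cost_wide}, narrow={cost_narrow}, lower cost: {better}")
```

With C = 1 the wide-margin hyperplane has the lower cost (1.75 versus 4.0), while with C = 100 the violation becomes so expensive (150.25 versus 4.0) that the narrow, violation-free hyperplane wins.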

3. Kernel SVM (Non-Linearly Separable)

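Uses the kernel trick: a kernel function K(x,y) returns the inner product of x and y in a higher-dimensional feature space without constructing that space explicitly. A linear separator in the feature space corresponds to a non-linear boundary in the original space. Common kernels:

  • Polynomial: K(x,y) = (xᵀy + c)ᵈ, with degree d and constant c ≥ 0
  • RBF (Gaussian): K(x,y) = exp(−γ||x−y||²), with width parameter γ > 0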
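Both kernels are short to write down in code. The sketch below also checks the kernel trick for one small case: for 2D inputs, the polynomial kernel with c = 0 and d = 2 equals the ordinary dot product after the explicit map φ(x) = (x₁², √2·x₁x₂, x₂²). Function names, default parameters, and the sample points are illustrative assumptions.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def polynomial_kernel(x, y, c=1.0, d=2):
    """K(x, y) = (x·y + c)^d"""
    return (dot(x, y) + c) ** d

def rbf_kernel(x, y, gamma=0.5):
    """K(x, y) = exp(-gamma * ||x - y||^2)"""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)

def phi(x):
    """Explicit feature map matching the 2D polynomial kernel with c=0, d=2."""
    x1, x2 = x
    return [x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2]

x, z = [1.0, 2.0], [3.0, 4.0]

k_implicit = polynomial_kernel(x, z, c=0.0, d=2)  # (1*3 + 2*4)^2 = 121
k_explicit = dot(phi(x), phi(z))                  # same value, computed in 3D

print(k_implicit, math.isclose(k_implicit, k_explicit))
print(rbf_kernel(x, x))  # identical points give exp(0) = 1.0
```

The kernel version never builds the 3D vectors, which is what keeps kernel SVMs practical when the feature space is very large (or, for RBF, infinite-dimensional).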

Multi-class Classification

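  • One-vs-One (OvO): Train a binary SVM for every pair of classes. With K classes (K here is the number of classes, not the kernel), this gives K(K−1)/2 classifiers. Predict by majority vote.
  • One-vs-Rest (OvR): Train one SVM per class against all others, giving K classifiers. Predict the class whose classifier returns the maximum decision score.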
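The two prediction rules can be sketched without training anything. In the example below, the class names, the pairwise winners, and the decision scores are made-up values standing in for the outputs of already trained binary SVMs.

```python
from collections import Counter
from itertools import combinations

def ovo_pairs(classes):
    """All unordered class pairs; one binary classifier is trained per pair."""
    return list(combinations(classes, 2))

def ovo_predict(pairwise_winners):
    """Majority vote over the winners reported by the pairwise classifiers."""
    votes = Counter(pairwise_winners.values())
    return votes.most_common(1)[0][0]

def ovr_predict(scores):
    """Pick the class whose one-vs-rest classifier gives the highest score."""
    return max(scores, key=scores.get)

classes = ["cat", "dog", "bird", "fish"]
pairs = ovo_pairs(classes)
print(len(pairs))  # 4 * 3 / 2 = 6

# Hypothetical winner of each pairwise classifier for one sample.
winners = {
    ("cat", "dog"): "dog",
    ("cat", "bird"): "cat",
    ("cat", "fish"): "cat",
    ("dog", "bird"): "dog",
    ("dog", "fish"): "dog",
    ("bird", "fish"): "fish",
}
print(ovo_predict(winners))  # dog wins with 3 of 6 votes

# Hypothetical one-vs-rest decision scores for the same sample.
scores = {"cat": -0.4, "dog": 1.3, "bird": -1.1, "fish": 0.2}
print(ovr_predict(scores))  # dog has the largest score
```

OvO trains more classifiers, but each one sees only the data of two classes. OvR trains fewer, but each one sees the full dataset.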