Machine Learning

1. Introduction to Unsupervised Learning

Unsupervised Learning is a type of machine learning where the algorithm learns patterns from UNLABELED data. There is no target variable (y) to predict; the algorithm must discover the hidden structure, patterns, or groupings on its own.

What is Clustering?

Clustering is the task of grouping a set of data points such that points in the same group (cluster) are MORE SIMILAR to each other than to points in other groups.
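
The definition above can be checked numerically on a toy example. The following is a minimal sketch, assuming two hand-labeled groups of 2-D points (the data and the helper names mean_within and mean_between are hypothetical, chosen only for illustration) and using straight-line distance as the notion of similarity. It shows that the average distance inside a group is smaller than the average distance across groups.

```python
from itertools import combinations, product
from math import dist  # straight-line distance between two points (Python 3.8+)

# Hypothetical toy data: two hand-labeled groups of 2-D points.
cluster_a = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.5)]
cluster_b = [(8.0, 8.0), (8.5, 9.0), (9.0, 8.5)]


def mean_within(points):
    """Average distance over all pairs of points inside one group."""
    pairs = list(combinations(points, 2))
    return sum(dist(p, q) for p, q in pairs) / len(pairs)


def mean_between(group1, group2):
    """Average distance over all pairs that span the two groups."""
    pairs = list(product(group1, group2))
    return sum(dist(p, q) for p, q in pairs) / len(pairs)


within = (mean_within(cluster_a) + mean_within(cluster_b)) / 2
between = mean_between(cluster_a, cluster_b)

print(f"mean within-cluster distance:  {within:.2f}")
print(f"mean between-cluster distance: {between:.2f}")
```

A good grouping keeps the first number small relative to the second. Here the labels are given by hand; a clustering algorithm has to find such a grouping without them.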

Real-World Examples:
  • Customer Segmentation: Group customers by purchasing behavior for targeted marketing.
  • Document Clustering: Group news articles by topic for search engines.
  • Image Segmentation: Group pixels by color/texture for object detection.
  • Anomaly Detection: Points that fit no cluster well (far from every cluster, or labeled as noise) can be flagged as anomalies (e.g., fraud).

Distance Measures

Clustering algorithms quantify how close two data points are using a distance or similarity measure:

Euclidean Distance (L2)

Straight-line distance. Used in K-Means.

d(x, y) = √[ Σ(xᵢ - yᵢ)² ]
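
The Euclidean formula above translates almost directly into code. This is a minimal sketch in plain Python; the function name euclidean_distance is an illustrative choice, and points are assumed to be equal-length sequences of numbers.

```python
import math


def euclidean_distance(x, y):
    """Straight-line (L2) distance: square root of the summed squared differences."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same number of dimensions")
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))


# A 3-4-5 right triangle: the distance from (0, 0) to (3, 4) is 5.
print(euclidean_distance([0, 0], [3, 4]))  # 5.0
```

Python 3.8+ also provides math.dist, which computes the same quantity for two points.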

Manhattan Distance (L1)

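Grid-like path distance: the sum of the absolute differences along each axis. Often preferred over Euclidean distance for high-dimensional data, and less sensitive to outliers because differences are not squared.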

d(x, y) = Σ |xᵢ - yᵢ|
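
The Manhattan formula above can be sketched the same way. The function name manhattan_distance is an illustrative choice, and points are again assumed to be equal-length sequences of numbers.

```python
def manhattan_distance(x, y):
    """Grid-path (L1) distance: sum of absolute differences along each axis."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same number of dimensions")
    return sum(abs(xi - yi) for xi, yi in zip(x, y))


# Moving from (0, 0) to (3, 4) on a grid takes 3 steps across and 4 steps up.
print(manhattan_distance([0, 0], [3, 4]))  # 7
```

For the same pair of points the straight-line distance is 5, so the grid path is longer whenever the points differ along more than one axis.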

Cosine Similarity

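Measures the cosine of the angle between two vectors, ignoring their magnitudes. Great for text/document clustering, where document length should not dominate the comparison. Because it is a similarity (higher means closer), it is commonly converted to a distance as 1 - cos(θ).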

cos(θ) = (x · y) / (||x|| × ||y||)
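
The cosine formula above can be sketched in plain Python as follows. The function name cosine_similarity is an illustrative choice. Raising an error for zero-length vectors is an assumption made here, since the formula is undefined when either norm is zero.

```python
import math


def cosine_similarity(x, y):
    """Cosine of the angle between x and y: dot product divided by the product of norms."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same number of dimensions")
    dot = sum(xi * yi for xi, yi in zip(x, y))
    norm_x = math.sqrt(sum(xi * xi for xi in x))
    norm_y = math.sqrt(sum(yi * yi for yi in y))
    if norm_x == 0 or norm_y == 0:
        # Assumption: treat the undefined zero-vector case as an error.
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_x * norm_y)


# Same direction, different length: similarity is 1 regardless of magnitude.
print(cosine_similarity([3, 4], [6, 8]))  # 1.0

# Perpendicular vectors: similarity is 0.
print(cosine_similarity([1, 0], [0, 1]))  # 0.0
```

The first call shows why this measure suits text: a document and a copy twice as long point in the same direction, so they count as identical even though their Euclidean distance is large.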