1. Introduction to Unsupervised Learning
Unsupervised Learning is a type of machine learning where the algorithm learns patterns from UNLABELED data. There is no target variable (y) to predict; the algorithm must discover the hidden structure, patterns, or groupings on its own.
What is Clustering?
Clustering is the task of grouping a set of data points such that points in the same group (cluster) are MORE SIMILAR to each other than to points in other groups.
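As a rough illustration of this definition, the sketch below compares the average distance inside each group with the average distance between two groups. The toy points and the helper names (`mean_within`, `mean_between`) are made up for this example, and Euclidean distance is assumed as the similarity measure.

```python
import math
from itertools import combinations, product

# Hypothetical toy data: two well-separated groups of 2-D points.
group_a = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1)]
group_b = [(5.0, 5.0), (5.1, 4.8), (4.9, 5.2)]

def mean_within(points):
    """Average Euclidean distance over all distinct pairs inside one group."""
    pairs = list(combinations(points, 2))
    return sum(math.dist(p, q) for p, q in pairs) / len(pairs)

def mean_between(ps, qs):
    """Average Euclidean distance over all pairs taken across two groups."""
    pairs = list(product(ps, qs))
    return sum(math.dist(p, q) for p, q in pairs) / len(pairs)

within_a = mean_within(group_a)
within_b = mean_within(group_b)
between = mean_between(group_a, group_b)

# A good clustering keeps within-group distances small relative to between-group ones.
print(within_a < between and within_b < between)  # True
```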
Real-World Examples:
- Customer Segmentation: Group customers by purchasing behavior for targeted marketing.
- Document Clustering: Group news articles by topic for search engines.
- Image Segmentation: Group pixels by color/texture for object detection.
- Anomaly Detection: Points that fit no cluster well (far from every cluster, or in very sparse regions) can be flagged as anomalies (e.g., fraud).
Distance Measures
Clustering algorithms measure how close data points are using distance metrics:
Euclidean Distance (L2)
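Straight-line distance between two points. The standard choice in K-Means, whose centroid update (the mean) minimizes squared Euclidean distance.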
d(x, y) = √[ Σ(xᵢ - yᵢ)² ]
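The formula above can be sketched in a few lines of plain Python. The function name `euclidean_distance` is chosen for this example only; in practice a library routine such as `math.dist` does the same job.

```python
import math

def euclidean_distance(x, y):
    """Square root of the sum of squared coordinate differences."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same number of dimensions")
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))

# Classic 3-4-5 right triangle: the straight-line distance is the hypotenuse.
print(euclidean_distance([0, 0], [3, 4]))  # 5.0
```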
Manhattan Distance (L1)
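Distance along axis-aligned (grid-like) paths: the sum of absolute coordinate differences. It is less sensitive than Euclidean distance to a single large coordinate difference, and is often preferred for high-dimensional data.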
d(x, y) = Σ |xᵢ - yᵢ|
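A minimal sketch of the same formula in Python, using the same pair of points as a 3-4-5 triangle so the result can be compared with the straight-line distance of 5. The function name `manhattan_distance` is illustrative.

```python
def manhattan_distance(x, y):
    """Sum of absolute coordinate differences (L1 distance)."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same number of dimensions")
    return sum(abs(xi - yi) for xi, yi in zip(x, y))

# Walking 3 blocks east and 4 blocks north on a grid covers 7 blocks.
print(manhattan_distance([0, 0], [3, 4]))  # 7
```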
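Cosine Similarity
Measures the angle between two vectors and ignores their lengths, which suits text/document clustering where document length should not matter. It is a similarity, not a distance: 1 means same direction, 0 means orthogonal. Use 1 - cos(θ) when an algorithm expects a distance.
cos(θ) = (x · y) / (||x|| × ||y||)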
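The sketch below computes this ratio in plain Python and also derives a cosine distance as one minus the similarity, a common convention when a clustering algorithm needs a distance rather than a similarity. The function names are illustrative, and raising an error for zero vectors is an assumption of this example (the angle is undefined in that case).

```python
import math

def cosine_similarity(x, y):
    """Dot product divided by the product of the two vector norms."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same number of dimensions")
    dot = sum(xi * yi for xi, yi in zip(x, y))
    norm_x = math.sqrt(sum(xi * xi for xi in x))
    norm_y = math.sqrt(sum(yi * yi for yi in y))
    if norm_x == 0 or norm_y == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_x * norm_y)

def cosine_distance(x, y):
    """Turn the similarity into a distance-like score: 0 means same direction."""
    return 1.0 - cosine_similarity(x, y)

# Orthogonal vectors share no direction.
print(cosine_similarity([1, 0], [0, 1]))  # 0.0

# Same direction, different lengths: similarity is approximately 1.
print(math.isclose(cosine_similarity([1, 2, 3], [2, 4, 6]), 1.0))  # True
```

Because only direction matters, a short document and a long document with the same word proportions score as near-identical here, whereas Euclidean distance would place them far apart.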