Machine Learning

Unsupervised Learning / Hierarchical Clustering


4. Hierarchical Clustering

Hierarchical Clustering creates a tree-like hierarchy of clusters called a Dendrogram. Unlike K-Means, you do not need to specify the number of clusters (k) in advance; you can cut the dendrogram at an appropriate height later to define your clusters.

A. Agglomerative (Bottom-Up)

  1. Start: n clusters (each point is its own cluster).
  2. Compute the pairwise distance matrix.
  3. Merge: Find the two closest clusters and merge them into a single new cluster.
  4. Update: Recompute the distances between the new cluster and all remaining clusters using a Linkage Method.
  5. Repeat: Steps 3 and 4 until all points merge into one single master cluster.

B. Divisive (Top-Down / DIANA)

  1. Start: 1 cluster containing ALL points.
  2. Split: Find the point with the greatest average dissimilarity to other cluster members. This point starts a new sub-cluster.
  3. Reassign points to the sub-cluster they are most similar to.
  4. Repeat: Until each point is in its own individual cluster (n clusters).

Linkage Methods (Measuring Inter-Cluster Distance)

1. Single Linkage (MIN)

Distance between the NEAREST pair of points (one from each cluster).

Creates long, chain-like clusters ("chaining effect"). Highly sensitive to noise/outliers.

2. Complete Linkage (MAX)

Distance between the FARTHEST pair of points.

Creates compact, ball-shaped clusters of roughly equal diameter. Good for well-separated data.

3. Average Linkage

The average of all pairwise distances between points in cluster A and cluster B.

4. Ward's Method

Merges the two clusters that result in the minimum increase in the total within-cluster variance.