Hierarchical Clustering
Explore hierarchical clustering to understand how agglomerative and divisive methods group data based on similarity. Learn to use scikit-learn's AgglomerativeClustering for practical applications, discover the impact of linkage criteria, and analyze hierarchical cluster structures to uncover patterns and relationships in your data.
Hierarchical clustering is another popular unsupervised clustering algorithm that groups data points into clusters based on similarity. It works by building a hierarchy of clusters, starting with individual data points and gradually merging them into larger clusters.
There are two types of hierarchical clustering: agglomerative and divisive.
Agglomerative clustering
Agglomerative clustering is a hierarchical clustering algorithm that groups data points based on their pairwise distances or similarities. Unlike k-means, agglomerative clustering doesn't require specifying the number of clusters in advance. Instead, it builds a hierarchy of clusters by iteratively merging the most similar or nearby data points or clusters; the hierarchy can then be cut at any level to obtain a desired number of clusters.
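As a minimal sketch of this idea, the following uses scikit-learn's AgglomerativeClustering on a handful of synthetic 2-D points (the data and the choice of two clusters are purely illustrative). Passing `n_clusters=2` tells the algorithm where to cut the merge hierarchy:

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

# Two well-separated groups of synthetic points (for illustration only).
X = np.array([
    [1.0, 1.0], [1.2, 0.8], [0.9, 1.1],   # group near (1, 1)
    [8.0, 8.0], [8.2, 7.9], [7.8, 8.1],   # group near (8, 8)
])

# Each point starts as its own cluster; the closest clusters are merged
# repeatedly until only n_clusters remain.
model = AgglomerativeClustering(n_clusters=2)
labels = model.fit_predict(X)
print(labels)  # e.g., [0 0 0 1 1 1] (label numbering may be swapped)
```

Because the merges are recorded bottom-up, the same fitted hierarchy could equally be cut at three or four clusters without re-examining the raw distances from scratch.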
The algorithm starts by considering each data point as a separate cluster. It then repeatedly merges the two closest clusters based on a chosen linkage criterion, which determines the distance or similarity between clusters. The most commonly used linkage criteria are as follows:
Ward: This minimizes the increase in total within-cluster variance that results from merging two clusters.
Complete: This uses the maximum distance between points in the two clusters being merged (farthest-neighbor linkage).
Average: This uses the average distance between all pairs of points in the two clusters being merged.
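The linkage criteria above map directly onto the `linkage` parameter of AgglomerativeClustering. The sketch below fits the same synthetic points (an assumed toy dataset, not from the lesson) under each criterion; on data this cleanly separated, all three recover the same grouping:

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

# Synthetic, well-separated points (for illustration only).
X = np.array([
    [1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
    [8.0, 8.0], [8.2, 7.9], [7.8, 8.1],
])

# Fit the same data under each linkage criterion.
for linkage in ("ward", "complete", "average"):
    labels = AgglomerativeClustering(n_clusters=2, linkage=linkage).fit_predict(X)
    print(f"{linkage:>8}: {labels}")
```

On messier data with clusters of uneven size or shape, the criteria can disagree, which is why the choice matters in practice.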
The choice of linkage criterion can have a ...