Hierarchical Clustering
Explore hierarchical clustering methods to organize data into nested groups, using agglomerative and divisive techniques. Understand how linkage criteria influence cluster merging and how dendrograms visualize the hierarchy. This lesson helps you apply clustering to summarize and analyze data structure effectively.
Hierarchical clustering
Hierarchical clustering involves creating a hierarchy of clusters. Representing data objects in the form of a hierarchy is useful for data summarization and visualization. For example, suppose we want to organize the people in an organization into major groups such as executives, managers, and staff. Each of these groups can be partitioned further into smaller subgroups. This is the basic idea of hierarchical clustering.
Agglomerative hierarchical clustering
Agglomerative hierarchical clustering uses a bottom-up strategy. Each instance is initially treated as its own cluster. At each successive iteration, the two most similar clusters are merged, and this continues until only one large cluster remains.
Basic steps
Agglomerative hierarchical clustering works as follows.
1. We compute the similarities (or distances) between the instances and store them in a matrix, which is called a similarity or proximity matrix.
2. We treat each instance as a single cluster.
3. We merge the two closest clusters and update the proximity matrix. Updating the matrix involves computing the cluster distance between the newly merged cluster and every other cluster, so that the next merge is chosen from up-to-date values.
4. We repeat step 3 until a stopping condition is met or only one cluster remains. The condition might be, for example, a minimum number of clusters remaining after a merge.
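The steps above can be sketched in plain Python. This is a minimal illustration rather than a reference implementation, and it rests on a few assumptions that the lesson does not fix: proximity is measured as Euclidean distance, the cluster distance is single linkage (the smallest distance between any two members), and the names `single_link`, `agglomerative`, and `min_clusters` are hypothetical.

```python
from itertools import combinations
from math import dist  # Euclidean distance, Python 3.8+


def single_link(a, b, points):
    # Assumed cluster distance: smallest pairwise distance between members.
    return min(dist(points[i], points[j]) for i in a for j in b)


def agglomerative(points, min_clusters=1):
    # Step 2: every instance starts as its own cluster (a tuple of indices).
    clusters = [(i,) for i in range(len(points))]
    # Step 1: proximity "matrix", stored as a dict keyed by a pair of clusters.
    prox = {frozenset((a, b)): single_link(a, b, points)
            for a, b in combinations(clusters, 2)}
    merges = []  # history of (cluster_a, cluster_b, distance)

    # Step 4: repeat until the stopping condition is met.
    while len(clusters) > max(min_clusters, 1):
        # Step 3a: pick the closest pair of clusters.
        pair = min(prox, key=prox.get)
        a, b = sorted(pair)
        merges.append((a, b, prox[pair]))
        merged = tuple(sorted(a + b))

        # Step 3b: update the matrix. Drop entries that involve a or b,
        # then add distances from the merged cluster to every other cluster.
        clusters = [c for c in clusters if c not in (a, b)]
        prox = {k: v for k, v in prox.items() if a not in k and b not in k}
        for c in clusters:
            prox[frozenset((merged, c))] = single_link(merged, c, points)
        clusters.append(merged)

    return clusters, merges


# Made-up sample data: two tight pairs and one isolated point.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0), (10.0, 0.0)]
clusters, merges = agglomerative(points, min_clusters=3)
print(sorted(clusters))
for a, b, d in merges:
    print(f"merged {a} and {b} at distance {d:.2f}")
```

Recomputing the linkage from the raw points after each merge keeps the sketch short, but it is slow for large datasets. Setting `min_clusters=1` runs the merging all the way down to a single cluster.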
Example
- In the above diagram, we treat each point ...