
Hierarchical Clustering

Learn all about hierarchical clustering and how to cluster data with it using scikit-learn.

We'll cover the following...

Agglomerative clustering
Divisive clustering
The scikit-learn implementation
Limitations
Conclusion

Hierarchical clustering is another popular unsupervised learning algorithm that groups data points into clusters based on similarity. It works by building a hierarchy of clusters, either by starting with individual data points and gradually merging them into larger clusters, or by starting with a single cluster and recursively splitting it.

There are two types of hierarchical clustering: agglomerative and divisive.

Agglomerative clustering

Agglomerative clustering is a hierarchical clustering algorithm that groups data points based on their pairwise distances or similarities. Unlike k-means, agglomerative clustering doesn’t require specifying the number of clusters in advance. Instead, it builds a hierarchy of clusters from the bottom up by iteratively merging the two most similar clusters.

The algorithm starts by considering each data point as a separate cluster. It then repeatedly merges the two closest clusters based on a chosen linkage criterion, which determines the distance or similarity between clusters. The most commonly used linkage criteria are as follows:

Ward: This merges the pair of clusters whose union gives the smallest increase in total within-cluster variance.
Complete: This uses the maximum distance between any two points, one from each cluster, so the two farthest points determine how far apart the clusters are.
Average: This uses the average distance between all pairs of points in the two clusters being merged.
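
To make the three criteria concrete, here is a minimal sketch that computes each one for two tiny clusters using only the Python standard library. The function names and sample points are hypothetical and chosen for illustration. The Ward value is expressed as the increase in within-cluster sum of squares caused by the merge, which is one common way to state the criterion; libraries may scale it differently.

```python
import math
from itertools import product


def complete_linkage(a, b):
    # Largest distance between any point in a and any point in b.
    return max(math.dist(p, q) for p, q in product(a, b))


def average_linkage(a, b):
    # Mean distance over all cross-cluster pairs of points.
    return sum(math.dist(p, q) for p, q in product(a, b)) / (len(a) * len(b))


def sse(cluster):
    # Sum of squared distances from each point to the cluster centroid.
    dims = len(cluster[0])
    centroid = tuple(sum(p[d] for p in cluster) / len(cluster) for d in range(dims))
    return sum(math.dist(p, centroid) ** 2 for p in cluster)


def ward_increase(a, b):
    # How much the total within-cluster sum of squares grows if a and b merge.
    return sse(a + b) - sse(a) - sse(b)


# Hypothetical sample data: two small clusters three units apart.
cluster_a = [(0.0, 0.0), (0.0, 1.0)]
cluster_b = [(3.0, 0.0), (3.0, 1.0)]

print(f"complete: {complete_linkage(cluster_a, cluster_b):.4f}")  # 3.1623
print(f"average:  {average_linkage(cluster_a, cluster_b):.4f}")   # 3.0811
print(f"ward:     {ward_increase(cluster_a, cluster_b):.4f}")     # 9.0000
```

At each step, the algorithm would evaluate the chosen function for every pair of current clusters and merge the pair with the smallest value.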

The choice of linkage criterion can have a significant impact on the clustering results, as it affects the shape and structure of the clusters.
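
The effect of the linkage choice can be seen even on four one-dimensional points. The sketch below is a simplified, pure-Python version of the bottom-up merge loop, not the scikit-learn implementation. The `agglomerate` function, its `n_clusters` stopping parameter, and the sample points are assumptions made for illustration.

```python
from itertools import combinations


def complete(a, b):
    # Largest pairwise distance between the two clusters.
    return max(abs(p - q) for p in a for q in b)


def average(a, b):
    # Mean pairwise distance between the two clusters.
    return sum(abs(p - q) for p in a for q in b) / (len(a) * len(b))


def agglomerate(points, n_clusters, linkage):
    # Start with every point in its own cluster.
    clusters = [[p] for p in points]
    while len(clusters) > n_clusters:
        # Find the pair of clusters with the smallest linkage distance.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: linkage(clusters[pair[0]], clusters[pair[1]]),
        )
        # Merge cluster j into cluster i (i < j, so deleting j keeps i valid).
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [sorted(c) for c in clusters]


# Hypothetical sample data.
points = [0.0, 2.0, 5.0, 9.5]

complete_result = agglomerate(points, 2, complete)
average_result = agglomerate(points, 2, average)

print(complete_result)  # [[0.0, 2.0], [5.0, 9.5]]
print(average_result)   # [[0.0, 2.0, 5.0], [9.5]]
```

Both runs first merge 0.0 and 2.0. They then disagree about the point 5.0: complete linkage measures its distance to the farthest member of {0.0, 2.0}, which is 5.0, and so prefers pairing it with 9.5 at distance 4.5. Average linkage sees a mean distance of 4.0 to {0.0, 2.0} and joins that cluster instead.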

Agglomerative clustering continues merging clusters until all data points are grouped into a ...
