DBSCAN
Explore DBSCAN, an unsupervised clustering algorithm that groups data points based on density. Learn how it detects core, border, and noise points without requiring the number of clusters in advance. Understand DBSCAN's advantages over K-means, including handling arbitrarily shaped clusters and noise effectively.
After exploring K-means, a form of partitional clustering that relies on distance to a central point, we now move to density-based clustering. This approach groups together data points that are closely packed (high density) while marking points that lie alone in low-density regions as outliers. The most popular algorithm in this category is DBSCAN (Density-Based Spatial Clustering of Applications with Noise). A key advantage of DBSCAN is that it does not require prior knowledge of the number of clusters and is robust against noise.
DBSCAN clustering
In density-based clustering, the dataset is partitioned into dense regions separated by areas of low density. Density is quantified using two crucial hyperparameters:
- Epsilon (ε): Specifies the maximum distance between two points for them to be considered neighbors. If the distance between two points is at most ε, they are considered to be in each other’s neighborhood.
- MinPoints: Specifies the minimum number of neighbors a point must have (including itself) within the distance ε to be considered a core point.
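As a minimal sketch of how these two hyperparameters interact, the snippet below checks which points of a toy 2D dataset qualify as core points. The helper names (`neighbors_within`, `is_core_point`), the Euclidean distance, and the sample coordinates are assumptions made for illustration; they are not part of any particular library.

```python
import math

def neighbors_within(points, index, eps):
    """Return indices of all points within distance eps of points[index].

    The point itself is included, matching the "including itself" convention.
    """
    center = points[index]
    return [j for j, q in enumerate(points) if math.dist(center, q) <= eps]

def is_core_point(points, index, eps, min_points):
    """A point is a core point if its eps-neighborhood holds at least min_points points."""
    return len(neighbors_within(points, index, eps)) >= min_points

# Hypothetical toy data: four points on a unit square plus one distant point.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
eps, min_points = 1.0, 3

core_flags = [is_core_point(points, i, eps, min_points) for i in range(len(points))]
print(core_flags)  # [True, True, True, True, False]
```

Each corner of the square has itself and its two adjacent corners within distance 1.0, so it meets the threshold of 3. The distant point has only itself in its neighborhood and is therefore not a core point.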
Density at a point
The density at any data point is defined as the number of data points in the dataset that lie within a circle of radius ε centered at that point.
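The definition can be sketched in a few lines of Python. This is an illustrative example only: the `density` function, the use of Euclidean distance, and the sample coordinates are assumptions, and the center point is counted because it is itself a member of the dataset.

```python
import math

def density(points, center, radius):
    """Count the dataset points lying within `radius` of `center` (boundary included)."""
    return sum(1 for q in points if math.dist(center, q) <= radius)

# Hypothetical toy dataset: a tight group of three points and two isolated points.
dataset = [(1.0, 1.0), (1.5, 1.0), (1.0, 1.5), (3.0, 3.0), (8.0, 8.0)]

densities = {p: density(dataset, p, radius=1.0) for p in dataset}
for point, count in densities.items():
    print(point, count)
```

With a radius of 1.0, each point in the tight group has a density of 3, while the two isolated points have a density of 1 because only the point itself falls inside its circle.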
The image below illustrates the concept of density by showing the points enclosed within a circle of a specified radius centered at point ...