
DBSCAN

Explore the DBSCAN clustering algorithm to understand how it groups data points based on density. Learn about core, border, and noise points, how clusters form through density connectivity, and why DBSCAN is effective for arbitrarily shaped clusters and noisy data. This lesson helps you grasp density-based clustering and implement DBSCAN for practical unsupervised learning tasks.

After exploring K-means, a form of partitional clustering that relies on distance to a central point, we now move to density-based clustering. This approach groups together data points that are closely packed (high density) while marking points that lie alone in low-density regions as outliers. The most popular algorithm in this category is DBSCAN (Density-Based Spatial Clustering of Applications with Noise). A key advantage of DBSCAN is that it does not require prior knowledge of the number of clusters and is robust against noise.

DBSCAN clustering

In density-based clustering, clusters are dense regions of the dataset separated by areas of low density. Density is quantified using two hyperparameters:

  • Epsilon ($\epsilon$): Specifies the maximum distance between two points for them to be considered neighbors. If the distance between two points is $\le \epsilon$, each lies in the other's neighborhood.
  • MinPoints ($m$): Specifies the minimum number of points, including the point itself, that must lie within distance $\epsilon$ of a point for it to be considered a core point.
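A minimal sketch of how the two hyperparameters interact, assuming Euclidean distance and points stored as tuples of coordinates. The names `region_query` and `is_core_point` are illustrative, not part of any library. The neighborhood deliberately includes the query point itself, matching the MinPoints convention above.

```python
import math

def region_query(points, i, eps):
    """Indices of all points within distance eps of points[i], including i itself.
    Assumes Euclidean distance (math.dist, Python 3.8+)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

def is_core_point(points, i, eps, min_points):
    """A point is a core point if its eps-neighborhood (counting itself)
    contains at least min_points points."""
    return len(region_query(points, i, eps)) >= min_points

# Hypothetical toy data: three nearby points and one far-away point.
points = [(0.0, 0.0), (0.5, 0.0), (0.0, 0.5), (5.0, 5.0)]
eps, min_points = 1.0, 3

for i in range(len(points)):
    print(i, region_query(points, i, eps), is_core_point(points, i, eps, min_points))
```

With $\epsilon = 1$ and $m = 3$, each of the three nearby points has all three in its neighborhood and is therefore a core point, while the isolated point at (5, 5) has only itself and is not.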

Density at a point

The density at any data point $\mathbf{x}$ is defined as the number of data points in the dataset $D$ that lie within a circle of radius $\epsilon$ centered at $\mathbf{x}$ (a ball, in more than two dimensions), counting $\mathbf{x}$ itself.
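Written as a formula, assuming Euclidean distance (the "circle" in the definition) and using $\rho_\epsilon$ as notation introduced here only for illustration, the density and the resulting core-point test are:

$$
\rho_\epsilon(\mathbf{x}) = \left|\{\, \mathbf{y} \in D : \lVert \mathbf{x} - \mathbf{y} \rVert \le \epsilon \,\}\right|, \qquad \mathbf{x} \text{ is a core point} \iff \rho_\epsilon(\mathbf{x}) \ge m
$$

For example, if $D = \{(0, 0), (0.5, 0), (0, 0.5), (5, 5)\}$ and $\epsilon = 1$, then $\rho_\epsilon((0, 0)) = 3$ and $\rho_\epsilon((5, 5)) = 1$; with $m = 3$, the point $(0, 0)$ is a core point and $(5, 5)$ is not.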

The image below illustrates the concept of density by showing the points enclosed within a circle of a specified radius centered at point $C$ ...