DBSCAN

Learn about DBSCAN clustering, density, dense regions, point types, and the algorithm.

After learning the famous $k$-means clustering algorithm, a type of partitional clustering, we’ll move towards a more robust approach still used in the industry. This approach is called density-based clustering.

DBSCAN clustering

DBSCAN stands for Density-Based Spatial Clustering of Applications with Noise. In density-based clustering, the set of data points $D$ is partitioned into dense regions separated by regions of low density. DBSCAN is one popular density-based clustering algorithm. It’s a robust algorithm that doesn’t require prior knowledge of the number of clusters in the data, unlike the $k$-means algorithm.

How to define density

Density is defined by two parameters, epsilon ($\epsilon$) and minPoints ($m$), which quantify density for individual points and a set of points.

Epsilon specifies the maximum distance between two points for them to be considered neighbors. If the distance between two points is less than or equal to $\epsilon$, they’re considered to be in each other’s neighborhood. MinPoints specifies the minimum number of neighbors a point must have within the $\epsilon$ distance to be considered a core point. Any point with fewer than $m$ neighbors within the $\epsilon$ distance is considered a border or noise point.
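The neighborhood and core-point rules above can be sketched in a few lines of Python. This is a minimal illustration rather than the lesson's own code: the helper names `neighbors` and `is_core` and the toy data are hypothetical, and the sketch assumes a point is not counted as its own neighbor, which is how the text reads. Some libraries count the point itself (for example, scikit-learn's `min_samples` does), so the threshold may differ by one.

```python
import math

def neighbors(i, D, eps):
    """Indices of the points in D (other than i) within distance eps of D[i]."""
    return [j for j, p in enumerate(D)
            if j != i and math.dist(D[i], p) <= eps]

def is_core(i, D, eps, m):
    """True if D[i] has at least m neighbors within distance eps."""
    return len(neighbors(i, D, eps)) >= m

# Hypothetical toy data: three points close together and one far away.
D = [(0.0, 0.0), (0.5, 0.0), (0.0, 0.5), (5.0, 5.0)]

# With eps = 1.0 and m = 2, the three close points are core points;
# the isolated point has no neighbors, so it is not.
core_flags = [is_core(i, D, eps=1.0, m=2) for i in range(len(D))]
print(core_flags)  # [True, True, True, False]
```

A non-core point still needs one more check to separate border from noise: whether it lies within `eps` of some core point.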

Density at a point

The density at any data point $\mathbf{x}$ is defined as the number of data points in $D$ within a circle of radius $\epsilon$ centered at $\mathbf{x}$. In code, the density of a point x can be computed from the array of points D and the radius eps, using Euclidean distance as the dissimilarity measure. The example considered here is the density at the point $C = (-0.80536652, -1.72766949)$ with a radius of $3.2$.
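The density definition can be sketched in Python as follows. This is a minimal sketch, not the lesson's original code: it uses only the standard library, represents D as a list of coordinate tuples, and runs on hypothetical toy data rather than the lesson's dataset. The names `x`, `D`, and `eps` follow the text; the function name `density` is an assumption.

```python
import math

def density(x, D, eps):
    """Number of points in D whose Euclidean distance to x is at most eps."""
    return sum(1 for p in D if math.dist(x, p) <= eps)

# Hypothetical toy data, not the dataset used in the lesson.
D = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (3.0, 4.0), (6.0, 0.0)]
x = (0.0, 0.0)

# Distances from x to the points of D are 0, 1, 2, 5, and 6.
print(density(x, D, eps=2.0))  # 3
print(density(x, D, eps=5.0))  # 4
```

Because the definition counts every point of D inside the circle, x itself is included whenever it belongs to D. With the lesson's data loaded into `D`, the example in the text would correspond to a call such as `density(C, D, 3.2)`.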
