DBSCAN Walk-Through Example
Explore the DBSCAN clustering algorithm through a detailed walk-through using a small dataset. Understand key concepts such as neighborhood identification; the classification of core, border, and noise points; and how clusters are assigned based on density connectivity. This lesson helps you implement DBSCAN from scratch and compare it with scikit-learn's DBSCAN, building a strong foundation in density-based clustering techniques.
In this lesson, we will execute a step-by-step walk-through of the Density-Based Spatial Clustering of Applications with Noise (DBSCAN) algorithm. We will use a small dataset and manually apply the key steps of neighborhood identification, point type classification, and cluster assignment to see how DBSCAN partitions data based on density, unlike K-means, which relies on centroids.
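Before the manual walk-through, here is a minimal sketch of what running scikit-learn's DBSCAN on this lesson's 16-point dataset might look like. The parameter values match those chosen later in the lesson; note that scikit-learn's min_samples counts the query point itself toward the threshold.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# The 16 data points from this lesson's dataset
X = np.array([
    [1, 2], [2, 1], [2, 1.5], [2.5, 3.5], [3, 4], [4, 3.5],
    [4, 7.5], [5, 6], [5, 7], [5.5, 2], [6, 1.5], [6, 3],
    [6, 5.5], [6.5, 5], [7, 2.5], [1, 8],
])

# eps and min_samples follow the values chosen in the walk-through;
# scikit-learn counts the point itself toward min_samples.
labels = DBSCAN(eps=1.2, min_samples=2).fit_predict(X)
print(labels)  # a label of -1 marks a noise point
```

The label array has one entry per point; points labeled -1 were classified as noise because they have no eps-neighbors.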
Dry running an example
Let’s run DBSCAN on the following dataset shown in the graph:
Distance calculation
First, let’s choose the value of eps and minPts.
eps = 1.2
minPts = 2
Simple enough; let’s calculate the distance of all the points from each other. For the sake of ease, we’ll start with the Euclidean distance of every point from (2, 1.5):
| Data Point | Euclidean Distance From (2, 1.5) |
|---|---|
| (1, 2) | 1.118033989 |
| (2, 1) | 0.5 |
| (2, 1.5) | 0 |
| (2.5, 3.5) | 2.061552813 |
| (3, 4) | 2.692582404 |
| (4, 3.5) | 2.828427125 |
| (4, 7.5) | 6.32455532 |
| (5, 6) | 5.408326913 |
| (5, 7) | 6.264982043 |
| (5.5, 2) | 3.535533906 |
| (6, 1.5) | 4 |
| (6, 3) | 4.272001873 |
| (6, 5.5) | 5.656854249 |
| (6.5, 5) | 5.700877125 |
| (7, 2.5) | 5.099019514 |
| (1, 8) | 6.57647322 |
Enter any data point from the above table as input using the format Px,Py to the following program to get its Euclidean distance from all the data points:
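The interactive program is not reproduced here, but a minimal Python sketch of such a distance calculator might look like the following. The dataset and the Px,Py input format come from above; the function name is my own.

```python
import math

# Dataset from the distance table above
POINTS = [
    (1, 2), (2, 1), (2, 1.5), (2.5, 3.5), (3, 4), (4, 3.5),
    (4, 7.5), (5, 6), (5, 7), (5.5, 2), (6, 1.5), (6, 3),
    (6, 5.5), (6.5, 5), (7, 2.5), (1, 8),
]

def distances_from(px, py):
    """Return the Euclidean distance from (px, py) to every data point."""
    return {(x, y): math.hypot(x - px, y - py) for x, y in POINTS}

# Example: the input "2,1.5" in the Px,Py format
px, py = (float(v) for v in "2,1.5".split(","))
for point, d in distances_from(px, py).items():
    print(f"{point}: {d:.9f}")
```

Running it with the input 2,1.5 should reproduce the table above.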
Identification of neighborhood points
Once we compute the Euclidean distance between the data point and all other data points, we can identify which data points have a distance less than or equal to our threshold value eps. In the table above, we can see that two data points, (1, 2) and (2, 1), have a distance of at most eps = 1.2 from (2, 1.5); these form the neighborhood of (2, 1.5).
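The neighborhood check described above can be sketched in Python as a simple filter over the distances, using the eps value chosen earlier (the helper name is illustrative):

```python
import math

EPS = 1.2

# Dataset from the distance table above
POINTS = [
    (1, 2), (2, 1), (2, 1.5), (2.5, 3.5), (3, 4), (4, 3.5),
    (4, 7.5), (5, 6), (5, 7), (5.5, 2), (6, 1.5), (6, 3),
    (6, 5.5), (6.5, 5), (7, 2.5), (1, 8),
]

def neighbors(p, points=POINTS, eps=EPS):
    """Return all points (other than p itself) within eps of p."""
    return [q for q in points
            if q != p and math.hypot(q[0] - p[0], q[1] - p[1]) <= eps]

print(neighbors((2, 1.5)))  # [(1, 2), (2, 1)]
```

Applying the same helper to every point gives each point's neighborhood, which is the input to the core/border/noise classification in the next step.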