DBSCAN Walk-Through Example
Explore the DBSCAN clustering process from scratch by learning how to identify neighborhood points, classify core, border, and noise points, and assign clusters using density-based criteria. This lesson provides hands-on implementation and visualization, enhancing your understanding of how DBSCAN partitions datasets and compares to other clustering techniques like K-means.
In this lesson, we will execute a step-by-step walk-through of the Density-Based Spatial Clustering of Applications with Noise (DBSCAN) algorithm. Using a small dataset, we will manually apply the key concepts of neighborhood identification, point type classification, and cluster assignment to see how DBSCAN partitions data based on density, unlike K-means, which relies on centroids.
Dry running an example
Let’s run DBSCAN on the following dataset shown in the graph:
Distance calculation
First, let’s choose the value of eps and minPts.
eps = 1.2
minPts = 2
Simple enough. Let’s calculate the distance of all the points from each other. For the sake of ease, we’ll start by computing every point’s Euclidean distance from the point (2, 1.5):
| Data point | Euclidean distance from (2, 1.5) |
| --- | --- |
| (1, 2) | 1.118033989 |
| (2, 1) | 0.5 |
| (2, 1.5) | 0 |
| (2.5, 3.5) | 2.061552813 |
| (3, 4) | 2.692582404 |
| (4, 3.5) | 2.828427125 |
| (4, 7.5) | 6.32455532 |
| (5, 6) | 5.408326913 |
| (5, 7) | 6.264982043 |
| (5.5, 2) | 3.535533906 |
| (6, 1.5) | 4 |
| (6, 3) | 4.272001873 |
| (6, 5.5) | 5.656854249 |
| (6.5, 5) | 5.700877125 |
| (7, 2.5) | 5.099019514 |
| (1, 8) | 6.57647322 |
Enter any data point from the table above as input, in the format Px,Py, to the following program to get its Euclidean distance from all the data points:
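A minimal sketch of such a distance calculator is shown below (the lesson's interactive runner is not reproduced here; the `points` list and function names are our own). It recomputes the table above for any chosen query point:

```python
import math

# The 16 data points from the table above.
points = [
    (1, 2), (2, 1), (2, 1.5), (2.5, 3.5), (3, 4), (4, 3.5),
    (4, 7.5), (5, 6), (5, 7), (5.5, 2), (6, 1.5), (6, 3),
    (6, 5.5), (6.5, 5), (7, 2.5), (1, 8),
]

def distances_from(query):
    """Euclidean distance from `query` to every point in the dataset."""
    return [(p, math.hypot(p[0] - query[0], p[1] - query[1]))
            for p in points]

# Reproduce the table above for the point (2, 1.5).
for p, d in distances_from((2, 1.5)):
    print(f"{p}: {d:.9f}")
```

Replacing `(2, 1.5)` with any other point from the table (parsed from a `Px,Py` string, for example) gives that point's distance column.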
Identification of neighborhood points
Once we have computed the Euclidean distance between our data point and all other data points, we can identify which of them lie at a distance less than or equal to our threshold value eps. In the table above, we can see that two data points, (1, 2) and (2, 1), fall within eps = 1.2 of (2, 1.5), so they form its neighborhood.