# DBSCAN

Learn about DBSCAN clustering, density, dense regions, point types, and algorithm.

We'll cover the following

After learning the famous $k$-means clustering algorithm, a type of partitional clustering, we’ll move towards a more robust approach still used in the industry. This approach is called density-based clustering.

## DBSCAN clustering

DBSCAN stands for Density-Based Spatial Clustering of Applications with Noise. In density-based clustering, the set of data points $D$ is partitioned into dense regions separated by regions of low density. DBSCAN is one popular density-based clustering algorithm. It’s a robust algorithm that doesn’t require prior knowledge of the number of clusters in the data, unlike the $k$-means algorithm.

### How to define density

Density is defined by two parameters, epsilon ($\epsilon$) and minPoints ($m$), which quantify density for individual points and a set of points.

Epsilon specifies the maximum distance between two points to be considered neighbors. If the distance between two points is less than or equal to $\epsilon$, they’re considered to be in each other’s neighborhood. MinPoints specifies the minimum number of neighbors a point must have within the $\epsilon$ distance to be considered a core point. Any point with fewer than $m$ neighbors within the $\epsilon$ distance is considered a border or noise point.

#### Density at a point

The density at any data point $\bold x$ is defined as the number of data points in $D$ within a circle of radius $\epsilon$ centered at $\bold x$. The code below computes the density of a point x given the array of points D and the radius eps using Euclidean distance as dissimilarity. Density at a point $C = (-0.80536652, -1.72766949)$ with a radius of $3.2$ is shown below:

Get hands-on with 1200+ tech skills courses.