Similarity and Dissimilarity Measures
Learn key dissimilarity metrics, including Minkowski, Mahalanobis, and Geodesic distances, and when to use each in clustering.
Similarity or dissimilarity measures are core components of clustering algorithms that determine how data points are grouped: similar points are placed in the same clusters, while dissimilar or distant points are placed in different clusters. The choice of metric is crucial and depends heavily on the structure and scale of your data.
All measures discussed below involve two data points, $x$ and $y$, in $n$-dimensional space ($\mathbb{R}^n$).
Minkowski distance and $L_p$-norms
The Minkowski distance is not a single distance metric but a generalized formula that defines a whole family of distances. Common metrics such as the Manhattan and Euclidean distances are simply special cases of this one formula, obtained by fixing its order parameter $p$. The generalization relies on the mathematical concept of the $L_p$-norm.
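To make the "family of distances" idea concrete, here is a minimal sketch in plain Python. The function name `minkowski_distance` and the sample vectors are illustrative assumptions, not part of the original lesson; the sketch only shows that setting the order to 1 or 2 reproduces the Manhattan and Euclidean distances.

```python
def minkowski_distance(x, y, p):
    """Minkowski distance of order p between two equal-length sequences.

    Hypothetical helper for illustration; p is assumed to be >= 1.
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same number of dimensions")
    if p < 1:
        raise ValueError("p must be >= 1")
    # Sum the p-th powers of the coordinate-wise absolute differences,
    # then take the p-th root of the total.
    total = sum(abs(a - b) ** p for a, b in zip(x, y))
    return total ** (1.0 / p)


# Two sample points in 3-dimensional space (made-up values).
x = [1.0, 2.0, 3.0]
y = [4.0, 6.0, 3.0]

manhattan = minkowski_distance(x, y, p=1)  # |3| + |4| + |0|
euclidean = minkowski_distance(x, y, p=2)  # sqrt(3^2 + 4^2 + 0^2)

print(manhattan, euclidean)
```

With these sample points the same function yields 7.0 for order 1 and 5.0 for order 2, so only the order parameter distinguishes the two metrics.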
The $L_p$-norm
The $L_p$-norm ($\|x\|_p$) is a way to calculate the magnitude (or length) of a vector in $n$-dimensional space ($\mathbb{R}^n$). Think of it as a flexible ruler where the value of $p$ changes how the distance is measured.
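For reference, the standard textbook definition of this norm, and of the Minkowski distance it induces, can be written as below. The symbol names $x$, $y$, $n$, and $p$ are assumptions chosen for this sketch.

```latex
\|x\|_p = \left( \sum_{i=1}^{n} |x_i|^p \right)^{1/p},
\qquad
d_p(x, y) = \|x - y\|_p = \left( \sum_{i=1}^{n} |x_i - y_i|^p \right)^{1/p},
\qquad p \geq 1
```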
Here, means that the parameter ...