K-Means Clustering
Learn about the k-means algorithm, its initialization, NP-hardness, and variance computation with examples.
Traditionally, in machine learning, we start with the popular partitional clustering algorithm called -means clustering. This algorithm divides the data into clusters based on a similarity score. The objective is to minimize the total variance of the clusters. The number of clusters, , must be specified.
Note: The choice of similarity metric is a hyperparameter.
Objective
K-means clustering aims to partition a dataset into clusters such that the total variance of the clusters is minimized. This means we want the data points within each cluster to be as close to each other as possible.
Given a set of data points in (a -dimensional Euclidean space), the goal is to partition into