# K-Means Clustering

Learn about the k-means algorithm, its initialization, NP-hardness, and variance computation with examples.

We'll cover the following

Traditionally, in machine learning, we start with the popular partitional clustering algorithm called $k$-means clustering. This algorithm divides the data into $k$ clusters based on a similarity score. The objective is to minimize the total variance of the $k$ clusters. The number of clusters, $k$, must be specified.

Note: The choice of similarity score is a hyperparameter.

## Objective

Given a set $D=\{\bold x_1, \bold x_2, \dots, \bold x_n \}$ of $n$ data points in $\R^d$, the goal is to partition $D$ into the given $k \ge 2$ partitions, say, $C_1, C_2, \dots, C_k$ such that $\sum_{j=1}^K var(C_j)$ is minimum. The $var(C_j)$ using Euclidean distance can be defined as follows:

$var(C_j) = \sum_{\bold x \in C_j}\|\bold x-\bold \mu_j\|_2^2$

Here, $\bold \mu_j$ is the centroidThe centroid of a set of points is the mean of the data points in the set. of the the partition $C_j$.

Note: The variance of a partition $C_j$ is defined in terms of Euclidean distance here; however, other distance/similarity measures can also be used.

### Variance computation code example

Let’s compute the variance of a set of points using numpy:

Get hands-on with 1200+ tech skills courses.