Customer Segmentation

Learn how to segment customer bases using k-means clustering.

There are a number of unsupervised clustering algorithms, but k-means is one of the simplest to understand and implement. It segments an unlabeled dataset into a predetermined number of groups. The input parameter k is the number of clusters, or groups, we would like to form. The choice of k matters: if k is too small, distinct groups are merged and a centroid may fall between the natural clusters rather than inside one of them. If k is too large, some natural clusters may be split into several smaller ones.
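
To make the effect of k more concrete, it can help to write down the quantity k-means tries to minimize. The formula below is a sketch that assumes squared Euclidean distance; the symbols C_j (the set of observations in cluster j), \mu_j (the centroid of cluster j), and J (the total within-cluster sum of squares) are notation introduced here for illustration, not taken from the lesson.

```latex
J = \sum_{j=1}^{k} \sum_{x_i \in C_j} \lVert x_i - \mu_j \rVert^2,
\qquad
\mu_j = \frac{1}{|C_j|} \sum_{x_i \in C_j} x_i
```

Under this assumption, the best achievable J never increases as k grows, so a low J alone does not indicate a good k. It has to be weighed against how many clusters are actually meaningful for the customer base.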

Implementing k-means clustering

The k-means algorithm follows these steps:

  1. Choose the number of clusters (k).

  2. Randomly initialize a centroid for each cluster.

  3. Assign each observation to the cluster whose centroid is closest, according to the chosen similarity or distance measure.

  4. Compute a new centroid for each cluster as the mean of the observations assigned to it.

  5. Repeat steps 3 and 4 as long as the centroids keep changing.
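
The steps above can be sketched in a few lines of NumPy. This is a minimal illustration rather than a production implementation: it assumes Euclidean distance, initializes centroids by picking k random observations, and adds a max_iter safety cap. The function name kmeans, its parameters, and the small customers array (annual spend, visits per month) are all hypothetical and made up for this example.

```python
import numpy as np


def kmeans(X, k, max_iter=100, seed=0):
    """Minimal k-means sketch (assumes Euclidean distance)."""
    rng = np.random.default_rng(seed)
    # Step 2: pick k distinct observations as the initial centroids.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(max_iter):
        # Step 3: assign each observation to its nearest centroid.
        distances = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = distances.argmin(axis=1)
        # Step 4: recompute each centroid as the mean of its observations;
        # keep the previous centroid if a cluster ends up empty.
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        # Step 5: stop once the centroids no longer change.
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids


# Hypothetical customer data: [annual spend, visits per month].
customers = np.array([
    [20.0, 2.0], [22.0, 3.0], [19.0, 1.0],
    [80.0, 9.0], [85.0, 10.0], [78.0, 8.0],
])

# Step 1: choose the number of clusters.
labels, centroids = kmeans(customers, k=2)
print(labels)
print(centroids)
```

With data this well separated, the low-spend and high-spend customers end up in different clusters. Which group receives label 0 and which receives label 1 depends on the random initialization.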
