K-means algorithm

For a given dataset and value of $k$, $k$-means clustering has the following steps:

  1. Choose some value of $k$ such that $k \ge 2$, if it’s not given already.

  2. Randomly choose $k$ centroids.

  3. Compute the similarity score between each data point and each centroid.

  4. Based on the similarity scores, assign each data point to the centroid it is most similar to.

  5. From these new groupings, find new centroids by taking the mean of all data points in each cluster.

  6. Repeat steps 3 to 5 until the difference between old and new centroids is negligible.
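
The steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not a reference implementation: it assumes Euclidean distance as the similarity score (a smaller distance means more similar), draws the initial centroids from the data points themselves, and treats "negligible" as every centroid moving by no more than a tolerance. The function name `k_means`, the parameters `tol`, `max_iter`, and `seed`, and the sample points are all hypothetical and chosen only for this example.

```python
import math
import random


def k_means(points, k, tol=1e-6, max_iter=100, seed=0):
    """Cluster `points` (a list of equal-length tuples) into k groups."""
    # Step 1: k must be at least 2 (and no larger than the dataset).
    if k < 2 or k > len(points):
        raise ValueError("k must satisfy 2 <= k <= len(points)")

    # Step 2: randomly pick k data points as the initial centroids (assumption).
    rng = random.Random(seed)
    centroids = rng.sample(points, k)

    for _ in range(max_iter):
        # Steps 3-4: score each point against each centroid using Euclidean
        # distance (assumption) and assign it to the nearest centroid.
        labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]

        # Step 5: recompute each centroid as the mean of its cluster.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(
                    tuple(sum(coord) / len(members) for coord in zip(*members))
                )
            else:
                # An empty cluster keeps its previous centroid.
                new_centroids.append(centroids[j])

        # Step 6: stop once no centroid has moved by more than `tol`.
        shift = max(math.dist(a, b) for a, b in zip(centroids, new_centroids))
        centroids = new_centroids
        if shift <= tol:
            break

    return centroids, labels


# Made-up 2-D points that form two visibly separate groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
centroids, labels = k_means(points, k=2)
print(centroids)
print(labels)
```

On this toy data, the first three points should end up sharing one label and the last three the other. Which group receives label 0 depends on the random initialization.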

If the steps above seem unclear, don’t worry. We’re going to show each step in an example with an illustration.

Dry running the example

Let’s say we have the following dataset:
