In data mining,
We use randomization to find initial centroids when using Lloyd's algorithm.
We can use either of the following two approaches to address the sensitivity to initialization:
Repeat
However,
The exact algorithm is as follows:
Choose the first centroid at random from the data points.
Compute the distance between each data point and the nearest, previously chosen centroid.
Choose the next centroid from the data points so that the probability of selecting a point is proportional to its squared distance from the nearest previously chosen centroid.
Repeat the second and the third step until all k centroids have been chosen.
Once the initial centers are determined, proceed with ordinary k-means clustering.
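The steps above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function names `squared_distance` and `kmeans_pp_init`, the `seed` parameter, and the choice to represent points as tuples are all assumptions made for this example. It weights each candidate by its squared distance to the nearest centroid chosen so far.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans_pp_init(points, k, seed=None):
    """Pick k initial centroids from `points` (hypothetical helper)."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    rng = random.Random(seed)

    # Step 1: choose the first centroid uniformly at random.
    centroids = [rng.choice(points)]

    # Step 2: squared distance from each point to its nearest chosen centroid.
    d2 = [squared_distance(p, centroids[0]) for p in points]

    # Step 4: repeat until k centroids have been chosen.
    while len(centroids) < k:
        if sum(d2) == 0:
            # Every point coincides with a chosen centroid, so the weights
            # carry no information; fall back to a uniform choice.
            nxt = rng.choice(points)
        else:
            # Step 3: sample a point with probability proportional to d2.
            nxt = rng.choices(points, weights=d2, k=1)[0]
        centroids.append(nxt)

        # Keep, for each point, the distance to its nearest centroid so far.
        d2 = [min(old, squared_distance(p, nxt)) for p, old in zip(points, d2)]

    return centroids
```

A call such as `kmeans_pp_init([(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)], k=2, seed=0)` returns two of the input points. An already chosen point has weight zero, so it cannot be picked again unless the data contains duplicates.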
Let's consider the following example. Suppose we want to make two clusters, and we have the following points:
The initial step is to choose a data point at random to serve as the cluster centroid:
Assume the red point is chosen as the initial centroid. Now compute the distance between each data point and this centroid:
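The next centroid is sampled with probability proportional to each point's squared distance from the present centroid, so the point with the greatest squared distance is the most likely choice: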
In this case, the blue point, which is farthest from the red centroid, is chosen as the next centroid. After initializing the centroids, we can proceed with the standard k-means algorithm.
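To put numbers on this example, the short sketch below computes the selection weights for a handful of points. The coordinates are made up for illustration, since the original figure's values are not given. The first point plays the role of the red centroid and the last point plays the role of the distant blue point.

```python
# Hypothetical 2-D points; points[0] stands in for the red (first) centroid.
points = [(1.0, 1.0), (2.0, 1.0), (1.0, 2.0), (6.0, 5.0)]
first = points[0]

# Squared Euclidean distance from every point to the first centroid.
sq_dists = [(x - first[0]) ** 2 + (y - first[1]) ** 2 for x, y in points]

# Normalize the squared distances into selection probabilities.
total = sum(sq_dists)
probs = [d / total for d in sq_dists]

for p, d, pr in zip(points, sq_dists, probs):
    print(p, d, round(pr, 3))
```

With these made-up coordinates, the squared distances are 0, 1, 1, and 41, so the distant point is selected with probability 41/43 (about 0.95). The two nearby points each have a small but nonzero chance, and the centroid itself has none.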