K-Means Clustering
Explore the K-means clustering algorithm by understanding how it groups data into clusters using similarity and distance measures like cosine similarity and Euclidean distance. Learn the iterative steps involved, challenges such as selecting the number of clusters and sensitivity to initial centroid placement, and alternatives like K-means++. This lesson equips you to apply clustering techniques effectively in your data science projects.
Clustering
Clustering is a well-known unsupervised learning technique. It involves grouping items into clusters such that items in the same cluster are more similar to each other than to items in other clusters. In this lesson, we will look at K-means clustering.
K-means clustering
K-means clustering, as the name suggests, looks for a fixed number of clusters ($k$) in the dataset. The mean or center of each cluster is represented by $\mu$, which is also called the cluster centroid or average point. K-means relies on the idea of similarity/dissimilarity when assigning instances to their respective clusters.
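To make the iterative procedure concrete, here is a minimal sketch of K-means in NumPy. The function name, the random initialization, and the toy data are illustrative, not a production implementation:

```python
import numpy as np

def kmeans(X, k, n_iters=100, seed=0):
    """Minimal K-means: alternate between assigning points to the
    nearest centroid and recomputing centroids as cluster means."""
    rng = np.random.default_rng(seed)
    # Initialize centroids by picking k distinct data points at random.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iters):
        # Assignment step: label each point with its nearest centroid
        # (squared Euclidean distance).
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Update step: move each centroid to the mean of its assigned points;
        # keep the old centroid if a cluster ends up empty.
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if (labels == j).any() else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break  # converged: centroids stopped moving
        centroids = new_centroids
    return centroids, labels

# Toy usage: two well-separated blobs of points.
X = np.vstack([np.random.randn(50, 2), np.random.randn(50, 2) + 5])
centroids, labels = kmeans(X, k=2)
print(centroids)
```

Note that the result depends on the random initial centroids, which is exactly the sensitivity that motivates alternatives like K-means++ mentioned above.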
Similarity can also be thought of as proximity: a numerical measure of how alike two data instances are. Cosine similarity is one of the most commonly used similarity measures. For non-negative feature vectors, it takes values between 0 and 1, where a higher value indicates more similar instances. The cosine similarity between two feature vectors $\mathbf{x}$ and $\mathbf{y}$ is defined as:

$$\text{cosine}(\mathbf{x}, \mathbf{y}) = \frac{\mathbf{x} \cdot \mathbf{y}}{\lVert \mathbf{x} \rVert \, \lVert \mathbf{y} \rVert}$$
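As a quick illustration, here is the formula in NumPy; the example vectors are made up for demonstration:

```python
import numpy as np

def cosine_similarity(x, y):
    """Cosine of the angle between two feature vectors:
    dot product divided by the product of their norms."""
    return np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y))

# Example: non-negative feature vectors (values are illustrative).
x = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, 4.0, 6.0])   # same direction as x
z = np.array([3.0, 0.0, 0.5])

print(cosine_similarity(x, y))  # 1.0 -> identical orientation
print(cosine_similarity(x, z))  # ~0.40 -> much less similar
```

Because cosine similarity depends only on the angle between vectors, `x` and `y` score a perfect 1.0 even though their magnitudes differ.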