K-Means Clustering

Explore interview prep questions centered around k-means clustering for unsupervised learning.

K-means clustering is a fundamental tool in unsupervised learning for grouping similar data points without prior labels. In this lesson, we’ll practice implementing the algorithm step by step, understand how to form meaningful clusters, and evaluate clustering performance with silhouette scores. Let’s get started.
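The silhouette score gives each point a value by comparing a, its mean distance to the other members of its own cluster, with b, its mean distance to the members of the nearest other cluster: s = (b - a) / max(a, b). The overall score is the mean of s over all points. Below is a minimal pure-Python sketch of that definition. It assumes clusters are lists of coordinate tuples, which is the shape the exercise below returns, and uses Euclidean distance. The function name `silhouette` is hypothetical, and points in a singleton cluster score 0 by the usual convention.

```python
import math


def silhouette(clusters):
    """Hypothetical helper: mean silhouette score for clusters of points.

    Each cluster is a list of coordinate tuples. Assumes at least two
    non-empty clusters. Points in a singleton cluster score 0, following
    the usual convention.
    """
    # Empty clusters hold no points to score, so drop them.
    clusters = [c for c in clusters if c]
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two non-empty clusters")

    def mean_dist(p, points):
        # Average Euclidean distance from p to every point in `points`.
        return sum(math.dist(p, q) for q in points) / len(points)

    scores = []
    for i, cluster in enumerate(clusters):
        for j, p in enumerate(cluster):
            if len(cluster) == 1:
                scores.append(0.0)
                continue
            # a: mean distance to the other members of p's own cluster.
            a = mean_dist(p, cluster[:j] + cluster[j + 1:])
            # b: mean distance to the members of the nearest other cluster.
            b = min(mean_dist(p, other)
                    for idx, other in enumerate(clusters) if idx != i)
            denom = max(a, b)
            scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)
```

Scores range from -1 to 1. Values near 1 mean points sit well inside their own cluster, values near 0 mean points lie between clusters, and negative values suggest points were assigned to the wrong cluster. The `silhouette_score` function imported in the exercise template computes the same mean from a feature array and a label array.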

Implementing k-means clustering

You are given a dataset containing various data points representing customer transactions. Your task is to group these transactions into different clusters based on their similarities using the k-means clustering algorithm. The dataset is represented as a list of tuples, where each tuple contains transaction details like the amount spent and the number of items purchased.

Implement a function k_means_clustering(data, k) that clusters the given dataset into k clusters using the k-means algorithm. The function should return a list of clusters, where each cluster is a list of transaction points.

This question is frequently asked in ML engineer interviews involving recommendation systems or behavioral analysis.

```python
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
import numpy as np
import random


def initialize_data():
    """
    Initialize data for k-means clustering.

    Returns:
    - data: A list of 2D points (tuples of x, y coordinates)
    - k: Number of clusters to create
    """
    # Set a random seed for reproducibility
    random.seed(42)
    # Generate 100 random 2D points with coordinates between 0 and 10
    data = [(random.uniform(0, 10), random.uniform(0, 10)) for _ in range(100)]
    # Choose the number of clusters (k)
    k = 4
    return data, k


def k_means_clustering(data, k):
    # TODO: your implementation here
    # Placeholder so the template runs; replace with k lists of points
    clusters = [[] for _ in range(k)]
    # Return clusters
    return clusters


# Initializing the inputs
data, k = initialize_data()
# Running the clustering
output = k_means_clustering(data, k)
print(f"Output: {output}")
```

Sample answer

Here’s a plan that we can follow to implement our solution: ...
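As one possible approach (not necessarily the plan above), here is a minimal from-scratch sketch of Lloyd's algorithm, the standard k-means iteration. It alternates an assignment step (each point joins its nearest centroid) with an update step (each centroid moves to the mean of its points) until the centroids stop moving. It assumes points are equal-length tuples of numbers and uses Euclidean distance. The `max_iters` and `seed` parameters are hypothetical additions to the exercise's `k_means_clustering(data, k)` signature.

```python
import math
import random


def k_means_clustering(data, k, max_iters=100, seed=0):
    """Cluster points with Lloyd's algorithm (a sketch).

    Returns a list of k clusters, each a list of the original tuples.
    `max_iters` and `seed` are hypothetical, illustrative parameters.
    """
    if not 1 <= k <= len(data):
        raise ValueError("k must be between 1 and len(data)")
    rng = random.Random(seed)
    # Initialization: pick k data points at random as starting centroids.
    centroids = rng.sample(data, k)

    for _ in range(max_iters):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in data:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)

        # Update step: move each centroid to the mean of its assigned points.
        new_centroids = []
        for i, cluster in enumerate(clusters):
            if cluster:
                new_centroids.append(
                    tuple(sum(coord) / len(cluster) for coord in zip(*cluster)))
            else:
                # Keep the old centroid if a cluster ends up empty.
                new_centroids.append(centroids[i])

        # Convergence: stop once no centroid moves.
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return clusters
```

Because the starting centroids are random, different seeds can converge to different local optima. scikit-learn's `KMeans`, imported in the template, uses k-means++ initialization by default and can repeat the fit `n_init` times, keeping the run with the lowest inertia.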