Building the Model

Learn how to create a machine learning model for a clustering task. This lesson also covers plotting the elbow and PCA plots for the model and saving the trained model.

Creating a model

The create_model() function lets us easily create and evaluate a clustering model of our choice, such as k-means. By default, this function creates 4 clusters. We could set the num_clusters parameter to 3, because that is the correct number of clusters for this dataset. Instead, however, we'll follow an approach that generalizes to real-world datasets, where the number of clusters is typically unknown. After executing the function, we get several performance metrics, such as the silhouette, Calinski-Harabasz, and Davies-Bouldin scores. We'll focus on the silhouette coefficient, defined in the following equation.

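$$s(i) = \frac{b(i) - a(i)}{\max\{a(i), b(i)\}}$$

$$-1 \leq s(i) \leq 1$$

  • $s(i)$ is the silhouette coefficient of dataset instance $i$.
  • $a(i)$ is the mean intra-cluster distance of $i$, that is, its average distance to the other instances in its own cluster.
  • $b(i)$ is the mean nearest-cluster distance of $i$, that is, its average distance to the instances of the nearest cluster that $i$ does not belong to.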
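To make the definition concrete, here is a minimal from-scratch sketch of the silhouette coefficient in plain Python. It assumes Euclidean distance and, following the convention scikit-learn uses, gives an instance that is alone in its cluster s(i) = 0. The helpers silhouette_per_point and mean_silhouette are hypothetical names for illustration, not part of PyCaret. The overall silhouette score reported for a model is typically the mean of s(i) over all instances.

```python
import math
from collections import defaultdict


def silhouette_per_point(points, labels):
    """Return s(i) = (b(i) - a(i)) / max{a(i), b(i)} for every point (Euclidean distance)."""
    clusters = defaultdict(list)  # cluster label -> indices of its members
    for idx, lab in enumerate(labels):
        clusters[lab].append(idx)
    if len(clusters) < 2:
        raise ValueError("the silhouette needs at least two clusters")

    scores = []
    for i, (p, lab) in enumerate(zip(points, labels)):
        same = [j for j in clusters[lab] if j != i]
        if not same:
            scores.append(0.0)  # singleton cluster: s(i) is defined as 0
            continue
        # a(i): mean distance to the other members of i's own cluster
        a = sum(math.dist(p, points[j]) for j in same) / len(same)
        # b(i): smallest mean distance to the members of any other cluster
        b = min(
            sum(math.dist(p, points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return scores


def mean_silhouette(points, labels):
    """Average s(i) over all instances."""
    scores = silhouette_per_point(points, labels)
    return sum(scores) / len(scores)


points = [(0.0,), (1.0,), (4.0,), (5.0,)]
print(round(mean_silhouette(points, [0, 0, 1, 1]), 3))  # 0.746
print(round(mean_silhouette(points, [0, 1, 0, 1]), 3))  # -0.375
```

With the sensible labeling, each instance is much closer to its own cluster than to the other one, so the score approaches 1. Swapping the labels so that far-apart points share a cluster pushes every s(i) below zero, which is how a negative silhouette signals instances that likely belong elsewhere.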
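The create_model() discussion notes that the number of clusters is usually unknown in real-world data. One common way to handle that, sketched here as an assumption rather than as this lesson's exact method, is the elbow idea: fit k-means for several values of k, record the within-cluster sum of squared distances (inertia), and look for the k after which adding clusters stops paying off. The sketch uses plain Python with a hand-written Lloyd's k-means, deterministic farthest-point seeding, and a made-up 12-point dataset with three obvious groups; kmeans, pick_elbow, and its 50% drop threshold are all hypothetical. In PyCaret itself, the starting point would be a call such as create_model('kmeans'), optionally with num_clusters.

```python
# Hypothetical toy data: three tight groups of four points around (0, 0), (10, 0), (0, 10).
POINTS = [
    (-0.5, -0.5), (0.5, -0.5), (-0.5, 0.5), (0.5, 0.5),
    (9.5, -0.5), (10.5, -0.5), (9.5, 0.5), (10.5, 0.5),
    (-0.5, 9.5), (0.5, 9.5), (-0.5, 10.5), (0.5, 10.5),
]


def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def farthest_point_init(points, k):
    """Deterministic seeding: start at the first point, then repeatedly add
    the point whose nearest existing centroid is farthest away."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centroids)))
    return centroids


def assign(points, centroids):
    """Label each point with the index of its nearest centroid."""
    return [min(range(len(centroids)), key=lambda j: sq_dist(p, centroids[j])) for p in points]


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm; returns (labels, centroids, inertia)."""
    centroids = farthest_point_init(points, k)
    for _ in range(max_iter):
        labels = assign(points, centroids)
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            # Move the centroid to the mean of its members; keep it if the cluster is empty.
            new_centroids.append(
                tuple(sum(c) / len(members) for c in zip(*members)) if members else centroids[j]
            )
        if new_centroids == centroids:
            break  # converged: assignments no longer change the centroids
        centroids = new_centroids
    labels = assign(points, centroids)
    inertia = sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))
    return labels, centroids, inertia


def pick_elbow(inertias, min_drop=0.5):
    """Hypothetical heuristic: return the first k whose next step cuts
    inertia by less than min_drop (as a fraction of the current inertia)."""
    ks = sorted(inertias)
    for k, k_next in zip(ks, ks[1:]):
        if inertias[k] == 0:
            return k
        if (inertias[k] - inertias[k_next]) / inertias[k] < min_drop:
            return k
    return ks[-1]


inertias = {k: kmeans(POINTS, k)[2] for k in range(1, 5)}
for k, value in inertias.items():
    print(f"k={k}: inertia={value:.2f}")  # k=1: 539.33, k=2: 206.00, k=3: 6.00, k=4: 5.33
print("elbow at k =", pick_elbow(inertias))
```

On this toy data the inertia drops steeply up to k = 3 and only slightly at k = 4, so pick_elbow returns 3, while a fixed default of 4 clusters would split one natural group. With real data the bend is often much less sharp, so a fixed threshold like this one is a starting point for inspection rather than a rule.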
