Challenges in Applying the Clustering Process

Learn the challenges in the clustering process.

We'll cover the following

Cluster definition
Similarity definition
Cluster tendency
Outliers

When applying clustering, users usually face a few key challenges.

Cluster definition

The first challenge is defining what counts as a suitable cluster in a given dataset. This is difficult because clustering is typically used while we’re still exploring the data, so we may not yet know what a cluster looks like or how the points should be grouped. When we discuss the clustering algorithms below, we’ll see that each algorithm makes assumptions about the kinds of clusters it looks for. For example, k-means is best suited to finding convex, blob-like clusters. Accepting such assumptions by choosing a specific algorithm means that we may not find all the clusters present in the data or, worse, that the clusters we do find may be wrong.
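
To make the k-means assumption concrete, here is a minimal sketch rather than a reference implementation. It assumes a synthetic dataset of two concentric rings and a plain Lloyd-style k-means written from scratch. The helper names `ring` and `kmeans`, the radii, and the starting centroids are all illustrative choices, not part of any library.

```python
import math

def ring(radius, n=100):
    """Return n evenly spaced points on a circle centred at the origin."""
    return [(radius * math.cos(2 * math.pi * i / n),
             radius * math.sin(2 * math.pi * i / n)) for i in range(n)]

def kmeans(points, centroids, iterations=50):
    """Plain Lloyd's algorithm: assign each point to its nearest centroid,
    then move each centroid to the mean of its members, until stable."""
    centroids = list(centroids)  # copy so the caller's list is untouched
    labels = []
    for _ in range(iterations):
        new_labels = [min(range(len(centroids)),
                          key=lambda c: math.dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for c in range(len(centroids)):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centroids[c] = (sum(x for x, _ in members) / len(members),
                                sum(y for _, y in members) / len(members))
    return labels

# Two nested, non-convex clusters: an inner ring and an outer ring.
points = ring(1.0) + ring(5.0)
truth = [0] * 100 + [1] * 100

# Assumed starting centroids: two opposite points on the outer ring.
labels = kmeans(points, centroids=[(5.0, 0.0), (-5.0, 0.0)])

# Fraction of points whose k-means label matches the true ring,
# taking the better of the two possible label-to-ring matchings.
matches = sum(l == t for l, t in zip(labels, truth))
agreement = max(matches, len(points) - matches) / len(points)
print(f"agreement with the true rings: {agreement:.2f}")
```

With two centroids, k-means separates the points with a straight line, so in this sketch each resulting cluster holds a mix of both rings and the agreement stays near chance level.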

Clusters in real data can take vastly different forms, as illustrated in the figure below. Besides being blob-like, round, or elliptical, clusters may also be elongated shapes that encompass one another. No single clustering algorithm can detect all of these kinds of clusters, so knowing beforehand which types of clusters we are looking for is critical to the success of our analysis. When we don’t know, we can apply several different clustering algorithms to the same dataset to reduce the risk of missing a particular type of cluster.
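
As a rough illustration of why trying more than one method helps, the sketch below groups points by connectivity: two points share a cluster if a chain of neighbours, each within a distance `eps` of the next, links them. This is a simplified stand-in for connectivity-based methods such as single-linkage clustering. The function name `link_clusters`, the value of `eps`, and the toy datasets are assumptions made for the example.

```python
import math

def link_clusters(points, eps):
    """Label connected components of the graph that joins any two
    points lying within eps of each other."""
    labels = [-1] * len(points)
    current = 0
    for start in range(len(points)):
        if labels[start] != -1:
            continue  # already reached from an earlier point
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j, q in enumerate(points):
                if labels[j] == -1 and math.dist(points[i], q) <= eps:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels

def ring(radius, n=100):
    """Return n evenly spaced points on a circle centred at the origin."""
    return [(radius * math.cos(2 * math.pi * i / n),
             radius * math.sin(2 * math.pi * i / n)) for i in range(n)]

def grid(x0, size=5, step=0.3):
    """Return a small square blob of size*size points starting at (x0, 0)."""
    return [(x0 + i * step, j * step)
            for i in range(size) for j in range(size)]

# Nested rings: one cluster encloses the other.
rings = ring(1.0) + ring(5.0)
ring_labels = link_clusters(rings, eps=0.5)

# Two compact blobs with empty space between them.
blobs = grid(0.0) + grid(5.0)
separate_labels = link_clusters(blobs, eps=0.5)

# The same blobs joined by a thin line of points.
bridge = [(1.2 + 0.3 * k, 0.6) for k in range(1, 13)]
bridged_labels = link_clusters(blobs + bridge, eps=0.5)

# Number of clusters found in each dataset.
print(len(set(ring_labels)),
      len(set(separate_labels)),
      len(set(bridged_labels)))  # 2 2 1
```

In this sketch, the connectivity-based grouping recovers the two rings, yet it merges the two blobs as soon as a thin chain of points connects them. Each family of algorithms has its own blind spots, which is why comparing several of them on the same data is useful.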
