# Introduction to Fuzzy C-Means

Learn fuzzy c-means clustering algorithms.

In k-means, the boundaries between clusters are hard or crisp (that is, a data point either belongs to a cluster or not). Fuzzy c-means, also known as fuzzy k-means, extend k-means by fuzzifying the boundaries, that is, allowing data points to simultaneously belong to many clusters with varying degrees of membership. As such, a data point, for example, may have 40% membership with Cluster 1 and 60% membership with Cluster 2. Such clusters are called soft or fuzzy, as there are no concrete boundary lines between them.

## Assumptions and target cluster type

The key extension of fuzzy c-means over k-means is that it utilizes fuzzy clusters instead of crisp ones. Its assumptions and implications are the same as those of k-means (that is, it still looks for spherical-shaped clusters).

## Method

We will adopt the description of this algorithm from Yen and Langari (1999) on Fuzzy logic. Similar to k-means, the inputs of FCM are:

- The input dataset, $D$
- The distance function $d(x, y)$ with
`x`

and`y`

being data points in D (optional). - The number of clusters, $K$
- A convergence threshold $ε$.
- The fuzziness parameter $m > 1$, which controls how fuzzy the clusters will be. The higher $m$ is, the fuzzier the clusters (more details follow).

Usually, $d$ is set to be the Euclidean distance function.

FCM follows the following steps mentioned below.

## Step 1: Initialization

Select K initial centroids, represented as mean vectors $c_1 , c_2 , . . . , c_K$ . These centroids can be selected randomly or with prior knowledge.

## Step 2: Cluster membership computation

For all clusters $c_j$ $( j = 1 \dots K)$, the memberships $μ_{c_j} (x)$ of each data point $x ∈ D$ are computed as follows:

$μ_{c_j}(x)=\frac {1} {\sum^K_{i=1} \bigg(\frac {d(x,c_j)} {d(x,c_i)}\bigg)^\frac{1}{m-1} }$

where

- $j$ is between $1$ and $K$
- $d(x, c_i)$ and $d(x, c_j)$ denote the distance between $x$ and clusters $c_i, c_j$

Note here that $m$ is the exponent fuzzy membership and is a parameter in the objective function, and $m > 1$ denotes the fuzziness of the cluster. As $m$ approaches $1$, the membership values degrade to either $0$ or $1$ (i.e., the membership to the closest cluster will approach $1$ and others will approach $0$).

In the above equation $μ (x)$ is proportional to $\frac{1}{d(x,c_j)^{\frac {1}{1-m}}}$ ,with

${\sum^K_{i=1} {d(x,c_i)}^\frac{1}{m-1} }$

acting as a normalization term.

When $m$ goes to infinity, the exponent approaches $0$. Thus, the term $d(x,c_j)^ {\frac {1}{1-m}}$ approaches $1$. This means it doesn’t matter which cluster is closer to $x$—all memberships will be almost the same. This is what extreme fuzziness looks like: every point has similar membership to every cluster, regardless of their proximity. On the other hand, as $m$ goes to $1$, the term grows exponentially with the inverse of the distance toward infinity. Through normalization, the distance to the closest cluster will greatly outweigh the rest, thus making for very crisp boundaries.

## Step 3: Centroid update

Cluster centroids are recomputed as weighted means of their members, taking into account the membership $w$:

$c_j=\frac{\sum_{x∈D}(μ_{C_j} (x))^mx}{{\sum_{x∈D}(μ_{C_j} (x))^m}}$

where

- $x$ are points in dataset $D$,
- $c_j$ is the centroid of cluster
`j`

, and - $μ_{c_j} (x) is the membership degree of $x$ in cluster $j$.

## Step 4: Cluster membership re-computation

The cluster membership values of each data point in D are recomputed after the centroid update using the cluster membership equation.

## Step 5: Stability checking and termination

$T=\sum_{i=1}^C |c_j^{Previous} -c_i| ≤ ε$

This equation checks whether the difference between the centroid of a cluster over all clusters doesn’t change more than a specific threshold $ε$ set as an input. If it’s below that threshold, then it’s deemed converged and the algorithm terminates. Otherwise, go back to Step 3.

## Hyperparameter tuning

The key parameter to tune for fuzzy c-means is the number of clusters $K$. While the fuzziness parameter m could be a target for tuning, it usually suffices to fix $m = 2$, unless there’s some domain knowledge to set it otherwise. The distance metric $d$ is often set to Euclidean distance.

## A practical example of applying FCM

In the lab Fuzzy c-means of the chapter, we show how to use FCM to cluster the player data in the `DOTAlicous`

dataset. The usage of FCM here is almost identical to k-means, except with a different function: `fanny`

instead of `k-means`

. In first, we also clean and normalize the data. Then, we will call the fanny method. Please follow the second lab to understand the practical aspect of applying the appropriate methods.

After applying fuzzy c-means, we can inspect the memberships by printing the membership degrees for each centroid. The following table shows the result, where each row corresponds to one data point labeled by its unique row name (B, B.1, N, B.2, etc.), and each column indicates the membership to the corresponding centroids (that is, centroid 1, 2, 3, or 4). The membership values on each row should sum up to 1.

Get hands-on with 1200+ tech skills courses.