Trusted answers to developer questions
Trusted Answers to Developer Questions

Related Tags

machine learning

# What are Gaussian mixture models?

Esha Fatima

Grokking Modern System Design Interview for Engineers & Managers

Ace your System Design Interview and take your career to the next level. Learn to handle the design of applications like Netflix, Quora, Facebook, Uber, and many more in a 45-min interview. Learn the RESHADED framework for architecting web-scale applications by determining requirements, constraints, and assumptions before diving into a step-by-step design process.

### Intuition

Gaussian distribution can be used to model the probabilistic framework of a random, continuous variable. In the case of a classical machine learning problem, our ultimate goal is to train a model so that it can predict the most likely outcome, given the closest probability distribution function approximated from our training dataset.

Similarly, Gaussian mixture models emerge from the underlying inductive bias that the dataset can be produced entirely from a combination of finite Gaussian distributions.

An example of Gaussian distributions classifying training data instances

### Differences between Gaussian mixture model and K-means clustering

K-means clustering is a another unsupervised technique that attempts to hard-classify all the data points in the training set. In other words, each data point will be classified as belonging to one cluster or another. No data point will be a member of more than one cluster in the data set.

Hard clustering using K-means clustering

On the other hand, Gaussian mixture models do not rigidly classify each and every instance into one class or the other. The algorithm attempts to produce K-Gaussian distributions that would take into account the entire training space. Every point can be associated with one or more distributions. Consequently, the deterministic factor would be the probability that each point belongs to a certain Gaussian distribution. The distribution with the greatest likelihood would win.

Furthermore, Gaussian distribution automatically takes into account any missing data points, while K-means clustering depends on a dataset containing a proportionally rich and varied mix of data points.

### Applications of Gaussian mixture models

Following are a few major real-life applications of Gaussian mixture models:

• Gaussian mixture models are useful in marketing and customer targeting. They're used to segment different types of consumers to tailor marketing strategies to target each segment accordingly.
• Gaussian mixture models are also useful in financial stock markets. Gaussian marketing models effectively forecast trends based on existing patterns and help investors avoid the high amount of noise present in stock market trends.

### Derivation

The Gaussian probability density function can be modeled as follows, where $X$is the training dataset,$D$is the number of dimensions of each data point, $\mu$is the mean, and $\sigma^2$is the co-variance.

The probability,$z_i$, that a given point$x$ belongs to the $i^{th}$Gaussian distribution can be expressed as follows:

The mixing co-efficient of the $i^{th}$Gaussian distribution, $\pi_i$, describes the probability that any certain point in a mixture of multiple Gaussian distributions belongs to the $i^{th}$Gaussian distribution.

Given that our algorithm is a mixture of $k$ different Gaussian mixture models and that each Gaussian mixture model is independent of the other, the total probability distribution can be given as follows:

Given that we now have the probability distribution$z$, the probability that some data point $x_i$ stems from Gaussian $k$ can be given as follows:

Hence, to find out what $P(x_i,z)$is, we need to multiply $P(x_i|z)$ with $P(z)$as per Baye's rule.

If we add up$P(x_i,z)$over all the Gaussian distributions that we assume are present in the training dataset, it will lead to marginalization and we'll get the probability of $P(x_i)$, as follows:

Hence, it follows that for all $n$observations, the probability would be$P(X)$, provided that $X = \{{x_1,x_2,x_3...x_n}\}$

Taking a natural logarithm on both sides of the equation would preserve the concavity of the objective function, so we can take the derivative and find the cluster with the greatest likelihood.

Applying Baye's rule and given that $P(z_k= 1) = \pi_k$and that$P(x_i|z_k) = \mathcal{N}(x_i|\mu_k,\sigma^2_k)$, the term representing $P(z_k = 1|x_i)$can be written as follows:

### Implementation of Gaussian mixture models in SK-Learn

Following is a code snippet that classifies a randomly generated dataset into clusters using Gaussian mixture models from the SK-learn package of Python.

import matplotlib.pyplot as pltfrom sklearn.datasets import make_blobsfrom sklearn import mixturefeatures, true_values = make_blobs(n_samples=800, centers=4, cluster_std=0.50, random_state=0)gmm_model = mixture.GaussianMixture(n_components=4).fit(features)labels = gmm_model.predict(features)plt.scatter(features[:, 0], features[:, 1], c=labels, s=40, cmap='viridis');plt.savefig('./output/svsrelu.png', dpi = 500)plt.show()

### Explanation

• Lines 1–3: We make the necessary imports for making the dataset and plotting the graph.
• Line 4: We generate Gaussian distributions. The value n_samples indicates the number of data samples in the training data, centers is used to specify the number of Gaussian distributions, cluster_std is used to specify the standard deviation of clusters, and random_state is used to ensure that we get the same results upon multiple executions of the code despite randomization.
• Line 5: We train the created instance of the Gaussian mixture models to fit the training dataset.
• Line 6: We predict values for the testing data samples using the trained Gaussian mixture model.
• Lines 7–9: We plot the graph.

### Conclusion

On the whole, Gaussian mixture models are a refined form of the original models (k-NNs and K-means clustering). They consider the closeness between points belonging to the same class as an associated measure of data points. Defining distance in terms of probability and eradicating the need for each point to belong to only one cluster adds to the learning capabilities of the model and allows for non-binary trends to be better addressed.

RELATED TAGS

machine learning

CONTRIBUTOR

Esha Fatima 