Trusted answers to developer questions

Esha Fatima

Grokking Modern System Design Interview for Engineers & Managers

Ace your System Design Interview and take your career to the next level. Learn to handle the design of applications like Netflix, Quora, Facebook, Uber, and many more in a 45-min interview. Learn the RESHADED framework for architecting web-scale applications by determining requirements, constraints, and assumptions before diving into a step-by-step design process.

Gaussian distribution can be used to model the probabilistic framework of a random, continuous variable. In the case of a classical machine learning problem, our ultimate goal is to train a model so that it can predict the most likely outcome, given the closest probability distribution function approximated from our training dataset.

Similarly, **Gaussian mixture models** emerge from the underlying inductive bias that the dataset can be produced entirely from a combination of finite Gaussian distributions.

K-means clustering is a another unsupervised technique that attempts to hard-classify all the data points in the training set. In other words, each data point will be classified as belonging to one cluster or another. No data point will be a member of more than one cluster in the data set.

On the other hand, Gaussian mixture models do not rigidly classify each and every instance into one class or the other. The algorithm attempts to produce K-Gaussian distributions that would take into account the entire training space. Every point can be associated with one or more distributions. Consequently, the deterministic factor would be the probability that each point belongs to a certain Gaussian distribution. The distribution with the greatest likelihood would win.

Furthermore, Gaussian distribution automatically takes into account any missing data points, while K-means clustering depends on a dataset containing a proportionally rich and varied mix of data points.

Following are a few major real-life applications of Gaussian mixture models:

- Gaussian mixture models are useful in marketing and customer targeting. They're used to segment different types of consumers to tailor marketing strategies to target each segment accordingly.
- Gaussian mixture models are also useful in financial stock markets. Gaussian marketing models effectively forecast trends based on existing patterns and help investors avoid the high amount of noise present in stock market trends.

The Gaussian probability density function can be modeled as follows, where

The probability,

The mixing co-efficient of the

Given that our algorithm is a mixture of

Given that we now have the probability distribution

Hence, to find out what

If we add up

Hence, it follows that for all

Taking a natural logarithm on both sides of the equation would preserve the concavity of the objective function, so we can take the derivative and find the cluster with the greatest likelihood.

Applying Baye's rule and given that

import matplotlib.pyplot as pltfrom sklearn.datasets import make_blobsfrom sklearn import mixturefeatures, true_values = make_blobs(n_samples=800, centers=4, cluster_std=0.50, random_state=0)gmm_model = mixture.GaussianMixture(n_components=4).fit(features)labels = gmm_model.predict(features)plt.scatter(features[:, 0], features[:, 1], c=labels, s=40, cmap='viridis');plt.savefig('./output/svsrelu.png', dpi = 500)plt.show()

- Lines 1–3: We make the necessary imports for making the dataset and plotting the graph.
- Line 4: We generate Gaussian distributions. The value
`n_samples`

indicates the number of data samples in the training data,`centers`

is used to specify the number of Gaussian distributions,`cluster_std`

is used to specify the standard deviation of clusters, and`random_state`

is used to ensure that we get the same results upon multiple executions of the code despite randomization. - Line 5: We train the created instance of the Gaussian mixture models to fit the training dataset.
- Line 6: We predict values for the testing data samples using the trained Gaussian mixture model.
- Lines 7–9: We plot the graph.

On the whole, Gaussian mixture models are a refined form of the original models (k-NNs and K-means clustering). They consider the closeness between points belonging to the same class as an associated measure of data points. Defining distance in terms of probability and eradicating the need for each point to belong to only one cluster adds to the learning capabilities of the model and allows for non-binary trends to be better addressed.

RELATED TAGS

machine learning

CONTRIBUTOR

Esha Fatima

Copyright ©2022 Educative, Inc. All rights reserved

Grokking Modern System Design Interview for Engineers & Managers

Ace your System Design Interview and take your career to the next level. Learn to handle the design of applications like Netflix, Quora, Facebook, Uber, and many more in a 45-min interview. Learn the RESHADED framework for architecting web-scale applications by determining requirements, constraints, and assumptions before diving into a step-by-step design process.

Keep Exploring

Related Courses