What are Gaussian mixture models?

Intuition

Gaussian distribution can be used to model the probabilistic framework of a random, continuous variable. In the case of a classical machine learning problem, our ultimate goal is to train a model so that it can predict the most likely outcome, given the closest probability distribution function approximated from our training dataset.

Similarly, Gaussian mixture models emerge from the underlying inductive bias that the dataset can be produced entirely from a combination of finite Gaussian distributions.

On the other hand, Gaussian mixture models do not rigidly classify each and every instance into one class or the other. The algorithm attempts to produce K-Gaussian distributions that would take into account the entire training space. Every point can be associated with one or more distributions. Consequently, the deterministic factor would be the probability that each point belongs to a certain Gaussian distribution. The distribution with the greatest likelihood would win.

Furthermore, Gaussian distribution automatically takes into account any missing data points, while K-means clustering depends on a dataset containing a proportionally rich and varied mix of data points.

Applications of Gaussian mixture models

Following are a few major real-life applications of Gaussian mixture models:

Gaussian mixture models are useful in marketing and customer targeting. They're used to segment different types of consumers to tailor marketing strategies to target each segment accordingly.
Gaussian mixture models are also useful in financial stock markets. Gaussian marketing models effectively forecast trends based on existing patterns and help investors avoid the high amount of noise present in stock market trends.

Derivation

The Gaussian probability density function can be modeled as follows, where $X$ is the training dataset, $D$ is the number of dimensions of each data point, $\mu$ is the mean, and $\sigma^2$ is the co-variance.

Explanation

Lines 1–3: We make the necessary imports for making the dataset and plotting the graph.
Line 4: We generate Gaussian distributions. The value n_samples indicates the number of data samples in the training data, centers is used to specify the number of Gaussian distributions, cluster_std is used to specify the standard deviation of clusters, and random_state is used to ensure that we get the same results upon multiple executions of the code despite randomization.
Line 5: We train the created instance of the Gaussian mixture models to fit the training dataset.
Line 6: We predict values for the testing data samples using the trained Gaussian mixture model.
Lines 7–9: We plot the graph.

Conclusion

On the whole, Gaussian mixture models are a refined form of the original models (k-NNs and K-means clustering). They consider the closeness between points belonging to the same class as an associated measure of data points. Defining distance in terms of probability and eradicating the need for each point to belong to only one cluster adds to the learning capabilities of the model and allows for non-binary trends to be better addressed.