
Divergence Measures

Discover how to measure differences between probability distributions using divergence measures such as entropy, relative entropy (Kullback-Leibler divergence), Jensen-Shannon divergence, total variation distance, and Wasserstein distance. Learn their definitions, applications, and significance in machine learning contexts, including GANs and NLP.

Here we’ll focus on an important aspect of statistical learning: quantifying how different two probability distributions are. This lesson is advanced and can reasonably be skipped if needed.

Introduction

We can easily compare two scalar values by taking their difference or their ratio. Similarly, we can compare two vectors by taking the L1 or L2 norm of their difference.
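As a quick illustration, here is a minimal sketch (assuming NumPy; the example vectors are made up) that compares two vectors via the L1 and L2 norms of their difference:

```python
import numpy as np

x = np.array([0.2, 0.5, 0.3])
y = np.array([0.1, 0.6, 0.3])

l1 = np.linalg.norm(x - y, ord=1)  # sum of absolute differences
l2 = np.linalg.norm(x - y, ord=2)  # Euclidean distance

print(l1, l2)  # 0.2  0.1414...
```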

Extending this notion of divergence to a pair of probability distributions, however, requires better-suited measures. There are several real-world applications where we need to quantify the similarity (or difference) between two distributions: for example, sequence comparison in bioinformatics, text comparison in Natural Language Processing (NLP), evaluation of images produced by Generative Adversarial Networks (GANs), and so on.

Entropy

Let’s begin with the most fundamental measure. The entropy of a discrete random variable X with probability distribution P is defined as:

H(X) = -\sum_{i=1}^{n} P(X_i) \log P(X_i)

Usually, the base of the logarithm is taken as either 2 or e.
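A minimal sketch of this formula in Python (assuming NumPy; the helper name entropy is ours for illustration):

```python
import numpy as np

def entropy(p, base=2):
    """Entropy H of a discrete distribution p (probabilities summing to 1)."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]                                  # convention: 0 * log 0 = 0
    return -np.sum(p * np.log(p)) / np.log(base)

print(entropy([0.5, 0.5]))   # 1.0 bit for a fair coin
print(entropy([0.9, 0.1]))   # ~0.47 bits for a biased coin
```

For comparison, scipy.stats.entropy(p, base=2) should give the same values.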

Relative entropy

The relative entropy between two vectors X and Y is defined as:

D(X, Y) = \sum_{i=1}^{n} X_i \log\left(\frac{X_i}{Y_i}\right)
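A minimal sketch under the same NumPy assumption (the helper name relative_entropy is ours), which also shows that the measure is not symmetric in its arguments:

```python
import numpy as np

def relative_entropy(x, y, base=2):
    """Relative entropy (KL divergence) D(X, Y) between discrete distributions."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    mask = x > 0                                  # terms with X_i = 0 contribute nothing
    return np.sum(x[mask] * np.log(x[mask] / y[mask])) / np.log(base)

p = [0.5, 0.5]
q = [0.9, 0.1]
print(relative_entropy(p, q))   # ~0.74 bits
print(relative_entropy(q, p))   # ~0.53 bits -- note the asymmetry
```

For reference, scipy.stats.entropy(p, q) computes the same quantity (in nats by default).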