Divergence Measures

Here we’ll focus on an important aspect of statistical learning: measuring how different two probability distributions are. This lesson is advanced and can safely be skipped if needed.


We can easily compare two scalar values by their difference or their ratio. Similarly, we can compare two vectors by taking the L1 or L2 norm of their difference.
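As a quick illustration of the vector case, here is a minimal sketch in plain Python that computes the L1 and L2 norms of the difference between two vectors. The function names `l1_distance` and `l2_distance` are our own, chosen for illustration only.

```python
import math


def l1_distance(x, y):
    """L1 norm of x - y: the sum of absolute element-wise differences."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    return sum(abs(a - b) for a, b in zip(x, y))


def l2_distance(x, y):
    """L2 (Euclidean) norm of x - y: square root of the sum of squared differences."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


x = [0.0, 3.0]
y = [4.0, 0.0]
print(l1_distance(x, y))  # 7.0
print(l2_distance(x, y))  # 5.0
```

Both functions return 0 only when the vectors are identical, which is what makes them usable as distances between vectors.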

Extending this notion of divergence to a pair of probability distributions, however, requires more suitable measures. Many real-world applications need to quantify the similarity (or difference) between two distributions, for example: comparing two sequences in bioinformatics, comparing texts in Natural Language Processing (NLP), evaluating images produced by Generative Adversarial Networks (GANs), and so on.


Let’s begin with the fundamental measure: entropy. The entropy of a discrete random variable X, which takes n possible values X_i with probabilities P(X_i), is defined as:

$$H(X) = -\sum_{i=1}^{n} P(X_i) \log P(X_i)$$

Usually, the base of the log is taken as either 2 or e; entropy is then measured in bits or nats, respectively.
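The entropy formula can be evaluated directly. The sketch below is a minimal, standard-library-only version; the function name `entropy` and its `base` parameter are our own choices, and it assumes the input is already a valid probability vector (non-negative entries summing to 1). Zero probabilities are skipped, following the usual convention 0 log 0 = 0.

```python
import math


def entropy(p, base=2.0):
    """Entropy H(X) = -sum_i P(X_i) log P(X_i) of a discrete distribution.

    p:    probabilities P(X_i) (assumed non-negative and summing to 1).
    base: 2 gives bits, math.e gives nats.
    Terms with P(X_i) == 0 are skipped (convention: 0 log 0 = 0).
    """
    return -sum(pi * math.log(pi, base) for pi in p if pi > 0)


print(round(entropy([0.5, 0.5]), 6))                # 1.0 bit (fair coin)
print(round(entropy([0.25] * 4), 6))                # 2.0 bits (fair 4-sided die)
print(round(entropy([0.5, 0.5], base=math.e), 6))   # 0.693147 nats (= ln 2)
print(round(entropy([0.9, 0.1]), 3))                # 0.469 bits (biased coin)
```

The biased coin has lower entropy than the fair one: the more predictable the outcome, the smaller H(X).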

Relative entropy

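The relative entropy, also known as the Kullback-Leibler (KL) divergence, between two discrete probability distributions, represented as vectors X and Y whose entries X_i and Y_i give the probability of the i-th outcome, is defined as: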

$$D(X,Y) = \sum_{i=1}^{n} X_i \log\frac{X_i}{Y_i} = -\sum_{i=1}^{n} X_i \log\frac{Y_i}{X_i}$$

D(X,Y) is always non-negative, and it equals zero only when X = Y.

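Since the formula involves an element-wise ratio as well as a logarithm, we must check for zeros: a term with X_i = 0 contributes 0 (by the convention 0 · log(0/Y_i) = 0), while a term with X_i > 0 and Y_i = 0 makes the divergence infinite.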
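The zero checks translate directly into code. Below is a minimal sketch, using only the Python standard library, of relative entropy in its standard non-negative form sum_i X_i log(X_i / Y_i). The name `relative_entropy` is hypothetical, and the function assumes both inputs are already valid probability vectors of equal length.

```python
import math


def relative_entropy(x, y, base=2.0):
    """Relative entropy (KL divergence) D(X, Y) = sum_i X_i log(X_i / Y_i).

    x, y: probability vectors of equal length (assumed valid distributions).
    Zero handling:
      - X_i == 0: the term contributes 0 (convention 0 log(0 / Y_i) = 0).
      - X_i > 0 and Y_i == 0: the divergence is infinite.
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    total = 0.0
    for xi, yi in zip(x, y):
        if xi == 0:
            continue  # zero-probability outcome under X contributes nothing
        if yi == 0:
            return math.inf  # X puts mass where Y has none
        total += xi * math.log(xi / yi, base)
    return total


p = [0.5, 0.5]
q = [0.25, 0.75]
print(relative_entropy(p, p))                       # 0.0
print(round(relative_entropy(p, q), 4))             # 0.2075
print(round(relative_entropy(q, p), 4))             # 0.1887
print(relative_entropy([1.0, 0.0], [0.5, 0.5]))     # 1.0
print(relative_entropy([0.5, 0.5], [1.0, 0.0]))     # inf
```

Note that swapping the arguments changes the result (0.2075 vs. 0.1887 bits above): relative entropy is not symmetric, so it is a divergence rather than a true distance metric.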
