What is KL divergence?
Introduction
Often in real-world applications, we need a way to compare two probability distributions. Ordinary distance metrics are not well suited to this task, so we need a different kind of measure.
Divergence measures are typically used for this purpose, and the Kullback-Leibler (KL) divergence is the most commonly used one.
KL divergence is a way of measuring the deviation between two probability distributions. In the case of discrete distributions, KL divergence is defined as:

$$D_{KL}(P \,\|\, Q) = \sum_{x} P(x) \log \frac{P(x)}{Q(x)}$$

And for continuous distributions:

$$D_{KL}(P \,\|\, Q) = \int_{-\infty}^{\infty} p(x) \log \frac{p(x)}{q(x)} \, dx$$

Where $P$ is the true distribution, $Q$ is the known (approximating) distribution, and $p$ and $q$ are their density functions.
The symbol $\|$ separates the two distributions and is a reminder that the divergence of $P$ from $Q$ is, in general, not the same as the divergence of $Q$ from $P$.
Note: Some authors also use the term relative entropy for this quantity, but here we'll follow the usage in Convex Optimization by Stephen Boyd and Lieven Vandenberghe, Cambridge University Press, 2004.
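For instance, with a made-up two-outcome example (not taken from the text above), $P = (0.5,\, 0.5)$ and $Q = (0.9,\, 0.1)$, the discrete formula gives:

$$D_{KL}(P \,\|\, Q) = 0.5 \ln \frac{0.5}{0.9} + 0.5 \ln \frac{0.5}{0.1} \approx -0.294 + 0.805 \approx 0.511 \text{ nats},$$

whereas $D_{KL}(Q \,\|\, P) \approx 0.368$ nats, a first hint of the asymmetry discussed next.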
Difference between divergence and metric
Even though KL divergence gives us a way to quantify how far one probability distribution is from another, it is not a metric. The differences between a divergence and a metric are:
- Metrics are symmetric, whereas a divergence is generally asymmetric: $D_{KL}(P \,\|\, Q) \neq D_{KL}(Q \,\|\, P)$ in general (see the code sketch after this list).
- Metrics satisfy the triangle inequality, that is, $d(x, z) \leq d(x, y) + d(y, z)$, where $d$ is the distance function and $x$, $y$, $z$ are any three points. (If any two of $x$, $y$, and $z$ are equal, the inequality holds trivially.) KL divergence does not satisfy this inequality in general.
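To make the asymmetry concrete, here is a minimal NumPy sketch; the two distributions below are made-up examples, not values from the text:

```python
import numpy as np

def kl(p, q):
    # KL divergence for normalized discrete distributions.
    return np.sum(p * np.log(p / q))

p = np.array([0.5, 0.5])
q = np.array([0.9, 0.1])

print(kl(p, q))  # ~0.5108
print(kl(q, p))  # ~0.3681 -- the two directions disagree, so KL is not symmetric
```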
Applications
Data compression
When compressing a data file such as a .txt file, short codes are assigned to the most frequently occurring words or symbols. Designing the code around a well-known probability distribution makes this work much more manageable, and KL divergence is a good way to compare the data's true distribution with the well-known distribution used for coding.
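As a rough illustration (the frequencies below are made up), the KL divergence computed with base-2 logarithms can be read as the expected number of extra bits per symbol paid for coding data from p with a code designed for q:

```python
import numpy as np

# Hypothetical symbol frequencies: p is the data's true distribution,
# q is the well-known distribution the code was designed around.
p = np.array([0.70, 0.20, 0.10])
q = np.array([0.50, 0.30, 0.20])

# KL divergence in bits (base-2 logarithm).
extra_bits = np.sum(p * np.log2(p / q))
print(f"Expected extra bits per symbol: {extra_bits:.4f}")  # ~0.1228
```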
Variational inference
Variational inference casts inference as an optimization problem: an intractable distribution is approximated by a tractable one, and KL divergence is used to measure how close the tractable approximation is to the intractable target.
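In the usual notation (an assumption here, not notation from the text above), with an intractable posterior $p(z \mid x)$ and a tractable approximation $q(z)$, the identity

$$\log p(x) = \mathrm{ELBO}(q) + D_{KL}\big(q(z) \,\|\, p(z \mid x)\big)$$

shows that minimizing the KL term is equivalent to maximizing the evidence lower bound (ELBO), which is tractable to optimize.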
Variational autoencoders
The above-mentioned use of KL divergence makes it a natural loss term in variational autoencoders, where we need the latent distribution produced by the encoder to stay close to a chosen prior; the KL divergence between the two is added to the reconstruction loss as a regularization term.
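As a minimal sketch, assuming the common VAE setup of a diagonal Gaussian encoder output $\mathcal{N}(\mu, \sigma^2)$ and a standard normal prior (an assumption, not stated above), the KL term has a closed form:

```python
import numpy as np

def gaussian_kl_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(sigma^2)) || N(0, I) ), summed over latent dimensions.

    mu:      mean of the encoder's latent Gaussian
    log_var: log of its variance, i.e. log(sigma^2)
    """
    return 0.5 * np.sum(np.exp(log_var) + mu**2 - 1.0 - log_var)

# Made-up latent parameters for a 2-dimensional latent space.
mu = np.array([0.1, -0.3])
log_var = np.array([-0.2, 0.4])
print(gaussian_kl_to_standard_normal(mu, log_var))
```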
Implementation
We'll implement KL divergence using NumPy in the code below:
```python
import numpy as np

def kl_divergence(p, q):
    """
    Parameters:
    p: true distribution
    q: known distribution
    ---------------------
    Returns:
    KL divergence of p and q distributions.
    """
    return np.sum(np.where(p != 0, p * np.log(p / q) - p + q, 0))

p = np.asarray([0.4629, 0.2515, 0.9685])
q = np.asarray([0.1282, 0.8687, 0.4996])

print("KL divergence between p and q is: ", kl_divergence(p, q))  # this should return 0.7372678653853546
print("KL divergence between p and p is: ", kl_divergence(p, p))  # this should return 0
```
Code explanation
- Line 12: This line implements the following generalized form of the KL divergence equation, which stays well defined even when p and q are not normalized to sum to 1 (as is the case for the arrays used here):

  $$D(p \,\|\, q) = \sum_{i} \left( p_i \log \frac{p_i}{q_i} - p_i + q_i \right)$$

  The extra $-p_i + q_i$ terms vanish when both distributions sum to 1, recovering the definition given earlier, and the np.where call treats the terms with $p_i = 0$ as 0.
Note: The KL divergence between two identical probability distributions is 0, which is why kl_divergence(p, p) returns 0.
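If SciPy is available (an assumption; it is not used in the code above), the result can be cross-checked with scipy.special.kl_div, which computes the elementwise terms p * log(p / q) - p + q:

```python
import numpy as np
from scipy.special import kl_div

p = np.asarray([0.4629, 0.2515, 0.9685])
q = np.asarray([0.1282, 0.8687, 0.4996])

# Summing the elementwise terms should match kl_divergence(p, q) above.
print(np.sum(kl_div(p, q)))  # ~0.7373
```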