GANs and the Birthday Paradox
Explore the evaluation of GANs using the birthday paradox and image similarity measures.
One of the biggest challenges in evaluating GAN samples is understanding how much of the real distribution the generator has learned. For example, consider the support of the distribution of all possible images of dogs. Naturally, this set must include millions of dog images that portray combinations of dog features such as size, breed, hair color, pose, and more.
Assuming there are millions of dogs in real life that we humans perceive as unique, a GAN that has truly learned the distribution of dogs must be able to produce a similar number of unique dog images. Estimating how many unique dog images a GAN can produce might seem daunting at first, but researchers have found a clever, if crude, way to estimate it using the birthday paradox.
The birthday paradox
The birthday paradox is commonly addressed in undergraduate classes, where teachers ask students what the probability is that two people in the class share a birthday. After some speculation, students are normally surprised to learn that with only 23 people in a room, the chance that at least two of them share a birthday is about 50%. With 23 people in a room, there are 23 × 22 / 2 = 253 distinct pairs of people, and each pair is a chance for a shared birthday.
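The 50% figure can be checked directly. The sketch below assumes birthdays are independent and uniformly spread over 365 days (ignoring leap years). It multiplies the probabilities that each new person avoids every birthday already taken, then subtracts the result from 1. The function name is hypothetical.

```python
import math

def birthday_collision_probability(n_people: int, n_days: int = 365) -> float:
    """Probability that at least two of n_people share a birthday,
    assuming birthdays are independent and uniform over n_days."""
    if n_people > n_days:
        return 1.0  # pigeonhole principle: a shared birthday is guaranteed
    p_all_distinct = 1.0
    for i in range(n_people):
        # the (i + 1)-th person must avoid the i birthdays already taken
        p_all_distinct *= (n_days - i) / n_days
    return 1.0 - p_all_distinct

pairs = math.comb(23, 2)  # number of distinct pairs among 23 people
print(pairs)                                         # 253
print(round(birthday_collision_probability(23), 3))  # 0.507
```

With 22 people the probability is still just under 50%, so 23 is the smallest group for which a shared birthday is more likely than not.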
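The birthday paradox states that in a discrete distribution with support size $N$, a random sample of about $\sqrt{N}$ items is quite likely to contain a duplicate (a collision).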
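To see where the $\sqrt{N}$ comes from, here is a short derivation for the uniform case. It assumes each of $s$ samples is drawn independently and uniformly from $N$ items and uses the approximation $1 - x \approx e^{-x}$ for small $x$:

```latex
\begin{aligned}
P(\text{no collision}) &= \prod_{i=0}^{s-1}\left(1 - \frac{i}{N}\right)
  \approx \exp\!\left(-\frac{s(s-1)}{2N}\right)
  \approx \exp\!\left(-\frac{s^2}{2N}\right), \\
P(\text{collision}) = \frac{1}{2} &\implies \frac{s^2}{2N} \approx \ln 2
  \implies s \approx \sqrt{2N\ln 2} \approx 1.18\sqrt{N}, \\
  &\implies N \approx \frac{s^2}{2\ln 2} \approx 0.72\, s^2.
\end{aligned}
```

With $N = 365$ this gives $s \approx 22.5$, matching the 23 people of the classroom version. Read in reverse, it is the basis of the GAN test: the sample size at which duplicates start to appear indicates the support size.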
Implementation of the test
A simple yet effective approach is to use an image similarity measure to detect collisions: two generated images that are similar enough count as a collision, and the number of collisions found in a sample is then used to estimate the size of the support of the distribution. Naturally, this method depends on the image similarity measure. Normally, two images are considered similar if the distance between them is below some threshold $\epsilon$. This $\epsilon$ parameter directly influences the number of collisions that will be found.
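The effect of $\epsilon$ can be illustrated with a toy example. The sketch below assumes each image has already been mapped to a feature vector (the 2-D points are made up for illustration, not real image features) and uses Euclidean distance as the similarity measure. `count_collisions` is a hypothetical helper.

```python
import itertools
import math

def count_collisions(embeddings, epsilon):
    """Count pairs of feature vectors whose Euclidean distance is below epsilon.

    Hypothetical helper: in practice the vectors could be raw pixels or
    features from a pretrained network; here they are toy 2-D points."""
    return sum(
        1
        for a, b in itertools.combinations(embeddings, 2)
        if math.dist(a, b) < epsilon
    )

# Toy "images": two near-identical pairs plus one isolated point
points = [(0.0, 0.0), (0.05, 0.0), (1.0, 1.0), (1.02, 1.01), (3.0, 0.0)]
for eps in (0.01, 0.1, 1.0, 5.0):
    print(eps, count_collisions(points, eps))
```

With these points, $\epsilon = 0.1$ catches only the two near-identical pairs, while $\epsilon = 5.0$ counts all 10 pairs as collisions, which would make the support look far smaller than it is.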
The birthday paradox test for GANs proposed by Sanjeev Arora is as follows:
1. Pick a sample $S$ of size $s$ produced by the generator.
2. Use some similarity measure to compute the similarity between every pair of images in $S$.
3. Flag the 20 most similar pairs in the sample.
4. Visually inspect the flagged images for near duplicates.
5. Repeat the process.
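The automated part of the test (sampling, computing pairwise similarity, and flagging the closest pairs) might look like the sketch below. It assumes Euclidean distance between feature vectors as the similarity measure and replaces the GAN with a hypothetical `hypothetical_generator` that returns random vectors. One exact duplicate is planted so that the flagged list has something to find. The visual inspection step remains manual.

```python
import heapq
import itertools
import math
import random

def flag_most_similar_pairs(sample, k=20):
    """Return the k closest pairs as (i, j, distance), closest first.

    Euclidean distance is an assumed stand-in for the similarity measure."""
    pairs = (
        (i, j, math.dist(sample[i], sample[j]))
        for i, j in itertools.combinations(range(len(sample)), 2)
    )
    return heapq.nsmallest(k, pairs, key=lambda t: t[2])

def hypothetical_generator(rng, dim=8):
    """Hypothetical stand-in for a GAN generator: returns a random vector."""
    return [rng.random() for _ in range(dim)]

rng = random.Random(0)
s = 100
sample = [hypothetical_generator(rng) for _ in range(s)]
sample.append(list(sample[3]))  # plant one exact duplicate for illustration

flagged = flag_most_similar_pairs(sample, k=20)
# In the real test, these 20 pairs are rendered and inspected by a human.
print(flagged[0])
```

Computing all pairwise distances is quadratic in $s$, which is manageable for the few hundred samples the test typically needs.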
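After running this experiment several times, we gather information about how likely it is to find near duplicates in a sample of size $s$. If near duplicates show up with reasonable probability (say, around 50% of the time), the birthday paradox suggests that the support of the generator's distribution is on the order of $s^2$.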
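The repeat-and-estimate logic can be simulated under idealized assumptions: a hypothetical generator that samples uniformly from `true_n` distinct images, with exact duplicates standing in for the near duplicates a human would flag. Choosing $s \approx \sqrt{2N\ln 2}$ (the uniform-case 50% point) should give a duplicate rate near 0.5, and inverting that relation recovers an estimate close to the true support. The helper names are hypothetical.

```python
import math
import random

def duplicate_rate(support_size, s, trials, rng):
    """Fraction of trials in which a sample of size s, drawn uniformly
    from `support_size` items, contains at least one exact duplicate."""
    hits = 0
    for _ in range(trials):
        seen = set()
        for _ in range(s):
            x = rng.randrange(support_size)
            if x in seen:
                hits += 1
                break
            seen.add(x)
    return hits / trials

def estimate_support(s):
    """Support size implied by a ~50% duplicate rate at sample size s,
    assuming a uniform distribution: N ≈ s^2 / (2 ln 2)."""
    return s * s / (2 * math.log(2))

rng = random.Random(42)
true_n = 10_000
s = round(math.sqrt(2 * math.log(2) * true_n))  # 118
rate = duplicate_rate(true_n, s, trials=2000, rng=rng)
print(s, rate, round(estimate_support(s)))
```

The constant $2\ln 2$ comes from treating a 50% duplicate rate as the threshold; other thresholds change the constant but not the $s^2$ scaling.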
The birthday paradox test works best for GANs under the assumption that the generator assigns roughly uniform probability to the images in its distribution. In cases where the distribution is not uniform, the support-size estimate from the birthday paradox fails.
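A small calculation shows why non-uniformity matters. For i.i.d. draws from a discrete distribution with probabilities $p_k$, the expected number of identical pairs in a sample of size $s$ is $\binom{s}{2}\sum_k p_k^2$. The sketch below compares a uniform distribution over 1,000 items with a made-up skewed distribution over the same 1,000 items that puts half its mass on 10 "mode" images.

```python
import math

def expected_colliding_pairs(probs, s):
    """Expected number of identical pairs in a sample of size s drawn
    i.i.d. from the discrete distribution `probs`: C(s, 2) * sum(p_k^2)."""
    return math.comb(s, 2) * sum(p * p for p in probs)

n = 1000
uniform = [1.0 / n] * n
# Illustrative skewed distribution: half the mass on 10 "mode" images
skewed = [0.5 / 10] * 10 + [0.5 / (n - 10)] * (n - 10)

s = 30
print(expected_colliding_pairs(uniform, s))
print(expected_colliding_pairs(skewed, s))
# Support size a uniform distribution would need for the same collision rate
print(1 / sum(p * p for p in skewed))
```

The skewed distribution yields roughly 25 times as many expected collisions at the same sample size, so a birthday-paradox estimate would suggest a support of about $1/\sum_k p_k^2 \approx 40$ images even though 1,000 are possible.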
GAN face similarity analysis
The