Trusted answers to developer questions

Related Tags

machine learning
computer vision
ai

What is a Generative Adversarial Network (GAN)?

Nouman Abbasi

Overview of Generative Adversarial Networks

Generative Adversarial Networks (GANs) are among the most widely used generative models. Generative models create new data instances that resemble the training data. In short, a GAN can create content that did not exist in its training set.

Given a training set, a GAN learns to generate new data with the same statistics as the training set. For example, a GAN trained on pictures of human faces can create new photographs of human faces, even though the new faces do not belong to any real person.

Components of GANs

GANs have the following components:

  • Generator
  • Discriminator

Generator

The generator learns to produce the target output. Starting from random noise, it creates images that resemble the original images in the dataset. These generated images are later used to train the discriminator. Generators are typically deconvolutional neural networks.
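As a sketch of the idea, a generator can be thought of as a function that maps a random noise vector to an image. The toy model below is only illustrative (a single linear layer with a tanh activation; the dimensions are assumptions), whereas real generators are deep deconvolutional networks:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy generator: one linear layer mapping a 16-dim noise
# vector (the "latent code") to a flattened 8x8 "image" in [-1, 1].
latent_dim, img_dim = 16, 8 * 8
W = rng.normal(scale=0.1, size=(latent_dim, img_dim))
b = np.zeros(img_dim)

def generate(z):
    """Map latent noise z to a fake image via tanh(zW + b)."""
    return np.tanh(z @ W + b)

z = rng.normal(size=(4, latent_dim))   # a batch of 4 noise vectors
fake_images = generate(z)
print(fake_images.shape)               # (4, 64)
```

The tanh keeps pixel values in [-1, 1], a common convention when training images are normalized to that range.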

Discriminator

As the name suggests, the discriminator acts like a critic to distinguish between the original images from the training dataset and the images generated by the generator. Discriminators are convolutional neural networks.
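The discriminator, in its simplest form, is a binary classifier that outputs the probability that an input image is real. The stand-in below is a single logistic unit (sizes and names are assumptions), while real discriminators are convolutional networks:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical toy discriminator: logistic regression over a flattened
# 8x8 image, outputting P(image is real).
img_dim = 8 * 8
w = rng.normal(scale=0.1, size=img_dim)
c = 0.0

def discriminate(x):
    """Return the probability that each image in the batch is real."""
    return 1.0 / (1.0 + np.exp(-(x @ w + c)))

x = rng.normal(size=(4, img_dim))   # a batch of 4 images
p_real = discriminate(x)
print(p_real.shape)                 # (4,)
```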

Images of human faces created using a GAN

How GANs work

A GAN pairs a generator with a discriminator. The two networks are trained against each other until the discriminator can no longer distinguish the images produced by the generator from real ones. In other words, the generator tries to fool the discriminator, while the discriminator tries not to be fooled.

Initially, we train the discriminator on a known dataset, presenting it with samples from the training data until it achieves acceptable accuracy. Then, the generator is trained to fool the discriminator. Typically, the generator is fed randomized input sampled from a predefined latent space; think of the latent space as a source of random noise. A sample from the latent space is meaningless on its own, but the generator learns to turn such samples into new images. The discriminator then evaluates the generator's output.

The working of GANs

GANs rely on back-propagation through both networks to minimize their errors. This helps the generator produce better images while the discriminator becomes more skilled at flagging synthetic images.
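The alternating updates described above can be sketched end-to-end on a toy one-dimensional problem. Everything here is an illustrative assumption: the real data comes from a normal distribution N(2, 0.5), the generator is a two-parameter affine map, the discriminator is a single logistic unit, and the gradients are written out by hand instead of using automatic differentiation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 1-D GAN. Real data: x ~ N(2, 0.5). Generator: G(z) = m + s*z with
# z ~ N(0, 1). Discriminator: D(x) = sigmoid(w*x + b).
m, s = 0.0, 1.0          # generator parameters
w, b = 0.0, 0.0          # discriminator parameters
lr, batch = 0.03, 64

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

for step in range(3000):
    x_real = rng.normal(2.0, 0.5, size=batch)
    z = rng.normal(size=batch)
    x_fake = m + s * z

    # Discriminator step: gradient ascent on
    # E[log D(real)] + E[log(1 - D(fake))]
    d_real, d_fake = sigmoid(w * x_real + b), sigmoid(w * x_fake + b)
    grad_w = np.mean((1 - d_real) * x_real) - np.mean(d_fake * x_fake)
    grad_b = np.mean(1 - d_real) - np.mean(d_fake)
    w += lr * grad_w
    b += lr * grad_b

    # Generator step: gradient descent on the non-saturating loss
    # -E[log D(fake)]; chain rule through G's parameters m and s.
    d_fake = sigmoid(w * x_fake + b)
    g_m = np.mean(-(1 - d_fake) * w)        # dG/dm = 1
    g_s = np.mean(-(1 - d_fake) * w * z)    # dG/ds = z
    m -= lr * g_m
    s -= lr * g_s

print(round(m, 2))   # the generator's mean drifts toward the data mean (2.0)
```

Over training, the generator's output distribution is pulled toward the real data distribution, which is exactly the behavior the adversarial game is designed to produce.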

We represent the generator as $G(z;\theta_g)$, where $z \sim p(z)$ and $p(z)$ is a prior defined on the input noise; the generator induces a distribution $p_g$ over the data $x$. The discriminator is represented as $D(x;\theta_d)$, where $\theta_d$ denotes the parameters of its multilayer perceptron. The objective function that combines $G$ and $D$ is the following minimax game:

$$\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{data}(x)}[\log D(x)] + \mathbb{E}_{z \sim p(z)}[\log(1 - D(G(z)))]$$

Minimizing this objective over $G$ drives the generator to produce increasingly realistic images, while maximizing it over $D$ trains the discriminator to distinguish generated images from images in the training dataset.
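The expectations in the objective can be estimated numerically. The Monte Carlo sketch below uses hypothetical stand-ins: real data drawn from N(2, 0.5), an untrained identity generator acting on standard normal noise, and a fixed logistic discriminator:

```python
import numpy as np

rng = np.random.default_rng(0)

# Monte Carlo estimate of the GAN value function
#   V(D, G) = E_x[log D(x)] + E_z[log(1 - D(G(z)))]
# with illustrative choices: real x ~ N(2, 0.5), G(z) = z (an untrained
# "identity" generator on N(0, 1) noise), D(x) = sigmoid(x - 1).
def D(x):
    return 1.0 / (1.0 + np.exp(-(x - 1.0)))

x_real = rng.normal(2.0, 0.5, size=100_000)
z = rng.normal(0.0, 1.0, size=100_000)
g_z = z   # identity generator

V = np.mean(np.log(D(x_real))) + np.mean(np.log(1.0 - D(g_z)))
print(round(V, 3))
```

Since both log terms are negative, V is always below zero; training the discriminator pushes V up, while training the generator pushes it back down.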

A simple yet powerful extension of GANs is the conditional GAN (cGAN). A condition $c$ is applied to the inputs of both $G$ and $D$. Many state-of-the-art models use this technique to achieve their application goals.
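One common way to apply the condition (among several; the sizes and names below are assumptions) is to concatenate a one-hot class label onto the generator's noise input and onto the discriminator's image input:

```python
import numpy as np

rng = np.random.default_rng(0)

# Conditional GAN input wiring (a sketch; layer sizes are illustrative).
latent_dim, img_dim, n_classes = 16, 64, 10

def one_hot(label, n=n_classes):
    v = np.zeros(n)
    v[label] = 1.0
    return v

z = rng.normal(size=latent_dim)       # noise vector
x = rng.normal(size=img_dim)          # an image (real or generated)
c = one_hot(3)                        # condition: "class 3"

g_input = np.concatenate([z, c])      # generator sees noise + condition
d_input = np.concatenate([x, c])      # discriminator sees image + condition
print(g_input.shape, d_input.shape)   # (26,) (74,)
```

Because both networks see $c$, the generator is rewarded only for producing images that are realistic *for that condition*, which lets a single trained model generate class-specific outputs on demand.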

Applications

A number of applications utilize GANs:

  • Data generation: Several studies use GANs to generate synthetic data when real data is scarce.
  • Cartoon generation: GANs are also used to generate new anime characters.
  • Face aging: GANs can generate images that show how a person might look in old age.
  • Day-to-night translation: Images taken in daylight can be converted to portray the same scene at night.

State-of-the-art GANs

In this section, we'll discuss some well-known state-of-the-art GAN models.

  • StyleGAN: StyleGAN generates highly detailed images at a resolution of $1024 \times 1024$. It synthesizes the image progressively in levels: building on the image from the previous level, StyleGAN adds finer and finer details.
  • Pix2Pix: Pix2Pix performs image-to-image translation. An input image is translated into a detailed image that is comparable with the images in the training dataset.

Although originally proposed as a form of a generative model for unsupervised learning, GANs have also proven useful for supervised learning and reinforcement learning.

Read “What is supervised and unsupervised learning?” to learn more.

CONTRIBUTOR

Nouman Abbasi
Copyright ©2022 Educative, Inc. All rights reserved
