Generator and Discriminator

Learn how adversarial attacks led to the birth of Generative Adversarial Networks.

Adversarial attacks

Deep learning models are highly vulnerable to attacks based on small, carefully crafted modifications of the input at test time.

Suppose you have a trained classifier that correctly recognizes the object in an image and assigns it the right label.

It is possible to construct an adversarial example: an image that is visually indistinguishable from the original yet is classified incorrectly.

Such adversarial images can be generated by adding noise, by applying data augmentation techniques, or by perturbing the image along the gradient of the loss with respect to the input, that is, in the opposite direction of a gradient-descent step, so the loss is maximized instead of minimized. To defend against this, a common approach is to inject adversarial examples into the training set. This is known as adversarial training, and it increases the neural network's robustness; a sketch follows below.
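As a rough sketch (assuming an already defined model, a standard optimizer, and a hypothetical epsilon perturbation budget), one adversarial training step that injects gradient-sign perturbed copies of the batch might look like this:

import torch
import torch.nn.functional as F

def adversarial_training_step(model, optimizer, images, labels, epsilon=0.03):
    # NOTE: epsilon=0.03 is a hypothetical budget; tune it per dataset.
    # 1. Forward and backward pass on the clean batch to get input gradients.
    images = images.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(images), labels)
    model.zero_grad()
    loss.backward()
    # 2. Build adversarial copies with a small step along the gradient sign
    #    (the same idea as the FGSM code shown below).
    adv_images = torch.clamp(images + epsilon * images.grad.sign(), 0, 1).detach()
    # 3. Train on the clean and adversarial examples together.
    optimizer.zero_grad()
    total_loss = (F.cross_entropy(model(images.detach()), labels)
                  + F.cross_entropy(model(adv_images), labels))
    total_loss.backward()
    optimizer.step()
    return total_loss.item()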

An example

This code takes the input image and the gradient of the loss with respect to that image (computed after a forward and backward pass) and perturbs the image so that the model can no longer recognize it with the same confidence.

import torch
## FGSM attack code
def fgsm_attack(image, epsilon, data_grad):
    # Collect the element-wise sign of the data gradient
    sign_data_grad = data_grad.sign()
    # Create the perturbed image by adjusting each pixel of the input image
    # by a small step (epsilon) in the direction of the gradient sign
    perturbed_image = image + epsilon*sign_data_grad
    # Adding clipping to maintain [0,1] range
    perturbed_image = torch.clamp(perturbed_image, 0, 1)
    # Return the perturbed image
    return perturbed_image
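
For context, data_grad is the gradient of the loss with respect to the input image. A minimal, hypothetical way to obtain it and run the attack, assuming a trained model, a single image tensor with values in [0, 1], and its true label, could look like this:

import torch.nn.functional as F

# Hypothetical setup: `model` is a trained classifier, `image` is a
# [1, C, H, W] tensor with values in [0, 1], and `label` holds the true class.
image.requires_grad = True
output = model(image)                    # forward pass
loss = F.cross_entropy(output, label)    # loss w.r.t. the correct label
model.zero_grad()
loss.backward()                          # backward pass populates image.grad
data_grad = image.grad.data

# epsilon=0.1 is an illustrative value; larger values make the attack
# stronger but also more visible.
perturbed_image = fgsm_attack(image, epsilon=0.1, data_grad=data_grad)
new_prediction = model(perturbed_image).argmax(dim=1)  # often no longer correct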

These perturbations look something like this:
