What is Autograd?

Overview

Autograd is the automatic differentiation package in the PyTorch library that helps train a neural network through graph computing. As operations on tensors are executed, Autograd records them in a computational graph and then traverses that graph backward to calculate the derivatives needed for training a neural network.

How it works

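When training neural networks, weights and biases are adjusted using gradients computed during backpropagation. This requires the partial derivative of every output with respect to every input. For a function y = f(x) with an input vector x of n dimensions and an output vector y of m dimensions, the matrix of these partial derivatives is:

$J=\left(\begin{array}{ccc}\frac{\partial y_{1}}{\partial x_{1}} & \cdots & \frac{\partial y_{1}}{\partial x_{n}} \\ \vdots & \ddots & \vdots \\ \frac{\partial y_{m}}{\partial x_{1}} & \cdots & \frac{\partial y_{m}}{\partial x_{n}}\end{array}\right)$

This matrix, denoted by J, is called the Jacobian. Mathematically, Autograd calculates Jacobian-vector products without ever forming J explicitly.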

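For example, suppose we have another vector, v, that is the gradient of the following scalar function with respect to y:

$l=g(y)$, $v=\left(\begin{array}{ccc}\frac{\partial l}{\partial y_{1}} & \cdots & \frac{\partial l}{\partial y_{m}}\end{array}\right)^{T}$

Then, by the chain rule, the Jacobian-vector product $J^{T} \cdot v$ gives the gradient of l with respect to x, as follows:

$J^{T} \cdot v=\left(\begin{array}{ccc}\frac{\partial y_{1}}{\partial x_{1}} & \cdots & \frac{\partial y_{m}}{\partial x_{1}} \\ \vdots & \ddots & \vdots \\ \frac{\partial y_{1}}{\partial x_{n}} & \cdots & \frac{\partial y_{m}}{\partial x_{n}}\end{array}\right)\left(\begin{array}{c}\frac{\partial l}{\partial y_{1}} \\ \vdots \\ \frac{\partial l}{\partial y_{m}}\end{array}\right)=\left(\begin{array}{c}\frac{\partial l}{\partial x_{1}} \\ \vdots \\ \frac{\partial l}{\partial x_{n}}\end{array}\right)$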

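Recall that the chain rule, applied to each component $x_{j}$ of x (for j = 1, ..., n), is as follows:

$\frac{\partial l}{\partial x_{j}}=\sum_{i=1}^{m} \frac{\partial l}{\partial y_{i}} \frac{\partial y_{i}}{\partial x_{j}}$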
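As a rough numerical check of this relationship, the following sketch compares the gradient that backward produces with an explicitly formed Jacobian. The function f and the values of x and v are made up for illustration; torch.autograd.functional.jacobian is used here only to build J for comparison, which Autograd itself does not need to do.

```python
import torch
from torch.autograd.functional import jacobian


def f(x):
    # Hypothetical function with n = 3 inputs and m = 2 outputs.
    return torch.stack([x[0] * x[1], x[1] + x[2]])


x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
v = torch.tensor([1.0, 0.5])  # stands in for the gradient of l with respect to y

y = f(x)
y.backward(v)  # Autograd accumulates the product J^T . v into x.grad

J = jacobian(f, x.detach())  # explicit 2 x 3 Jacobian, for comparison only
expected = J.T @ v

print(x.grad)
print(expected)
```

Both printed vectors should agree, since passing v to backward is the Jacobian-vector product described above.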

In neural networks, computing the partial derivatives of the model's outputs with respect to its inputs and parameters requires chaining many local partial derivatives: one for each multiplication by a learned weight, each activation function, and so on. Doing this by hand is error-prone and computationally very expensive.

Autograd assists in solving this problem. It tracks all the operations performed on tensors that require gradients and records them in a graph. Each resulting tensor holds a reference (grad_fn) to the operation that created it, so the graph can be traversed backward from any output. This lets the derivative calculations be carried out automatically in a single backward pass.
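A minimal sketch of this bookkeeping, assuming a recent PyTorch release (the tensor names x, y, and z are illustrative). Each result of a tracked operation carries a grad_fn attribute pointing to the function that produced it, and next_functions links that function to its predecessors in the graph:

```python
import torch

# Leaf tensor created by the user; Autograd tracks operations performed on it.
x = torch.tensor([0.5, 1.0], requires_grad=True)

y = torch.sin(x)  # recorded as a sine node
z = 2 * y         # recorded as a multiplication node

print(x.grad_fn)                 # None: a leaf was not produced by an operation
print(z.grad_fn)                 # the multiplication node that produced z
print(z.grad_fn.next_functions)  # links back to the node that produced y
```

The exact node names in the printout may vary between PyTorch versions.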

Example

In the following code, we import the torch package for Autograd, and the matplotlib package for plotting graphs:
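The listing itself is not shown here, so what follows is a minimal sketch that is consistent with the printed output below. The names StartingVector, c, and d come from the text; b, out, and plot_gradient are assumed names, and the 25 evenly spaced points on [0, 2π] are inferred from the printed values rather than confirmed by the source.

```python
import math

import torch

# Input (leaf) tensor: 25 evenly spaced points on [0, 2*pi], tracked by Autograd.
StartingVector = torch.linspace(0.0, 2.0 * math.pi, steps=25, requires_grad=True)

b = torch.sin(StartingVector)
c = 2 * b
print(c)

d = c + 1
print(d)

out = d.sum()
print(out)

# Backward pass: fills StartingVector.grad with d(out)/d(StartingVector).
out.backward()


def plot_gradient():
    """Plot StartingVector.grad against StartingVector (requires matplotlib)."""
    import matplotlib.pyplot as plt

    plt.plot(StartingVector.detach().numpy(), StartingVector.grad.numpy())
    plt.show()
```

Calling plot_gradient() after out.backward() draws the gradient, which should trace 2*cos(StartingVector).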

tensor([ 0.0000e+00,  5.1764e-01,  1.0000e+00,  1.4142e+00,  1.7321e+00,
         1.9319e+00,  2.0000e+00,  1.9319e+00,  1.7321e+00,  1.4142e+00,
         1.0000e+00,  5.1764e-01, -1.7485e-07, -5.1764e-01, -1.0000e+00,
        -1.4142e+00, -1.7321e+00, -1.9319e+00, -2.0000e+00, -1.9319e+00,
        -1.7321e+00, -1.4142e+00, -1.0000e+00, -5.1764e-01,  3.4969e-07],
       grad_fn=<MulBackward0>)
tensor([ 1.0000e+00,  1.5176e+00,  2.0000e+00,  2.4142e+00,  2.7321e+00,
         2.9319e+00,  3.0000e+00,  2.9319e+00,  2.7321e+00,  2.4142e+00,
         2.0000e+00,  1.5176e+00,  1.0000e+00,  4.8236e-01, -3.5763e-07,
        -4.1421e-01, -7.3205e-01, -9.3185e-01, -1.0000e+00, -9.3185e-01,
        -7.3205e-01, -4.1421e-01,  4.7684e-07,  4.8236e-01,  1.0000e+00],
       grad_fn=<AddBackward0>)
tensor(25.0000, grad_fn=<SumBackward0>)

The graph plots StartingVector.grad, that is, the derivative of 2*sin(StartingVector) + 1 (the operation performed on the input) with respect to StartingVector. This derivative is 2*cos(StartingVector).

Note: The gradients (.grad) are only stored in leaf nodes, that is, tensors created directly by the user with requires_grad set to True. In this case, the only leaf is the input vector, StartingVector. Therefore, c.grad, d.grad, and so on will give None.
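The leaf behavior can be sketched on a smaller example. The tensors x, y, and z below are illustrative; retain_grad() is shown as one way to opt in to keeping the gradient of an intermediate tensor, which the original example does not do.

```python
import warnings

import torch

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)  # leaf tensor
y = 2 * x          # intermediate (non-leaf) tensor
z = y + 1          # intermediate (non-leaf) tensor
z.retain_grad()    # opt in: keep z.grad after the backward pass

out = z.sum()
out.backward()

# Reading .grad on a non-leaf tensor emits a warning, so silence it here.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    y_grad = y.grad

print(x.is_leaf, y.is_leaf)  # True False
print(x.grad)                # gradient stored on the leaf
print(y_grad)                # None: not retained for intermediate tensors
print(z.grad)                # available only because of retain_grad()
```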

Enable or disable Autograd

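Tensors that hold model parameters, such as the weights of torch.nn layers, have requires_grad set to True by default. Tensors created directly (for example, with torch.tensor) default to False unless requires_grad=True is passed. There are two ways of disabling gradient tracking: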

Set the tensor's requires_grad flag to False directly
Run the code inside a torch.no_grad() block

To turn gradient tracking back on inside a region where it has been disabled with torch.no_grad(), the torch.enable_grad() context manager is used.
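The options above can be sketched as follows. The tensor names are illustrative, and requires_grad_() is used here as the in-place way of changing the flag on a leaf tensor:

```python
import torch

w = torch.ones(3, requires_grad=True)

# Option 1: switch the flag off on the tensor itself.
frozen = torch.ones(3, requires_grad=True)
frozen.requires_grad_(False)  # equivalent to frozen.requires_grad = False
a = frozen * 2

# Option 2: disable tracking for a block of code.
with torch.no_grad():
    b = w * 2
    # Re-enable tracking inside the disabled region.
    with torch.enable_grad():
        c = w * 2

print(a.requires_grad)  # False
print(b.requires_grad)  # False
print(c.requires_grad)  # True
```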

Pros and cons of using Autograd

Autograd records its graph dynamically, as PyTorch executes each operation eagerly, rather than compiling a static graph before execution.

This has the following advantages:

Since the graph is rebuilt on every forward pass, the model can use ordinary Python control flow (loops and conditionals), and gradients are still computed for exactly the operations that were run.
The backward pass runs on the same device as the tensors involved, so gradient computation benefits from GPU acceleration in the same way as the forward pass.

There are also some disadvantages to using Autograd:

Autograd adds overhead that small or inference-only workloads may not need: recording the graph takes extra computation, and intermediate results are kept in memory until the backward pass.
Depending on the implementation, the program can also become more complex.