What is reverse mode differentiation?

In deep learning, we use automatic differentiation to compute the derivatives of a function, such as the loss of a neural network, with respect to its inputs or parameters. There are two main modes of automatic differentiation:

Forward mode differentiation
Reverse mode differentiation

In this Answer, we will discuss the latter.

Reverse Mode Differentiation

In reverse mode differentiation, we traverse the computational graph from the final output back towards the input variables, applying the chain rule at each node to obtain the derivative of the output with respect to that node.

Let us understand the concept of reverse mode differentiation with the help of an example.

Example

Consider a function $f(x,y)=xy^2+3xy$ where $x$ and $y$ are independent variables.

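The trace graph of this function is given below. It has two input nodes, $X_1=x$ and $X_2=y$, and four computed nodes:

$A=X_1X_2$

$B=X_2A$

$C=3A$

$D=B+C$

The output is $f=D$.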
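To make the trace concrete, here is a minimal Python sketch of the forward pass. The node definitions are inferred from the derivative steps that follow; the function name `forward` and the sample point $(2, 3)$ are illustrative choices, not part of the original example.

```python
def forward(x, y):
    """Evaluate f(x, y) = x*y**2 + 3*x*y node by node and return the trace."""
    X1, X2 = x, y      # input nodes
    A = X1 * X2        # A = xy
    B = X2 * A         # B = x*y^2
    C = 3 * A          # C = 3xy
    D = B + C          # D = f(x, y)
    return {"X1": X1, "X2": X2, "A": A, "B": B, "C": C, "D": D}


trace = forward(2.0, 3.0)
print(trace["D"])  # 36.0
```

Every intermediate value is kept because the reverse pass needs it later: for example, the local derivative of $B=X_2A$ with respect to $X_2$ is the stored value of $A$.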

In reverse mode differentiation, we compute the derivative of the output $f$ with respect to each node, reusing the derivatives already computed for the nodes that consume it. The steps below work through every node in turn, from the output back to the inputs.
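The rule applied at each node can be stated in general form. As a sketch, write $v_i$ for any node and let $v_j$ range over the nodes that take $v_i$ as a direct input (this notation is assumed here and does not appear in the trace graph). The chain rule then gives:

$\frac{\partial f}{\partial v_i} = \sum_{j} \frac{\partial f}{\partial v_j} \cdot \frac{\partial v_j}{\partial v_i}$

A node that feeds a single consumer contributes one term to the sum; a node that feeds several consumers contributes one term per consumer.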

We start from the output node $D$. Since $f=D$, its derivative is:

$\frac{\partial f}{\partial D} = 1$

Now, we will find the derivative at $C$.

$\frac{\partial f}{\partial C} = \frac{\partial f}{\partial D} \cdot \frac{\partial D}{\partial C}$

$\frac{\partial f}{\partial C} = 1 \cdot \frac{\partial (B+C)}{\partial C}$

$\frac{\partial f}{\partial C} = (1)(1)$

$\frac{\partial f}{\partial C} = 1$

Similarly, we find the derivative at $B$.

$\frac{\partial f}{\partial B} = \frac{\partial f}{\partial D} \cdot \frac{\partial D}{\partial B}$

$\frac{\partial f}{\partial B} = 1 \cdot \frac{\partial (B+C)}{\partial B}$

$\frac{\partial f}{\partial B} = 1$

Node $A$ feeds into both $B$ and $C$, so its derivative sums the contributions from both paths. Using the chain rule, we have:

$\frac{\partial f}{\partial A} = \frac{\partial f}{\partial C} \cdot \frac{\partial C}{\partial A} + \frac{\partial f}{\partial B} \cdot \frac{\partial B}{\partial A}$

$\frac{\partial f}{\partial A} = \frac{\partial f}{\partial C} \cdot \frac{\partial (3A)}{\partial A} + \frac{\partial f}{\partial B} \cdot \frac{\partial (X_2A)}{\partial A}$

$\frac{\partial f}{\partial A} = (1)(3) + (1)(X_2)$

$\frac{\partial f}{\partial A} = X_2+3$

Now we will find the derivatives at the input nodes, starting with $X_1$.

$\frac{\partial f}{\partial X_1} = \frac{\partial f}{\partial A} \cdot \frac{\partial A}{\partial X_1}$

$\frac{\partial f}{\partial X_1} = (X_2+3) \cdot \frac{\partial (X_1X_2)}{\partial X_1}$

$\frac{\partial f}{\partial X_1} = X_2(X_2+3)$

$\frac{\partial f}{\partial X_1} = y(y+3)$

Similarly, for node $X_2$, which feeds into both $A$ and $B$:

$\frac{\partial f}{\partial X_2} = \frac{\partial f}{\partial A} \cdot \frac{\partial A}{\partial X_2} + \frac{\partial f}{\partial B} \cdot \frac{\partial B}{\partial X_2}$

$\frac{\partial f}{\partial X_2} = (X_2+3) \cdot \frac{\partial (X_1X_2)}{\partial X_2} + 1 \cdot \frac{\partial (X_2A)}{\partial X_2}$

$\frac{\partial f}{\partial X_2} = X_1(X_2+3) + A$

$\frac{\partial f}{\partial X_2} = x(y+3) + xy$

$\frac{\partial f}{\partial X_2} = 2xy+3x$
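The derivation above can be sketched as straight-line Python: one forward pass to record the node values, then one reverse pass that mirrors the steps from $D$ back to $X_1$ and $X_2$. The names `reverse_pass` and `df_dA` (and so on) are illustrative, and the sample point $(2, 3)$ is an arbitrary choice.

```python
def reverse_pass(x, y):
    """Hand-coded reverse pass for f(x, y) = x*y**2 + 3*x*y."""
    # Forward pass: record the intermediate values.
    X1, X2 = x, y
    A = X1 * X2
    B = X2 * A
    C = 3 * A
    D = B + C

    # Reverse pass: df_dN holds the derivative of f with respect to node N.
    df_dD = 1.0
    df_dC = df_dD * 1.0                 # D = B + C
    df_dB = df_dD * 1.0                 # D = B + C
    df_dA = df_dC * 3.0 + df_dB * X2    # A feeds C = 3A and B = X2*A
    df_dX1 = df_dA * X2                 # A = X1*X2
    df_dX2 = df_dA * X1 + df_dB * A     # X2 feeds A = X1*X2 and B = X2*A
    return D, df_dX1, df_dX2


value, dx, dy = reverse_pass(2.0, 3.0)
print(value, dx, dy)  # 36.0 18.0 18.0
```

At $(x, y) = (2, 3)$ the closed forms give $y(y+3) = 18$ and $2xy+3x = 18$, which agrees with the values computed by the reverse pass.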

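So, we have calculated the derivative of $f$ with respect to every node in a single backward pass using reverse mode differentiation. The results match direct differentiation of $f(x,y)=xy^2+3xy$, which gives $\frac{\partial f}{\partial x}=y^2+3y$ and $\frac{\partial f}{\partial y}=2xy+3x$.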
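The same procedure can be automated. The following is a minimal sketch of a reverse mode engine for scalars, not how any particular library implements it. The class `Var`, its `inputs` and `grad` attributes, and the `backward` method are hypothetical names chosen for illustration, and only addition and multiplication are supported.

```python
class Var:
    """A scalar node that records how it was computed (hypothetical helper)."""

    def __init__(self, value, inputs=()):
        self.value = value
        self.inputs = inputs   # pairs of (input node, local derivative)
        self.grad = 0.0        # derivative of the output w.r.t. this node

    def __add__(self, other):
        other = other if isinstance(other, Var) else Var(other)
        return Var(self.value + other.value, ((self, 1.0), (other, 1.0)))

    def __mul__(self, other):
        other = other if isinstance(other, Var) else Var(other)
        return Var(self.value * other.value,
                   ((self, other.value), (other, self.value)))

    __radd__ = __add__
    __rmul__ = __mul__

    def backward(self):
        # Depth-first post-order: every node is listed after its inputs.
        order, seen = [], set()

        def visit(node):
            if node not in seen:
                seen.add(node)
                for inp, _ in node.inputs:
                    visit(inp)
                order.append(node)

        visit(self)
        self.grad = 1.0
        # Walk from the output back to the inputs, accumulating contributions.
        for node in reversed(order):
            for inp, local in node.inputs:
                inp.grad += node.grad * local


x, y = Var(2.0), Var(3.0)
a = x * y            # node A
f = y * a + 3 * a    # nodes B, C and D
f.backward()
print(x.grad, y.grad)  # 18.0 18.0
```

The `+=` in `backward` is what handles nodes such as $A$ and $X_2$ that feed more than one consumer: each consumer adds its own term, just as in the hand derivation. Because gradients accumulate, this sketch assumes `backward` is called once on a freshly built graph.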

Conclusion

In this Answer, we learned about reverse mode differentiation by working through an example and its trace graph, computing derivatives from the final node back towards the input nodes. Backpropagation in machine learning is reverse mode differentiation applied to a network's loss. It is efficient because a single backward pass yields the derivatives of a scalar output with respect to all of its inputs.