# Backpropagation through Time

Learn about backpropagation and how it works through time.

For training RNNs, a special form of backpropagation known as **backpropagation through time (BPTT)** is used. To understand BPTT, however, we first need to understand how **backpropagation (BP)** works. Then, we'll discuss why BP can't be directly applied to RNNs and how it can be adapted for them, resulting in BPTT. Finally, we'll discuss two major problems present in BPTT.

## How backpropagation works

Backpropagation is the technique that’s used to train a feed-forward neural network. In backpropagation, we do the following:

1. Calculate a prediction for a given input.

2. Calculate an error, $E$, of the prediction by comparing it to the actual label of the input (for example, using mean squared error or cross-entropy loss).

3. Update the weights of the feed-forward network to minimize the error calculated in step 2 by taking a small step in the opposite direction of the gradient $\frac{\partial E}{\partial w_{ij}}$ for all $w_{ij}$, where $w_{ij}$ is the $j^{th}$ weight of the $i^{th}$ layer.
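The three steps above can be sketched in a few lines of NumPy for the simplest possible case, a single linear layer trained with mean squared error. The variable names (`w`, `lr`, etc.) and the learning rate are illustrative assumptions, not taken from the text:

```python
import numpy as np

np.random.seed(0)

x = np.array([0.5, -0.2])        # input
y = np.array([0.3])              # actual label
w = np.random.randn(1, 2) * 0.1  # weights w_ij of one linear layer (assumed setup)
lr = 0.1                         # learning rate (illustrative choice)

for step in range(50):
    # Step 1: calculate a prediction for the given input.
    y_hat = w @ x
    # Step 2: calculate the error E (mean squared error against the label).
    E = np.mean((y_hat - y) ** 2)
    # Step 3: compute the gradient dE/dw_ij and take a small step
    # in the opposite direction.
    grad = 2 * (y_hat - y)[:, None] * x[None, :]
    w -= lr * grad

print(E)  # the error shrinks toward zero as the steps repeat
```

Repeating the loop drives $E$ down because each update moves $w_{ij}$ against the gradient; a real network simply applies the same three steps layer by layer, with the chain rule propagating $\frac{\partial E}{\partial w_{ij}}$ backward from the output.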

To understand the computations above more clearly, consider the feed-forward network depicted in the figure below. This network has two single weights,
