
Putting the Encoder and Decoder Together

Explore how transformer architectures integrate encoder and decoder modules to process input and generate output sequences. Understand the training process involving cross-entropy loss minimization, Adam optimization, and dropout for regularization. This lesson helps you grasp the mechanics behind sequence-to-sequence learning in NLP.
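To make that training setup concrete, here is a minimal PyTorch sketch, not the lesson's own code, showing the three ingredients mentioned above: cross-entropy loss, the Adam optimizer, and dropout. The model dimensions, vocabulary size, learning rate, and random tensors are hypothetical placeholders.

```python
# A minimal sketch of the training setup described above (assumed values).
import torch
import torch.nn as nn

vocab_size = 10000                      # hypothetical target vocabulary size
model = nn.Transformer(d_model=512, nhead=8, dropout=0.1)  # dropout for regularization
out_proj = nn.Linear(512, vocab_size)   # maps decoder states to vocabulary logits

criterion = nn.CrossEntropyLoss()       # cross-entropy over predicted words
optimizer = torch.optim.Adam(
    list(model.parameters()) + list(out_proj.parameters()), lr=1e-4
)

# One hypothetical training step; random tensors stand in for embedded
# source and target sentences with shape (seq_len, batch, d_model).
src = torch.randn(12, 32, 512)
tgt = torch.randn(11, 32, 512)
labels = torch.randint(0, vocab_size, (11, 32))  # gold next-word ids

logits = out_proj(model(src, tgt))      # (tgt_len, batch, vocab_size)
loss = criterion(logits.view(-1, vocab_size), labels.view(-1))

optimizer.zero_grad()
loss.backward()                         # minimize cross-entropy loss
optimizer.step()                        # Adam update
```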


We feed the decoder's representation of the target sentence to the linear and softmax layers to obtain the predicted word. For clarity, the complete transformer architecture, with the encoder and decoder together, is shown in the following figure:

Encoder and decoder of the transformer

In the preceding figure, Nx denotes that we can stack ...
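As a rough illustration of the linear and softmax step described earlier, the following PyTorch sketch projects a decoder representation to vocabulary logits and picks the highest-probability word. The dimensions and the `decoder_repr` tensor are assumed placeholders, not values from the lesson.

```python
# A minimal sketch of the final linear + softmax step (assumed shapes).
import torch
import torch.nn as nn

d_model, vocab_size = 512, 10000        # hypothetical dimensions
linear = nn.Linear(d_model, vocab_size)

# Stand-in decoder output for one target position: (batch, d_model)
decoder_repr = torch.randn(1, d_model)

logits = linear(decoder_repr)            # (1, vocab_size) vocabulary logits
probs = torch.softmax(logits, dim=-1)    # probability distribution over words
predicted_word_id = probs.argmax(dim=-1) # index of the predicted word
```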