Putting All the Decoder Components Together
Explore the workings of transformer decoder components by learning how input embeddings, masked multi-head attention, encoder-decoder attention, and feedforward networks operate together. Understand how stacking decoders forms the target sentence representation in NLP architectures.
We'll cover the following...
The following figure shows the stack of two decoders; only decoder 1 is expanded to reduce the clutter:
How the decoder works
From the preceding figure, we can understand the following:
We convert the decoder's input into an embedding matrix, add the positional encoding to it, and feed the result as input to the bottom-most decoder (decoder 1).
The decoder takes this input and sends it to the masked multi-head attention layer, which returns the attention matrix,
...
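The first two steps above can be sketched in code. The following is a minimal NumPy illustration (not the full multi-head implementation): it adds a sinusoidal positional encoding to a toy embedding matrix, then runs masked (causal) self-attention so that each position can only attend to itself and earlier positions. The function names, dimensions, and random embeddings are illustrative assumptions, not part of the lesson's figures.

```python
import numpy as np

def positional_encoding(seq_len, d_model):
    # Sinusoidal position encoding: sin on even dims, cos on odd dims
    pos = np.arange(seq_len)[:, None]
    i = np.arange(d_model)[None, :]
    angle = pos / np.power(10000.0, (2 * (i // 2)) / d_model)
    return np.where(i % 2 == 0, np.sin(angle), np.cos(angle))

def masked_self_attention(x):
    # Single-head scaled dot-product attention with a causal mask
    d_k = x.shape[-1]
    scores = x @ x.T / np.sqrt(d_k)
    # Mask out future positions so token t sees only tokens <= t
    future = np.triu(np.ones_like(scores, dtype=bool), k=1)
    scores = np.where(future, -1e9, scores)
    # Row-wise softmax gives the attention matrix
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ x, weights

# Toy decoder input: 4 target tokens, model dimension 8 (hypothetical sizes)
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(4, 8))           # embedding matrix lookup
x = embeddings + positional_encoding(4, 8)     # add position encoding
z, attention = masked_self_attention(x)        # masked multi-head step (one head)
print(attention.round(2))
```

Printing the attention matrix shows that its upper triangle is zero, confirming that the mask prevents each decoder position from attending to future target tokens.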