
Putting All the Decoder Components Together

Explore the workings of transformer decoder components by learning how input embeddings, masked multi-head attention, encoder-decoder attention, and feedforward networks operate together. Understand how stacking decoders forms the target sentence representation in NLP architectures.

The following figure shows the stack of two decoders; only decoder 1 is expanded to reduce the clutter:

A stack of two decoders with decoder 1 expanded

How the decoder works

From the preceding figure, we can understand the following:

  1. We convert the decoder's input into an embedding matrix, add the positional encoding to it, and feed the result to the bottom-most decoder (decoder 1).

  2. The decoder passes this input to the masked multi-head attention layer, which returns the attention matrix, ...
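The first two steps above can be sketched in PyTorch. This is a minimal illustration, not the full decoder: the vocabulary size, model dimension, and head count below are arbitrary placeholder values, and `nn.MultiheadAttention` stands in for the masked multi-head attention sublayer. The causal (look-ahead) mask ensures each position attends only to itself and earlier positions.

```python
import torch
import torch.nn as nn

def positional_encoding(seq_len, d_model):
    # Sinusoidal positional encoding, as in "Attention Is All You Need"
    pos = torch.arange(seq_len, dtype=torch.float).unsqueeze(1)
    i = torch.arange(0, d_model, 2, dtype=torch.float)
    angles = pos / torch.pow(10000.0, i / d_model)
    pe = torch.zeros(seq_len, d_model)
    pe[:, 0::2] = torch.sin(angles)  # even dimensions
    pe[:, 1::2] = torch.cos(angles)  # odd dimensions
    return pe

# Placeholder sizes for illustration only
vocab_size, d_model, num_heads, seq_len = 100, 16, 4, 5

embed = nn.Embedding(vocab_size, d_model)
masked_attn = nn.MultiheadAttention(d_model, num_heads, batch_first=True)

tokens = torch.randint(0, vocab_size, (1, seq_len))        # one target sentence
# Step 1: embedding matrix + positional encoding
x = embed(tokens) + positional_encoding(seq_len, d_model)

# Step 2: masked self-attention; True entries are positions a query may NOT attend to
mask = torch.triu(torch.ones(seq_len, seq_len, dtype=torch.bool), diagonal=1)
out, attn_weights = masked_attn(x, x, x, attn_mask=mask)

print(out.shape)           # output representation, same shape as the input
print(attn_weights.shape)  # attention matrix over target positions
```

Note how the mask is an upper-triangular boolean matrix: the first target position can attend only to itself, so its attention weights over all later positions are zero.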