
Putting All the Encoder Components Together

Explore the functioning of transformer encoders by understanding how input embeddings with positional encoding are processed through multiple layers. Learn to follow the flow through multi-head attention and feedforward network sublayers, see how stacked encoders build rich sentence representations, and understand how these outputs feed into the decoder for generating target sentences.

The following figure shows the stack of two encoders; only encoder 1 is expanded to reduce the clutter:

A stack of encoders with encoder 1 expanded

Working of the encoder

From the preceding figure, we can understand the following:

  1. First, we convert our input into an input embedding (embedding matrix), add the positional encoding to it, and feed the result as input to the bottom-most encoder (encoder 1); a code sketch of this step appears after the list.

  2. Encoder 1 takes the input and sends it to the multi-head attention sublayer, which returns the attention matrix as output.

  3. We take the attention matrix and feed it as input to the next sublayer, the feedforward network, whose output becomes the input to the encoder above it (see the second sketch after this list).
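
To make step 1 concrete, here is a minimal sketch, assuming PyTorch; the vocabulary size, model dimension, and sequence length are illustrative values, not taken from the text. It builds the input embedding matrix, adds sinusoidal positional encoding, and produces the tensor that would be fed to encoder 1:

```python
# A minimal sketch (not the book's code) of step 1: token embeddings plus
# sinusoidal positional encoding, using toy sizes for illustration.
import math

import torch
import torch.nn as nn


def sinusoidal_positional_encoding(seq_len: int, d_model: int) -> torch.Tensor:
    """Return the [seq_len, d_model] matrix of sine/cosine position encodings."""
    positions = torch.arange(seq_len, dtype=torch.float32).unsqueeze(1)  # [seq_len, 1]
    div_term = torch.exp(
        torch.arange(0, d_model, 2, dtype=torch.float32) * (-math.log(10000.0) / d_model)
    )
    pe = torch.zeros(seq_len, d_model)
    pe[:, 0::2] = torch.sin(positions * div_term)  # even dimensions use sine
    pe[:, 1::2] = torch.cos(positions * div_term)  # odd dimensions use cosine
    return pe


vocab_size, d_model, seq_len = 10_000, 512, 8          # assumed toy sizes
token_ids = torch.randint(0, vocab_size, (1, seq_len))  # a dummy input sentence
embedding = nn.Embedding(vocab_size, d_model)

x = embedding(token_ids)                                 # input embedding matrix
x = x + sinusoidal_positional_encoding(seq_len, d_model)  # add position information
print(x.shape)  # torch.Size([1, 8, 512]) -> fed to encoder 1
```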
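And here is a minimal sketch of steps 2 and 3, again assuming PyTorch: each encoder layer sends its input through a multi-head attention sublayer and then a feedforward sublayer (add-and-norm wrappers are included for completeness), and two such layers are stacked so that encoder 2 consumes the output of encoder 1. The `EncoderLayer` class and its dimensions are illustrative, not the book's implementation:

```python
# A minimal sketch (not the book's code) of steps 2-3: multi-head attention
# followed by a feedforward network inside one encoder layer, with two layers
# stacked so encoder 2 takes encoder 1's output.
import torch
import torch.nn as nn


class EncoderLayer(nn.Module):
    def __init__(self, d_model: int = 512, num_heads: int = 8, d_ff: int = 2048):
        super().__init__()
        self.attention = nn.MultiheadAttention(d_model, num_heads, batch_first=True)
        self.feedforward = nn.Sequential(
            nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model)
        )
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Multi-head attention sublayer: queries, keys, and values all come from x.
        attn_out, _ = self.attention(x, x, x)
        x = self.norm1(x + attn_out)      # add & norm around the attention output
        # Feedforward sublayer takes the attention output as its input.
        ff_out = self.feedforward(x)
        return self.norm2(x + ff_out)     # add & norm around the feedforward output


# Stack of two encoders: encoder 1 feeds encoder 2, whose output is the
# representation handed to the decoder.
encoders = nn.ModuleList([EncoderLayer(), EncoderLayer()])
x = torch.randn(1, 8, 512)  # embedding + positional encoding from step 1
for encoder in encoders:
    x = encoder(x)
print(x.shape)  # torch.Size([1, 8, 512])
```

Note that the tensor keeps the shape `[batch, seq_len, d_model]` through every sublayer, which is what allows the encoders to be stacked and the top encoder's output to be passed on to the decoder.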