Putting All the Encoder Components Together
Explore how transformer encoders work: input embeddings with positional encoding flow through multiple layers, each containing a multi-head attention sublayer and a feedforward network sublayer. See how stacked encoders build rich sentence representations and how the final encoder output feeds into the decoder to generate the target sentence.
The following figure shows the stack of two encoders; only encoder 1 is expanded to reduce the clutter:
Figure: Working of the encoder
From the preceding figure, we can understand the following:
First, we convert our input to an input embedding (an embedding matrix), add the positional encoding to it, and feed the result as input to the bottom-most encoder (encoder 1).
Encoder 1 takes the input and sends it to the multi-head attention sublayer, which returns the attention matrix as output.
We take the attention matrix and feed it as input to the next sublayer, the feedforward network, whose output becomes the representation produced by encoder 1. This representation is passed as input to encoder 2, and the output of the topmost encoder is the final encoder representation that is later fed to the decoder, as the sketch below illustrates.
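
To make the flow concrete, here is a minimal sketch of the encoder stack in PyTorch. The hyperparameters (`d_model=512`, `num_heads=8`, `d_ff=2048`), the sinusoidal positional encoding, and the residual-plus-layer-norm wiring are illustrative assumptions, not the exact configuration shown in the figure.

```python
# Minimal sketch of the encoder flow described above (assumed hyperparameters).
import math
import torch
import torch.nn as nn


def positional_encoding(seq_len: int, d_model: int) -> torch.Tensor:
    """Sinusoidal positional encoding, shape (seq_len, d_model)."""
    position = torch.arange(seq_len).unsqueeze(1).float()
    div_term = torch.exp(
        torch.arange(0, d_model, 2).float() * (-math.log(10000.0) / d_model)
    )
    pe = torch.zeros(seq_len, d_model)
    pe[:, 0::2] = torch.sin(position * div_term)
    pe[:, 1::2] = torch.cos(position * div_term)
    return pe


class EncoderLayer(nn.Module):
    """One encoder: a multi-head attention sublayer followed by a feedforward sublayer."""

    def __init__(self, d_model: int = 512, num_heads: int = 8, d_ff: int = 2048):
        super().__init__()
        self.attention = nn.MultiheadAttention(d_model, num_heads, batch_first=True)
        self.feedforward = nn.Sequential(
            nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model)
        )
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Multi-head attention sublayer (with residual connection and layer norm).
        attn_out, _ = self.attention(x, x, x)
        x = self.norm1(x + attn_out)
        # Feedforward sublayer (with residual connection and layer norm).
        x = self.norm2(x + self.feedforward(x))
        return x


# Stack of two encoders, as in the figure.
d_model, seq_len = 512, 10
encoders = nn.ModuleList([EncoderLayer(d_model), EncoderLayer(d_model)])

# Input embedding plus positional encoding (a batch of one sentence with 10 tokens).
embedding = torch.randn(1, seq_len, d_model)
x = embedding + positional_encoding(seq_len, d_model)

# The output of encoder 1 becomes the input to encoder 2; the output of the
# topmost encoder is the representation that is later fed to the decoder.
for encoder in encoders:
    x = encoder(x)

print(x.shape)  # torch.Size([1, 10, 512])
```

Note that the encoder representation keeps the same shape at every layer, which is what allows the encoders to be stacked and the topmost output to be handed directly to the decoder.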