Understanding the Decoder of the Transformer
Explore the transformer decoder's role in sequence generation: how it uses the encoder's output and previously generated tokens, together with masked multi-head attention and positional encoding, to produce accurate translations. This lesson clarifies the decoder's inputs, stepwise word prediction, and the internal architecture that is critical for NLP applications.
Suppose we want to translate the English sentence (source sentence) 'I am good' into the French sentence (target sentence) 'Je vais bien'. To perform this translation, we feed the source sentence 'I am good' to the encoder, which learns a representation of it; we have already seen how the encoder does this. Now, we take the encoder's representation and feed it to the decoder. The decoder takes the encoder representation as input and generates the target sentence 'Je vais bien', as shown in the following figure:
We learned earlier that instead of having one encoder, we can have a stack of N encoders; in the same way, instead of a single decoder, the decoder side is also a stack of N decoders.
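To make this stacking concrete, here is a minimal sketch using PyTorch's nn.TransformerDecoderLayer and nn.TransformerDecoder. The sizes (d_model = 512, 8 heads, N = 6) and the random tensors are illustrative assumptions, not values fixed by this lesson:

```python
import torch
import torch.nn as nn

d_model, num_heads, N = 512, 8, 6     # illustrative sizes, not prescribed by the lesson

# One decoder block: self-attention, encoder-decoder attention, and a feedforward sublayer
decoder_layer = nn.TransformerDecoderLayer(d_model=d_model, nhead=num_heads)

# A stack of N identical decoder blocks, mirroring the stack of N encoders
decoder_stack = nn.TransformerDecoder(decoder_layer, num_layers=N)

# Encoder output for the three-word source 'I am good': (source length, batch, d_model)
encoder_representation = torch.rand(3, 1, d_model)
# Embeddings (plus positional encoding) of the target tokens seen so far
target_embeddings = torch.rand(4, 1, d_model)

# Causal mask so each target position can only attend to earlier positions
tgt_len = target_embeddings.size(0)
tgt_mask = torch.triu(torch.full((tgt_len, tgt_len), float('-inf')), diagonal=1)

decoder_output = decoder_stack(tgt=target_embeddings,
                               memory=encoder_representation,
                               tgt_mask=tgt_mask)
print(decoder_output.shape)  # torch.Size([4, 1, 512])
```

Each block in the stack receives the previous block's output as its target input, while every block attends to the same encoder representation.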
How the decoder generates the target sentence
Okay, but how exactly does the decoder generate the target sentence? Let's explore that in more detail. At time step t = 1, the input to the decoder is <sos>, the start-of-sentence token; the decoder takes <sos>, together with the encoder's representation, and generates the first word of the target sentence, 'Je'.
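This step-by-step generation can be sketched as a simple greedy decoding loop. The snippet below is a minimal illustration, assuming a trained model that exposes hypothetical encode and decode methods and <sos>/<eos> token IDs; greedy selection is shown for clarity, though beam search is also common in practice:

```python
import torch

def greedy_decode(model, src_tokens, sos_id, eos_id, max_len=20):
    """Generate the target sentence one token at a time.

    At each time step the decoder sees the encoder's representation of the
    source sentence plus all tokens generated so far, and predicts the next word.
    """
    memory = model.encode(src_tokens)          # encoder representation of 'I am good'
    generated = [sos_id]                       # t = 1: the decoder starts from <sos>
    for _ in range(max_len):
        tgt = torch.tensor([generated])        # tokens produced up to this time step
        logits = model.decode(tgt, memory)     # masked self-attention hides future positions
        next_token = logits[0, -1].argmax().item()  # pick the most probable next word
        generated.append(next_token)
        if next_token == eos_id:               # stop once the end-of-sentence token appears
            break
    return generated[1:]                       # drop <sos>; e.g. the ids for 'Je vais bien <eos>'
```

The key point the loop makes explicit is that generation is autoregressive: the word predicted at one time step is fed back in as part of the decoder's input at the next time step.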