Multi-Head Attention
Learn about the inner workings of the multi-head attention component of the decoder.
The following figure shows the transformer model with both the encoder and decoder. As we can observe, the multi-head attention sublayer in each decoder block receives two inputs: one comes from the previous sublayer (masked multi-head attention), and the other is the encoder representation. In this sublayer, the output of the masked multi-head attention supplies the queries, while the encoder representation supplies the keys and values:
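To make this concrete, here is a minimal NumPy sketch of the decoder's multi-head (cross) attention, assuming queries are projected from the decoder's previous sublayer output and keys/values from the encoder representation. The function name, dimensions, and randomly initialized projection matrices are illustrative assumptions, not the course's reference implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_cross_attention(decoder_states, encoder_outputs, num_heads=8, seed=0):
    """Hypothetical sketch: queries come from the decoder's masked multi-head
    attention output; keys and values come from the encoder representation."""
    d_model = decoder_states.shape[-1]
    d_k = d_model // num_heads
    rng = np.random.default_rng(seed)

    # Per-head projection matrices (randomly initialized purely for illustration).
    W_q = rng.normal(size=(num_heads, d_model, d_k))
    W_k = rng.normal(size=(num_heads, d_model, d_k))
    W_v = rng.normal(size=(num_heads, d_model, d_k))
    W_o = rng.normal(size=(num_heads * d_k, d_model))

    head_outputs = []
    for h in range(num_heads):
        Q = decoder_states @ W_q[h]       # (tgt_len, d_k) - from decoder sublayer
        K = encoder_outputs @ W_k[h]      # (src_len, d_k) - from encoder
        V = encoder_outputs @ W_v[h]      # (src_len, d_k) - from encoder
        scores = Q @ K.T / np.sqrt(d_k)   # scaled dot-product attention scores
        head_outputs.append(softmax(scores) @ V)

    # Concatenate all heads and project back to the model dimension.
    return np.concatenate(head_outputs, axis=-1) @ W_o

# Toy usage: 4 decoder positions attending over 6 encoder positions.
decoder_states = np.random.randn(4, 64)
encoder_outputs = np.random.randn(6, 64)
out = multi_head_cross_attention(decoder_states, encoder_outputs)
print(out.shape)  # (4, 64)
```

Note that each decoder position can attend to every encoder position; this is what lets the decoder consult the source sequence while generating the target sequence.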