Attention: Analyzing, Computing, and Implementing

Learn to analyze, compute, and implement the attention mechanism.

Analyzing the encoder states

Instead of relying only on the encoder’s last state, attention enables the decoder to access the complete history of the encoder’s state outputs. At every prediction step, the decoder computes a weighted average of all of these state outputs, with the weights depending on what it needs to produce at that step. For example, in the translation “I went to the shop” → “ich ging zum Laden,” when predicting the word “ging,” the decoder pays more attention to the first part of the English sentence than to the latter part.
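To make the weighted-average idea concrete, here is a minimal NumPy sketch of a single decoder step. The dot-product scoring function is an assumption made for brevity; classic NMT attention (e.g., Bahdanau-style) scores each encoder state with a small learned network instead, but the normalize-and-average step is the same.

```python
import numpy as np

def attention_context(encoder_states, decoder_state):
    """Compute a context vector as an attention-weighted average of
    all encoder state outputs.

    encoder_states: (seq_len, hidden) -- one state per source word
    decoder_state:  (hidden,)         -- the decoder's current state

    Dot-product scoring is a simplifying assumption; real systems
    typically use a learned scoring function.
    """
    # Score each encoder state against the current decoder state.
    scores = encoder_states @ decoder_state        # shape: (seq_len,)
    # Softmax: turn scores into attention weights that sum to 1.
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    # Weighted average of encoder states -> the context vector.
    context = weights @ encoder_states             # shape: (hidden,)
    return context, weights

# Hypothetical example: a 5-word source sentence, hidden size 4.
enc = np.random.randn(5, 4)
dec = np.random.randn(4)
context, weights = attention_context(enc, dec)
print(weights)  # one weight per source word
```

The resulting context vector is recomputed at every decoding step, so the decoder is no longer forced to work from a single fixed summary of the source sentence, which is precisely the bottleneck discussed next.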

There have been many different implementations of attention over the years, but they all address the same need in NMT systems: the context, or thought vector, that sits between the encoder and the decoder is a performance bottleneck, because the entire source sentence must be compressed into that single fixed-length vector.
