Encoder-Decoder

Explore the encoder-decoder model architecture, focusing on BiLSTM encoders and LSTM decoders for sequence-to-sequence tasks. Understand how the decoder is initialized from the encoder's final states, and learn the differences between the training and inference processes so you can implement effective NLP models.

Chapter Goals:

  • Learn about the encoder-decoder model architecture

A. Model architecture

As previously mentioned, the encoder portion of the encoder-decoder model in this section is a BiLSTM. The decoder portion is a regular forward (unidirectional) LSTM with the same number of LSTM layers as the encoder.

What makes an encoder-decoder model so powerful is that the decoder uses the final state of the encoder as its initial state. This gives the decoder access to the information the encoder extracted from the input sequence, which is crucial for good sequence-to-sequence modeling.
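
To make this concrete, below is a minimal single-layer sketch in TensorFlow/Keras (an assumed framework here; the sizes `vocab_size`, `embed_dim`, and `units` are hypothetical). Because the encoder is bidirectional, it produces a final state for each direction; merging them by concatenation is one common choice before handing them to the decoder as its initial state:

```python
import tensorflow as tf

vocab_size, embed_dim, units = 10000, 128, 64  # hypothetical sizes

# Encoder: a single-layer BiLSTM. return_state=True exposes the final
# hidden (h) and cell (c) states of the forward and backward LSTMs.
enc_inputs = tf.keras.Input(shape=(None,), dtype="int32")
enc_emb = tf.keras.layers.Embedding(vocab_size, embed_dim)(enc_inputs)
enc_out, fw_h, fw_c, bw_h, bw_c = tf.keras.layers.Bidirectional(
    tf.keras.layers.LSTM(units, return_sequences=True, return_state=True)
)(enc_emb)

# Merge the two directions (concatenation is one common choice) so the
# shapes match a decoder LSTM of size 2 * units.
init_h = tf.keras.layers.Concatenate()([fw_h, bw_h])
init_c = tf.keras.layers.Concatenate()([fw_c, bw_c])

# Decoder: a regular forward LSTM initialized from the encoder's final state.
dec_inputs = tf.keras.Input(shape=(None,), dtype="int32")
dec_emb = tf.keras.layers.Embedding(vocab_size, embed_dim)(dec_inputs)
dec_out = tf.keras.layers.LSTM(2 * units, return_sequences=True)(
    dec_emb, initial_state=[init_h, init_c]
)

logits = tf.keras.layers.Dense(vocab_size)(dec_out)
model = tf.keras.Model([enc_inputs, dec_inputs], logits)
```

Note the sizing: since the decoder runs in only one direction, its state must be as large as the combined encoder state, so it is built with `2 * units` (summing the two directions instead of concatenating would let the sizes match directly).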

In the case of multiple layers for the encoder-decoder model, each layer's final state in the encoder is used as the initial state for the corresponding layer of the decoder.
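
Extending the same sketch to multiple layers, each encoder layer's merged final state initializes the decoder layer at the same depth. This is again an illustrative sketch under the same assumptions (hypothetical sizes, concatenation as the merge):

```python
import tensorflow as tf

vocab_size, embed_dim, units, num_layers = 10000, 128, 64, 2  # hypothetical

enc_inputs = tf.keras.Input(shape=(None,), dtype="int32")
x = tf.keras.layers.Embedding(vocab_size, embed_dim)(enc_inputs)

# Collect one merged (h, c) state pair per encoder layer.
initial_states = []
for _ in range(num_layers):
    x, fw_h, fw_c, bw_h, bw_c = tf.keras.layers.Bidirectional(
        tf.keras.layers.LSTM(units, return_sequences=True, return_state=True)
    )(x)
    initial_states.append([
        tf.keras.layers.Concatenate()([fw_h, bw_h]),
        tf.keras.layers.Concatenate()([fw_c, bw_c]),
    ])

dec_inputs = tf.keras.Input(shape=(None,), dtype="int32")
y = tf.keras.layers.Embedding(vocab_size, embed_dim)(dec_inputs)

# Decoder layer i starts from encoder layer i's final state.
for layer_states in initial_states:
    y = tf.keras.layers.LSTM(2 * units, return_sequences=True)(
        y, initial_state=layer_states
    )

logits = tf.keras.layers.Dense(vocab_size)(y)
model = tf.keras.Model([enc_inputs, dec_inputs], logits)
```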