
Calculating Loss

Understand how to calculate the training loss in Seq2Seq encoder-decoder models from the model's logits and the final token sequences. Explore applying sequence masks to zero out padding and computing sparse softmax cross-entropy loss efficiently. This lesson guides you through implementing the calculate_loss function, an essential step in optimizing NLP models.

Chapter Goals:

  • Calculate the training loss based on the model's logits and final token sequences

A. Final token sequence

So far, we've used the input sequences and ground truth sequences for training the encoder-decoder model. The final token sequences are used when calculating the loss.

If we view the decoder as a language model, the ground truth sequences act as the language model's input while the final token sequences act as the "correct" output for the language model.
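As a quick illustration, the snippet below shows one common seq2seq convention (an assumption here, since exact token IDs and markers depend on the vocabulary used earlier in the course): the ground truth sequence is the target sentence with a start-of-sequence (SOS) token prepended, and the final token sequence is the same sentence with an end-of-sequence (EOS) token appended.

```python
# Hypothetical token IDs for the target sentence "the cat sat",
# assuming SOS = 1 and EOS = 2 (IDs vary by vocabulary).
target_tokens = [14, 47, 33]                  # "the", "cat", "sat"

ground_truth_sequence = [1] + target_tokens   # decoder (language model) input
final_token_sequence = target_tokens + [2]    # "correct" output used for the loss

# At each time step t, the decoder reads ground_truth_sequence[t]
# and should predict final_token_sequence[t].
```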

In a language model, we calculate the loss based on the model's logits and the correct output tokens. For our encoder-decoder model, those correct outputs are the final token sequences, so the loss is computed from the decoder's logits and the final token sequences.
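The sketch below shows one way calculate_loss might be implemented in TensorFlow, assuming the logits have shape [batch_size, max_sequence_length, vocab_size] and that a tensor of unpadded sequence lengths is available (decoder_sequence_lengths is a hypothetical name, not from the original text). It applies tf.nn.sparse_softmax_cross_entropy_with_logits to the final token sequences and uses tf.sequence_mask to zero out padding positions.

```python
import tensorflow as tf

def calculate_loss(logits, final_sequences, decoder_sequence_lengths):
    """Masked sparse softmax cross-entropy loss (illustrative sketch).

    Assumed shapes:
      logits:                   [batch_size, max_sequence_length, vocab_size]
      final_sequences:          [batch_size, max_sequence_length] int token IDs
      decoder_sequence_lengths: [batch_size] unpadded length of each sequence
    """
    # Per-token cross-entropy between the logits and the final token sequences
    per_token_loss = tf.nn.sparse_softmax_cross_entropy_with_logits(
        labels=final_sequences, logits=logits)

    # Sequence mask: 1.0 for real tokens, 0.0 for padding positions
    mask = tf.sequence_mask(
        decoder_sequence_lengths,
        maxlen=tf.shape(final_sequences)[1],
        dtype=tf.float32)

    # Average the loss over real (non-padding) tokens only
    total_loss = tf.reduce_sum(per_token_loss * mask)
    num_real_tokens = tf.reduce_sum(mask)
    return total_loss / num_real_tokens
```

Dividing by the number of real tokens rather than the padded sequence length keeps the loss a per-token average, so batches with more padding do not appear to have artificially small losses.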