Calculating Loss
Explore how to convert LSTM outputs into logits and apply sparse softmax cross entropy loss in language models. Understand how to use a padding mask to exclude padded time steps, so the loss is computed only over real tokens. This lesson guides you through implementing loss calculation for sequence data in NLP models using TensorFlow.
Chapter Goals:
Convert your LSTM model's outputs into logits
Use a padding mask to calculate the overall loss
A. Logits & loss
As mentioned in earlier chapters, the task for a language model is no different from regular multiclass classification. Therefore, the loss function will still be the regular softmax cross entropy loss. We use a final fully-connected layer to convert model outputs into logits for each of the possible classes (i.e. vocabulary words).
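As a minimal sketch of this projection step (the tensor shapes, vocabulary size, and layer name here are assumptions for illustration, not the lesson's exact model), a single fully-connected layer applied to the LSTM's 3-D output produces one logit per vocabulary word at each time step:

```python
import tensorflow as tf

# Assumed shapes for illustration: LSTM outputs of shape
# [batch_size, time_steps, hidden_units].
batch_size, time_steps, hidden_units = 2, 5, 8
vocab_size = 100  # hypothetical vocabulary size

lstm_outputs = tf.random.normal([batch_size, time_steps, hidden_units])

# A Dense layer acts on the last axis of the 3-D tensor, converting
# each time step's hidden state into logits over the vocabulary.
logits_layer = tf.keras.layers.Dense(vocab_size)
logits = logits_layer(lstm_outputs)

print(logits.shape)  # (2, 5, 100)
```

The output shape is `[batch_size, time_steps, vocab_size]`: each time step gets its own set of per-word logits.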
The function used to calculate the softmax cross entropy loss for feed-forward neural networks is tf.nn.softmax_cross_entropy_with_logits. However, we can only use this function if the ...
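For sequence data with integer word-ID labels, TensorFlow also provides tf.nn.sparse_softmax_cross_entropy_with_logits, which the lesson intro mentions. Below is a hedged sketch (the shapes, label values, and the choice of 0 as the padding ID are assumptions) of combining it with a padding mask so padded time steps do not contribute to the overall loss:

```python
import tensorflow as tf

# Assumed example data: integer word-ID labels, with 0 used as the
# padding ID for this illustration.
logits = tf.random.normal([2, 4, 10])          # [batch, time, vocab]
labels = tf.constant([[3, 7, 1, 0],
                      [5, 0, 0, 0]], dtype=tf.int64)

# Per-time-step cross entropy; the sparse version takes integer
# labels directly, with no one-hot conversion needed.
per_step_loss = tf.nn.sparse_softmax_cross_entropy_with_logits(
    labels=labels, logits=logits)

# Padding mask: 1.0 for real tokens, 0.0 for padded time steps.
pad_mask = tf.cast(tf.not_equal(labels, 0), tf.float32)

# Zero out the loss at padded steps, then average over real tokens only.
masked_loss = per_step_loss * pad_mask
overall_loss = tf.reduce_sum(masked_loss) / tf.reduce_sum(pad_mask)
```

Dividing by the mask's sum (rather than the total number of time steps) averages the loss over real tokens only, so heavily padded batches are not artificially rewarded.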