Embedding Loss
Learn how to calculate embedding loss using noise-contrastive estimation and sampled softmax loss functions. Understand how these methods simplify training word embeddings by approximating full softmax and optimizing model performance using TensorFlow.
Chapter Goals:
- Learn about the different types of candidate sampling algorithms and loss functions
- Calculate the embedding model's loss with the NCE loss function
A. Loss functions
As mentioned in the previous chapter, candidate sampling avoids the costly full softmax operation when calculating the embedding loss. Instead, we use one of two main loss functions: sampled softmax or noise-contrastive estimation (NCE) loss.
Sampled Softmax
As the name suggests, this is just a softmax loss with "sampled" classes. The classes we use to calculate the softmax include the actual context vocabulary word (the true label), as well as a randomly chosen set of words from the entire vocabulary to act as the false labels. In TensorFlow, we can compute the sampled softmax loss using the ...
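To make the idea of a softmax over "sampled" classes concrete, here is a minimal sketch in plain Python rather than TensorFlow. It is an illustration under stated assumptions, not the library's implementation: the names `sampled_softmax_loss`, `softmax_cross_entropy`, `weights`, and `biases` are hypothetical, the false labels are drawn uniformly without replacement, and the sketch omits the correction for sampling probabilities that library implementations may apply.

```python
import math
import random


def softmax_cross_entropy(logits, true_index):
    """Cross-entropy of a softmax over `logits`, where `true_index` is the true class."""
    max_logit = max(logits)  # subtract the max for numerical stability
    log_sum_exp = max_logit + math.log(sum(math.exp(z - max_logit) for z in logits))
    return log_sum_exp - logits[true_index]


def sampled_softmax_loss(embedding, weights, biases, true_id, num_sampled, rng):
    """Softmax loss over the true word plus `num_sampled` random false labels.

    Assumption: false labels are sampled uniformly from the vocabulary,
    excluding the true word. No sampling-probability correction is applied.
    """
    vocab_size = len(weights)
    false_ids = rng.sample(
        [i for i in range(vocab_size) if i != true_id], num_sampled
    )
    candidates = [true_id] + false_ids  # the true label sits at position 0

    # One logit per candidate: dot(embedding, weight row) + bias.
    logits = [
        sum(e * w for e, w in zip(embedding, weights[c])) + biases[c]
        for c in candidates
    ]
    return softmax_cross_entropy(logits, 0), candidates


# Toy usage: a 1,000-word vocabulary, but only 1 + 8 logits are computed.
rng = random.Random(0)
vocab_size, embedding_dim = 1000, 16
weights = [[rng.uniform(-1, 1) for _ in range(embedding_dim)] for _ in range(vocab_size)]
biases = [0.0] * vocab_size
embedding = [rng.uniform(-1, 1) for _ in range(embedding_dim)]

loss, candidates = sampled_softmax_loss(embedding, weights, biases, 42, 8, rng)
print("candidate classes:", candidates)
print("sampled softmax loss:", loss)
```

The saving comes from the size of `candidates`: the loss touches 9 rows of `weights` instead of all 1,000, which is what lets candidate sampling sidestep the full softmax.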