ELMo: Taking Ambiguities Out of Word Vectors
Explore how ELMo creates contextualized word vectors that capture the different meanings a word takes on in different contexts. Understand its architecture, built from character-level CNNs and bidirectional LSTM layers, and learn how to integrate a pretrained ELMo model from TensorFlow Hub into your NLP tasks.
Limitations of vanilla word embeddings
So far, we’ve looked at word embedding algorithms that assign a single, fixed representation to each word in the vocabulary: no matter how many times we query a word, we get the same vector back. Why is this a problem? Consider the following two phrases:
I went to the bank to deposit some money
and
I walked along the river bank
Clearly, the word “bank” is used in two totally different contexts. If we use a vanilla word vector algorithm (e.g., skip-gram), we can only have one representation for the word “bank,” and that vector will likely be a muddled blend of the financial-institution sense and the riverside sense, weighted by how often each appears in the training corpus. Therefore, it’s more sensible to produce embeddings for a word that preserve and leverage the context around it. This is exactly what ELMo strives for.
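To make this concrete, here is a minimal sketch of querying a pretrained ELMo model from TensorFlow Hub with both sentences. It assumes TensorFlow 2.x with the `tensorflow_hub` package installed, and uses the publicly hosted `elmo/3` module, whose `default` signature accepts whitespace-tokenized sentences. Because ELMo conditions each token's vector on the whole sentence, the two occurrences of "bank" come back as different vectors:

```python
import tensorflow as tf
import tensorflow_hub as hub

# Load the pretrained ELMo (v3) module from TensorFlow Hub.
elmo = hub.load("https://tfhub.dev/google/elmo/3")

sentences = tf.constant([
    "I went to the bank to deposit some money",
    "I walked along the river bank",
])

# The "default" signature splits sentences on spaces and returns a dict;
# the "elmo" key holds the contextualized token embeddings with shape
# [batch_size, max_tokens, 1024].
outputs = elmo.signatures["default"](sentences)
embeddings = outputs["elmo"]

# "bank" is token 4 in the first sentence and token 5 in the second
# (0-based); each occurrence gets its own context-dependent vector.
bank_financial = embeddings[0, 4]
bank_river = embeddings[1, 5]

# Cosine similarity between the two "bank" vectors.
cosine = tf.reduce_sum(bank_financial * bank_river) / (
    tf.norm(bank_financial) * tf.norm(bank_river)
)
print("Cosine similarity of the two 'bank' vectors:", cosine.numpy())
```

A cosine similarity below 1.0 confirms that the two occurrences of “bank” map to distinct vectors, whereas a static embedding such as skip-gram would return the identical vector (similarity of exactly 1.0) for both.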