ELMo: Taking Ambiguities Out of Word Vectors

Explore how ELMo creates contextualized word vectors that capture different meanings of the same word based on context. Understand its architecture using character-level CNNs and bidirectional LSTM layers, and learn how to integrate a pretrained ELMo model from TensorFlow Hub for improved NLP tasks.

Limitations of vanilla word embeddings

So far, we’ve looked at word embedding algorithms that assign a single, fixed representation to each word in the vocabulary: no matter how many times we query a given word, we get back the same vector. Why would this be a problem? Consider the following two phrases:

I went to the bank to deposit some money

and

I walked along the river bank

Clearly, the word “bank” is used in two totally different contexts. If we use a vanilla word vector algorithm (e.g., skip-gram), we can have only one representation for the word “bank,” and that single vector will be a muddle of the financial-institution sense and the river-edge sense, in proportions determined by how often each usage appears in the training corpus. Therefore, it’s more sensible to generate a word’s embedding while preserving and leveraging the context around it. This is exactly what ELMo (Embeddings from Language Models) strives for.
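We can see this limitation directly in code. The following is a minimal sketch, assuming gensim (4.x) is available as the skip-gram implementation; the toy two-sentence corpus is only for illustration. Whichever sentence “bank” appears in, the model returns the same row of its embedding matrix:

```python
# Minimal sketch: a skip-gram model stores exactly one vector per
# vocabulary word, with no notion of the sentence it came from.
import numpy as np
from gensim.models import Word2Vec

sentences = [
    "i went to the bank to deposit some money".split(),
    "i walked along the river bank".split(),
]

# sg=1 selects the skip-gram training objective.
model = Word2Vec(sentences, vector_size=50, min_count=1, sg=1, seed=42)

# Both "queries" return the identical vector, because the lookup
# depends only on the word, not on its surrounding context.
financial_bank = model.wv["bank"]
river_bank = model.wv["bank"]
print(np.allclose(financial_bank, river_bank))  # True
```

A contextualized model such as ELMo, by contrast, computes the vector for “bank” from the whole input sentence, so the two occurrences above would receive different embeddings.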

ELMo:

...