Introduction to Word2vec: Learning Word Embeddings

Get an overview of word embeddings.

Overview

In this chapter, we'll discuss a topic of paramount importance in NLP: Word2vec, a data-driven technique for learning powerful numerical representations (that is, vectors) of words or tokens in a language. Languages are complex, which demands sound language-understanding capabilities in the models we build to solve NLP problems. Many methods for transforming words into numerical representations fail to sufficiently capture the semantics and contextual information a word carries. For example, the representation of the word "forest" should be very different from that of "oven" because these words are rarely used in similar contexts, whereas the representations of "forest" and "jungle" should be very similar. Failing to capture this information leads to underperforming models.
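
To make this concrete, here is a minimal sketch of how similarity between word vectors is typically measured. The vectors below are made-up, low-dimensional values chosen purely for illustration; real learned embeddings usually have hundreds of dimensions.

```python
import numpy as np

def cosine_similarity(a, b):
    # Cosine similarity: close to 1 for vectors pointing in similar
    # directions, close to 0 for unrelated ones.
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Hypothetical embeddings, invented for illustration only.
forest = np.array([0.9, 0.1, 0.8, 0.2])
jungle = np.array([0.85, 0.15, 0.75, 0.25])
oven = np.array([0.1, 0.9, 0.05, 0.7])

print(cosine_similarity(forest, jungle))  # high (~0.99): used in similar contexts
print(cosine_similarity(forest, oven))    # low (~0.26): rarely share contexts
```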

Word2vec tries to overcome this problem by learning word representations from large amounts of text.

Note: Word2vec is called a distributed representation because the semantics of a word are captured by the activation pattern across the full representation vector, in contrast to a localist representation, where a single element of the vector is set to 1 and the rest to 0 for each word.
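
To see why a localist representation falls short, consider one-hot encoding. The short sketch below (plain NumPy, with a toy three-word vocabulary) shows that every pair of distinct one-hot vectors is orthogonal, so the encoding carries no notion of similarity at all.

```python
import numpy as np

# Toy vocabulary; each word gets a vector with a single 1 and 0s elsewhere.
vocab = ["forest", "jungle", "oven"]
one_hot = {w: np.eye(len(vocab))[i] for i, w in enumerate(vocab)}

# All distinct one-hot vectors are orthogonal: "forest" is exactly as
# (dis)similar to "jungle" as it is to "oven" -- no semantics captured.
print(np.dot(one_hot["forest"], one_hot["jungle"]))  # 0.0
print(np.dot(one_hot["forest"], one_hot["oven"]))    # 0.0
```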

Flow of the topics

In this chapter, we'll learn the mechanics of several Word2vec algorithms. First, though, we'll discuss classical approaches to this problem and their limitations. This motivates us to look at neural network-based Word2vec algorithms that deliver state-of-the-art performance in learning good word representations.

We'll train a model on a dataset and analyze the representations it learns. We'll then visualize these learned word embeddings for a set of words on a 2D canvas (using t-SNE, a visualization technique for high-dimensional data) in the figure below. Taking a closer look, we'll see that similar things are placed close to each other (for example, the numbers in the cluster in the middle).
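
The sketch below shows one way such a visualization could be produced, assuming the learned embeddings are available as a NumPy matrix with a matching list of tokens (both are placeholders here); it uses scikit-learn's TSNE and Matplotlib.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

# Placeholder inputs; in practice these come from the trained model.
embeddings = np.random.rand(100, 128)       # (num_words, embedding_dim)
words = [f"word_{i}" for i in range(100)]   # tokens matching each row

# Project the high-dimensional embeddings onto a 2D canvas.
tsne = TSNE(n_components=2, perplexity=30, random_state=42)
points = tsne.fit_transform(embeddings)

plt.figure(figsize=(10, 8))
plt.scatter(points[:, 0], points[:, 1], s=10)
for (x, y), word in zip(points, words):
    plt.annotate(word, (x, y), fontsize=8)
plt.title("2D t-SNE projection of word embeddings")
plt.show()
```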
