You shall know a word by the company it keeps.

– J.R. Firth

This statement, made by J. R. Firth in 1957, lies at the very foundation of Word2vec, because Word2vec techniques use the context of a given word to learn its semantics.

An approach to learning word representation

Word2vec is a groundbreaking approach that allows computers to learn the meaning of words from raw text, without human-labeled data. It does so by learning numerical representations of words from the words that surround them.

We can test the correctness of the preceding quote by imagining a real-world scenario. Imagine we’re sitting an exam and find this sentence in our first question: “Mary is a very stubborn child. Her pervicacious nature always gets her in trouble.” Now, unless we are very clever, we might not know what “pervicacious” means. In such a situation, we are naturally compelled to look at the words surrounding the word of interest. In our example, “pervicacious” is surrounded by “stubborn,” “nature,” and “trouble.” Looking at these three words is enough to infer that “pervicacious,” in fact, means being stubborn. This illustrates how much of a word’s meaning can be recovered from its context.

Now let’s discuss the basics of Word2vec. As already mentioned, Word2vec learns the meaning of a given word by looking at its context, and it represents that meaning numerically.

By context, we refer to a fixed number of words in front of and behind the word of interest. Let’s take a hypothetical corpus with $N$ words. Mathematically, this can be represented by a sequence of words denoted by $w_0, w_1, \ldots, w_i, \ldots, w_N$, where $w_i$ is the $i^{th}$ word in the corpus.
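To make the idea of a context window concrete, here is a minimal Python sketch. The function name `context_window` and the window size `m` are our own illustrative choices, not part of Word2vec itself, and splitting the sentence on whitespace is a simplifying assumption about tokenization.

```python
def context_window(tokens, i, m):
    """Return up to m words before and m words after tokens[i]."""
    # Clamp the left edge at 0 so words near the start get a shorter context.
    left = tokens[max(0, i - m):i]
    # Slicing past the end of a list is safe in Python; it just returns fewer items.
    right = tokens[i + 1:i + 1 + m]
    return left + right


# The example sentence from earlier, lowercased and split on whitespace.
tokens = "her pervicacious nature always gets her in trouble".split()
i = tokens.index("pervicacious")

print(context_window(tokens, i, 2))  # ['her', 'nature', 'always']
```

Because “pervicacious” is the second word in the sentence, only one word is available on its left, so the context contains three words rather than four.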

Next, consider what a good algorithm for learning word meanings should be able to do: given a word, it should predict that word’s context words correctly.

This means that the following probability should be high for any given word $w_i$:
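One way to write this probability is sketched below. It assumes a context window of $m$ words on each side of $w_i$ (the symbol $m$ is introduced here for illustration) and that the context words are conditionally independent given $w_i$, which lets the joint probability factor into a product:

```latex
P(w_{i-m}, \ldots, w_{i-1}, w_{i+1}, \ldots, w_{i+m} \mid w_i)
  = \prod_{\substack{j = i-m \\ j \neq i}}^{i+m} P(w_j \mid w_i)
```

Each factor $P(w_j \mid w_i)$ is the probability of seeing the context word $w_j$ given the word of interest $w_i$. The product is large only when every context word in the window is predicted well.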
