Natural Language Processing: Word Embeddings

Introduction

How is Google Translate able to understand text and convert it from one language to another? How do you make a computer understand that, in the context of IT, Apple is a company and not a fruit? Or, how is a smart keyboard able to predict the next few words that you are most likely going to type? These are examples of tasks that deal with text processing and fall under the umbrella of natural language processing (NLP).

We have an intuitive capability to deal with text, but with millions and millions of documents being generated every day, computers need this intuitive understanding of text data as well. They need to be able to understand the nuances implicitly encoded in our languages. Humans cannot sit behind every Smart Reply suggestion or Google Translate request. This is where word embeddings come into play.


What are word embeddings?

“girl-woman” vs “girl-apple”: which of the pairs has words more similar to each other?

For us, it’s automatic to understand the associations between words in a language. We know that “girl” and “woman” have more similar meanings than “girl” and “apple,” but what if we want computers to understand the nuances implicitly encoded in our languages as well? This is where word embeddings come into play.

Word embeddings transform human language meaningfully into a numerical form. The main idea is that every word can be converted into a set of numbers, an N-dimensional vector. Although every word gets assigned a unique vector, or embedding, similar words end up with values closer to each other. For example, the vectors for the words “woman” and “girl” would have a higher similarity than the vectors for “girl” and “apple”. When represented in vector space, their vectors would be at a shorter distance from each other.
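To make this concrete, here is a minimal sketch using made-up 4-dimensional vectors (real embeddings are learned from data and typically have hundreds of dimensions). Cosine similarity is one common way to measure how close two vectors are; the numbers below are invented purely for illustration.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity: close to 1.0 for similar directions, lower for unrelated ones."""
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Hypothetical embeddings, invented for this example.
woman = np.array([0.71, 0.32, -0.18, 0.45])
girl  = np.array([0.69, 0.40, -0.22, 0.38])
apple = np.array([-0.52, 0.08, 0.66, -0.11])

print(cosine_similarity(woman, girl))   # high: "woman" and "girl" sit close in vector space
print(cosine_similarity(girl, apple))   # much lower: "girl" and "apple" sit far apart
```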

For these numerical representations to be truly useful, they should capture the meanings of words, the semantic relationships (similarities) between them, and the context in which different words are naturally used by humans.

The meaning of a word can be captured, to some extent, by its use with other words. For example, “food” and “hungry” are more likely to be used in the same context than the words “hungry” and “software”.

The idea is that if any two words have a similar meaning, they are likely to appear with similar context words. This observation forms the basis of the training algorithms for word embeddings.
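As a rough illustration of this idea, the sketch below counts the words that appear within a small window around a target word in a toy corpus. The corpus and the window size are invented for the example; real training algorithms use much larger corpora, but they rely on the same kind of context signal.

```python
from collections import Counter

def context_counts(tokens, target, window=2):
    """Count the words that appear within `window` positions of each occurrence of `target`."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == target:
            left = max(0, i - window)
            neighbours = tokens[left:i] + tokens[i + 1:i + 1 + window]
            counts.update(neighbours)
    return counts

corpus = "i was hungry so i ordered food the food was good and i was not hungry".split()
print(context_counts(corpus, "hungry"))
print(context_counts(corpus, "food"))
# Words used in similar contexts end up with overlapping neighbour counts;
# this is the signal that embedding algorithms learn from.
```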

“You shall know a word by the company it keeps” – Firth, J.R. (1957)


Simple frequency-based embeddings: One-hot encoding

Let’s start by looking at the simplest way of converting text into a vector.
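As a preview, here is a minimal sketch of one-hot encoding: each word in the vocabulary gets a vector with a single 1 at its own index and 0s everywhere else. The tiny vocabulary is made up for illustration.

```python
vocabulary = ["apple", "girl", "hungry", "woman"]
word_to_index = {word: i for i, word in enumerate(vocabulary)}

def one_hot(word):
    """Return a vector of zeros with a single 1 at the word's index."""
    vector = [0] * len(vocabulary)
    vector[word_to_index[word]] = 1
    return vector

print(one_hot("girl"))   # [0, 1, 0, 0]
print(one_hot("woman"))  # [0, 0, 0, 1]
# Note: every pair of distinct one-hot vectors is equally far apart,
# so this simple representation captures no notion of word similarity.
```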
