Overview: Word Vectors and Semantic Similarity

Let's see what we will learn in this section.

Word vectors are handy tools and have been the hot topic of NLP for almost a decade. A word vector is basically a dense representation of a word. What's surprising about these vectors is that semantically similar words have similar word vectors. Word vectors are great for semantic similarity applications, such as calculating the similarity between words, phrases, sentences, and documents. At a word level, word vectors provide information about synonymity, semantic analogies, and more. We can build semantic similarity applications by using word vectors.

Word vectors are produced by algorithms that make use of the fact that similar words appear in similar contexts. To capture the meaning of a word, a word vector algorithm collects information about the surrounding words that the target word appears with. This paradigm of capturing semantics for words by their surrounding words is called distributional semantics.

We will introduce the distributional semantics paradigm and its associated semantic similarity methods. We will start by taking a conceptual look at text vectorization so that you know what NLP problems word vectors solve.

Next, we will become familiar with word vector computations such as distance calculation, analogy calculation, and visualization. Then, we will learn how to benefit from spaCy's pre-trained word vectors, as well as import and use third-party vectors. Finally, we will go through advanced semantic similarity methods using spaCy.

Get hands-on with 1200+ tech skills courses.