The Skip-Gram Algorithm
Explore the skip-gram algorithm to learn how it generates word embeddings by leveraging surrounding word contexts in text data. This lesson guides you through preparing training data, the algorithm's mechanics, and implementing it in TensorFlow, helping you build foundational skills for NLP tasks.
The first algorithm we’ll talk about is known as the skip-gram algorithm, which is a type of Word2vec algorithm. As we have discussed in numerous places, the meaning of a word can be elicited from the contextual words surrounding it. However, it isn’t entirely straightforward to develop a model that exploits this way of learning word meanings. The skip-gram algorithm, introduced by Mikolov et al. in 2013, exploits the context of words in written text to learn good word embeddings.
Let’s go through the skip-gram algorithm step by step. First, we’ll discuss the data preparation process. Understanding the format of the data puts us in a great position to understand the algorithm. We’ll then discuss the algorithm itself. Finally, we’ll implement the algorithm using TensorFlow.
From raw text to semistructured text
First, we need to design a mechanism to extract a dataset that can be fed to our learning model. Such a dataset should be a set of tuples of the format (target, context). Moreover, this needs to be created in an unsupervised manner. That is, a human should not have to manually engineer the labels for the data. In summary, the data preparation process should do the following:
- Capture the surrounding words of a given word (that is, the context).
- Run in an unsupervised manner.
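As a concrete illustration of these two requirements, here is a minimal Python sketch (the function and variable names are our own, not from the lesson) that extracts (target, context) pairs from raw text with no manual labeling — the "labels" come from word positions alone:

```python
def generate_skipgram_pairs(text, window_size=2):
    """Yield (target, context) word tuples from a whitespace-tokenized text."""
    words = text.lower().split()
    pairs = []
    for i, target in enumerate(words):
        # Context = up to `window_size` words on each side of the target.
        start = max(0, i - window_size)
        end = min(len(words), i + window_size + 1)
        for j in range(start, end):
            if j != i:
                pairs.append((target, words[j]))
    return pairs

pairs = generate_skipgram_pairs("the dog barked at the mailman", window_size=1)
```

Note that no human annotation is involved: every pair is derived purely from which words co-occur within the window, which is what makes the process unsupervised.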
The skip-gram model uses the following approach to design a dataset:
- For a given word w_i, a context window size of m is assumed. By “context window size,” we mean the number of words considered as context on a single side of the target. Therefore, for a window size of m, the context window (including the target word w_i) will be of size 2m + 1 and will look like this: [w_{i-m}, ..., w_{i-1}, w_i, w_{i+1}, ..., w_{i+m}].
- Next, (target, context) tuples are formed as ...
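The window arithmetic described above can be sketched in plain Python (the helper names are illustrative, not part of the lesson's TensorFlow implementation). `context_window` returns the 2m + 1 words centered on index i, and `tuples_for_target` pairs the target with each of its context words:

```python
def context_window(words, i, m):
    """Return the window [w_{i-m}, ..., w_i, ..., w_{i+m}] around index i.

    Away from sentence boundaries, this window has exactly 2m + 1 words.
    """
    return words[max(0, i - m): i + m + 1]

def tuples_for_target(words, i, m):
    """Form (target, context) tuples pairing w_i with each context word."""
    target = words[i]
    return [(target, words[j])
            for j in range(max(0, i - m), min(len(words), i + m + 1))
            if j != i]
```

For example, with m = 1 and the sentence "the dog barked at the mailman", the target "barked" yields the tuples ("barked", "dog") and ("barked", "at").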