
Generating Document Embeddings

Explore the process of generating document embeddings by averaging word vectors from skip-gram, CBOW, GloVe, and contextual ELMo embeddings. Understand how to preprocess text and compute meaningful document representations using these advanced word vector algorithms.

Let’s first remind ourselves how we stored embeddings for the skip-gram, CBOW, and GloVe algorithms. The figure below depicts how these look in a pd.DataFrame object.

A snapshot of the context embeddings of the skip-gram algorithm we saved to the disk

Note the bottom-left corner of the image above: the DataFrame has 128 columns, which is the embedding size.
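To recap how such a table is used, here is a minimal sketch of computing a document embedding by averaging word vectors. The filename and the simple whitespace tokenization are assumptions for illustration; any of the skip-gram, CBOW, or GloVe embedding tables stored this way would work identically.

```python
import numpy as np
import pandas as pd

# Hypothetical filename; assumes the embeddings were saved with words as the
# DataFrame index and the 128 embedding dimensions as columns.
embeddings = pd.read_csv("context_embeddings.csv", index_col=0)

def doc_embedding(doc, embeddings):
    """Average the vectors of in-vocabulary tokens to get a document vector."""
    tokens = doc.lower().split()  # simple whitespace tokenization for illustration
    vectors = [embeddings.loc[t].values for t in tokens if t in embeddings.index]
    if not vectors:
        # No known tokens: fall back to a zero vector of the embedding size.
        return np.zeros(embeddings.shape[1])
    return np.mean(vectors, axis=0)

doc_vec = doc_embedding("The cat sat on the mat", embeddings)
print(doc_vec.shape)  # (128,)
```

Because every word has a single, fixed vector in the table, a document embedding is just a lookup followed by a mean over the token axis.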

ELMo embeddings

ELMo embeddings are an exception to this. Since ELMo generates contextualized representations, the vector for a word changes with the sentence it appears in, so we can't look up a static, per-word embedding from a saved table. Instead, we feed each document through the model and pool the contextualized token vectors into a single document representation.
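As a rough sketch, the TensorFlow Hub release of ELMo can be used for this; the module URL and output keys below match the v3 release, but details may differ for other versions or loading mechanisms.

```python
import tensorflow as tf
import tensorflow_hub as hub

# Load the TF Hub release of ELMo (downloading the module requires internet access).
elmo = hub.load("https://tfhub.dev/google/elmo/3")

docs = tf.constant([
    "the cat sat on the mat",
    "dogs are great pets",
])

outputs = elmo.signatures["default"](docs)

# "elmo" holds contextualized token embeddings: (batch, max_tokens, 1024).
# "default" is their mean over the token axis: (batch, 1024), i.e., one
# document embedding per input string, so no manual pooling is needed here.
doc_embeddings = outputs["default"]
print(doc_embeddings.shape)  # (2, 1024)
```

Using the model's own mean-pooled output sidesteps a subtle pitfall: averaging the padded token tensor yourself would dilute shorter documents with zero vectors unless you mask the padding first.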