
Embedding Words

Explore how to embed words into vectors using Keras' Tokenizer and embedding tables, then build a text classification neural network with LSTM and sigmoid layers. Understand preprocessing, tokenization, and dataset preparation to train a sentiment analysis model effectively.

We're ready to transform words into word vectors. Embedding words into vectors happens via an embedding table, which is essentially a lookup table: each row holds one word's word vector, and the rows are indexed by word-IDs. The flow for obtaining a word's word vector is therefore as follows:

  1. word->word-ID: Previously, we obtained a word-ID for each word with Keras' Tokenizer. The Tokenizer holds the entire vocabulary and maps each vocabulary word to an integer ID.

  2. word-ID->word vector: A word-ID is an integer and can therefore be used as an index into the embedding table's rows. Each word-ID corresponds to one row; to get a word's word vector, we first obtain its word-ID and then look up the corresponding row in the embedding table.
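The two-step lookup above can be sketched with plain Python and NumPy. This is a toy stand-in for Keras' Tokenizer word index and an embedding layer's weight matrix; the vocabulary, vector size, and random values here are invented for illustration:

```python
import numpy as np

# Step 1: word -> word-ID (what Keras' Tokenizer word_index provides).
# IDs start at 1; row 0 is conventionally reserved for padding.
word_index = {"the": 1, "movie": 2, "was": 3, "great": 4}

# The embedding table: one row per word-ID, each row a word vector.
# Shape: (vocabulary size + 1, embedding dimension). Random for illustration.
rng = np.random.default_rng(0)
embedding_table = rng.normal(size=(len(word_index) + 1, 8))

def word_to_vector(word):
    # Step 2: word-ID -> word vector, a simple row lookup.
    word_id = word_index[word]
    return embedding_table[word_id]

print(word_to_vector("great").shape)  # (8,)
```

In Keras, this row lookup is exactly what an `Embedding` layer does during the forward pass; the table is its trainable weight matrix.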

The following diagram shows how embedding words into word vectors works:

Steps of transforming a word to its word vector with Keras

Remember that in the previous lesson, we started with a list of sentences. Then we did the following:

  1. We broke each sentence into words and built a vocabulary with Keras' Tokenizer.

  2. The Tokenizer object held a word index, which was a word->word-ID mapping.

  3. After obtaining a word-ID, we looked up the corresponding row of the embedding table to get the word vector.

  4. Finally, we fed ...
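The recap above can be sketched end to end in plain Python, using a minimal vocabulary builder in place of Keras' Tokenizer (the sentences, vocabulary, and vector size are invented for illustration):

```python
import numpy as np

sentences = ["the movie was great", "the movie was bad"]

# 1. Break each sentence into words and build a vocabulary
#    (Keras' Tokenizer does this with fit_on_texts).
word_index = {}  # word -> word-ID mapping, like Tokenizer's word_index
for sentence in sentences:
    for word in sentence.split():
        if word not in word_index:
            word_index[word] = len(word_index) + 1  # IDs start at 1

# 2. Convert sentences to word-ID sequences (like texts_to_sequences).
sequences = [[word_index[w] for w in s.split()] for s in sentences]

# 3. Look up each word-ID in the embedding table to get word vectors.
rng = np.random.default_rng(0)
embedding_table = rng.normal(size=(len(word_index) + 1, 8))
sentence_vectors = [embedding_table[ids] for ids in sequences]

print(sequences[0])               # [1, 2, 3, 4]
print(sentence_vectors[0].shape)  # (4, 8) - one 8-dim vector per word
```

These per-word vectors are what would then be fed into the downstream layers of the classification network.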