Embedding Words
Explore how to embed words into vectors using Keras' Tokenizer and embedding tables, then build a text classification neural network with LSTM and sigmoid layers. Understand preprocessing, tokenization, and dataset preparation to train a sentiment analysis model effectively.
We're ready to transform words into word vectors. Embedding a word into a vector happens via an embedding table. An embedding table is basically a lookup table: each row holds one word's word vector, and the rows are indexed by word-IDs. The flow of obtaining a word's word vector is therefore as follows:
- word -> word-ID: Previously, we obtained a word-ID for each word with Keras' Tokenizer. Tokenizer holds the entire vocabulary and maps each vocabulary word to an ID, which is an integer.
- word-ID -> word vector: A word-ID is an integer and can therefore be used as an index into the embedding table's rows. Each word-ID corresponds to one row; to get a word's word vector, we first obtain its word-ID and then look up that row in the embedding table (see the sketch after this list).
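To make the two steps concrete, here is a minimal sketch using TensorFlow's Keras APIs. The sentences, the looked-up word, and the vector size (8) are illustrative assumptions, not the lesson's actual data:

```python
import numpy as np
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.layers import Embedding

sentences = ["the movie was great", "the movie was terrible"]

# word -> word-ID: Tokenizer builds the vocabulary and the word index
tokenizer = Tokenizer()
tokenizer.fit_on_texts(sentences)
word_id = tokenizer.word_index["movie"]      # an integer ID, e.g. 2

# word-ID -> word vector: the Embedding layer is the lookup table;
# row `word_id` holds the word vector for "movie"
vocab_size = len(tokenizer.word_index) + 1   # +1 because word-IDs start at 1
embedding = Embedding(input_dim=vocab_size, output_dim=8)
word_vector = embedding(np.array([word_id])) # shape: (1, 8)
```

Note that the embedding table here starts out with random weights; the vectors only become meaningful once the table is trained as part of a network.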
The following diagram shows how embedding words into word vectors works:
Remember that in the previous lesson, we started with a list of sentences. Then we did the following:
- We broke each sentence into words and built a vocabulary with Keras' Tokenizer.
- The Tokenizer object held a word index, which was a word -> word-ID mapping.
- After obtaining a word-ID, we could look it up in the embedding table rows and get the word's word vector.
Finally, we fed ...
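Tying this recap to the model described at the top of the lesson, the following is a hedged end-to-end sketch: tokenize, pad, and train a small Embedding -> LSTM -> sigmoid classifier. The toy sentences, labels, and hyperparameters (sequence length, vector size, LSTM units) are illustrative assumptions, not the lesson's actual dataset or settings:

```python
import numpy as np
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, LSTM, Dense

sentences = ["the movie was great", "the movie was terrible"]
labels = np.array([1, 0])                     # toy labels: 1 = positive, 0 = negative

# word -> word-ID, then pad so every sequence has the same length
tokenizer = Tokenizer()
tokenizer.fit_on_texts(sentences)
sequences = tokenizer.texts_to_sequences(sentences)
padded = pad_sequences(sequences, maxlen=10)

# word-ID -> word vector -> LSTM -> sigmoid
vocab_size = len(tokenizer.word_index) + 1    # +1 because word-IDs start at 1
model = Sequential([
    Embedding(input_dim=vocab_size, output_dim=8),  # the embedding table
    LSTM(16),                                       # encodes the word-vector sequence
    Dense(1, activation="sigmoid"),                 # binary sentiment score
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(padded, labels, epochs=2, verbose=0)
```

The embedding table is trained jointly with the LSTM and sigmoid layers, so the word vectors end up tuned for the sentiment task.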