Embedding Words
Explore how to embed words into vectors using Keras' Tokenizer and embedding tables, then build a text classification neural network with LSTM and sigmoid layers. Understand preprocessing, tokenization, and dataset preparation to train a sentiment analysis model effectively.
We're ready to transform words into word vectors. Embedding a word into a vector happens via an embedding table. An embedding table is basically a lookup table: each row holds one word's word vector, and the rows are indexed by word-IDs. The flow of obtaining a word's word vector is therefore as follows:
- word -> word-ID: Previously, we obtained a word-ID for each word with Keras' Tokenizer. Tokenizer holds the entire vocabulary and maps each vocabulary word to an ID, which is an integer.
- word-ID -> word vector: A word-ID is an integer and can therefore be used as an index into the embedding table's rows. Each word-ID corresponds to one row; to get a word's word vector, we first obtain its word-ID and then look up that row in the embedding table (see the sketch after this list).
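To make the two steps concrete, here is a minimal sketch using TensorFlow's Keras APIs. The sentences, the looked-up word, and the vector size (8) are illustrative assumptions, not the lesson's actual data:

```python
import numpy as np
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.layers import Embedding

sentences = ["the movie was great", "the movie was terrible"]

# word -> word-ID: Tokenizer builds the vocabulary and the word index
tokenizer = Tokenizer()
tokenizer.fit_on_texts(sentences)
word_id = tokenizer.word_index["movie"]      # an integer ID, e.g. 2

# word-ID -> word vector: the Embedding layer is the lookup table;
# row `word_id` holds the word vector for "movie"
vocab_size = len(tokenizer.word_index) + 1   # +1 because word-IDs start at 1
embedding = Embedding(input_dim=vocab_size, output_dim=8)
word_vector = embedding(np.array([word_id])) # shape: (1, 8)
```

Note that the embedding table here starts out with random weights; the vectors only become meaningful once the table is trained as part of a network.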
The following diagram shows how embedding words into word vectors works:
Remember that in the previous lesson, we started with a list of sentences. Then we did the following:
- We broke each sentence into words and built a vocabulary with Keras' Tokenizer.
- The Tokenizer object held a word index, which was a word -> word-ID mapping.
- After obtaining a word-ID, we could look it up in the embedding table rows and get the word's word vector.
Finally, we fed ...
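Tying this recap to the model described at the top of the lesson, the following is a hedged end-to-end sketch: tokenize, pad, and train a small Embedding -> LSTM -> sigmoid classifier. The toy sentences, labels, and hyperparameters (sequence length, vector size, LSTM units) are illustrative assumptions, not the lesson's actual dataset or settings:

```python
import numpy as np
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, LSTM, Dense

sentences = ["the movie was great", "the movie was terrible"]
labels = np.array([1, 0])                     # toy labels: 1 = positive, 0 = negative

# word -> word-ID, then pad so every sequence has the same length
tokenizer = Tokenizer()
tokenizer.fit_on_texts(sentences)
sequences = tokenizer.texts_to_sequences(sentences)
padded = pad_sequences(sequences, maxlen=10)

# word-ID -> word vector -> LSTM -> sigmoid
vocab_size = len(tokenizer.word_index) + 1    # +1 because word-IDs start at 1
model = Sequential([
    Embedding(input_dim=vocab_size, output_dim=8),  # the embedding table
    LSTM(16),                                       # encodes the word-vector sequence
    Dense(1, activation="sigmoid"),                 # binary sentiment score
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(padded, labels, epochs=2, verbose=0)
```

The embedding table is trained jointly with the LSTM and sigmoid layers, so the word vectors end up tuned for the sentiment task.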