Improving LSTMs: Generating Text with Words Instead of N-grams

Learn how using words instead of n-grams can improve LSTMs.

Here, we’ll discuss ways to improve LSTMs. So far, we have used bigrams as our basic unit of text. We can get better results by using words instead, because the model no longer has to spend capacity learning how to form words out of bigrams. We’ll discuss how to employ word vectors in the code to generate better-quality text than we could with bigrams.
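To make the difference concrete, here is a small Python sketch that splits the same sentence into bigrams and into words. It assumes non-overlapping two-character bigrams and naive whitespace word splitting; the helper names char_bigrams and words are hypothetical and only for illustration. Notice that "the" shows up as "th" + "e " the first time and as " t" + "he" the second time, so a bigram model has to learn both spellings of the same word, whereas a word model sees the single token "the".

```python
def char_bigrams(text: str) -> list[str]:
    """Split text into non-overlapping two-character chunks (assumed scheme)."""
    return [text[i:i + 2] for i in range(0, len(text), 2)]


def words(text: str) -> list[str]:
    """Naive whitespace tokenizer (assumption: no punctuation handling)."""
    return text.split()


text = "the cat sat on the mat"

bigram_tokens = char_bigrams(text)
word_tokens = words(text)

print(bigram_tokens)  # 11 bigram tokens; "the" is split differently each time
print(word_tokens)    # 6 word tokens; "the" is always one token
```

Besides removing the need to assemble words, the word sequence is also shorter, so the LSTM has fewer steps to carry context across.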

The curse of dimensionality

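One major limitation that stops us from using words instead of n-grams as the input to our LSTM is that doing so drastically increases the number of parameters in the model. The input size grows with the vocabulary size (for example, when each token is one-hot encoded), and a vocabulary of words is far larger than a vocabulary of bigrams. Let’s understand this through an example. Consider an input of size 500 and a cell state of size 100. Each of the LSTM’s four gate computations (input gate, forget gate, output gate, and candidate cell value) has a 500 × 100 input weight matrix, a 100 × 100 recurrent weight matrix, and a bias vector of size 100. This results in a total of approximately 240,000 parameters (excluding the softmax layer):

4 × (500 × 100 + 100 × 100 + 100) = 240,400 ≈ 240,000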
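The same count can be written as a small Python function, which makes it easy to see how quickly the parameter count grows with the input size. This is a sketch assuming a standard LSTM with no peephole connections, a hidden state the same size as the cell state, and no softmax/output layer; the function name lstm_param_count and the larger input sizes in the loop are hypothetical, chosen only for illustration.

```python
def lstm_param_count(input_size: int, cell_size: int) -> int:
    """Parameters of a standard LSTM layer, excluding any softmax/output layer.

    Each of the four gate computations (input, forget, output, candidate)
    has an input-to-hidden matrix (input_size x cell_size), a
    hidden-to-hidden matrix (cell_size x cell_size), and a bias (cell_size).
    """
    per_gate = input_size * cell_size + cell_size * cell_size + cell_size
    return 4 * per_gate


# The example from the text: input size 500, cell state size 100.
print(lstm_param_count(500, 100))  # 240400

# Hypothetical input sizes: with one-hot inputs, input size equals vocabulary
# size, so the parameter count grows linearly with the vocabulary.
for input_size in (500, 1_000, 10_000, 50_000):
    print(f"input={input_size:>6}  params={lstm_param_count(input_size, 100):>12,}")
```

Because the input-to-hidden term (input_size × cell_size) dominates once the input is large, the parameter count scales almost linearly with the vocabulary size when inputs are one-hot encoded.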
