Build a Deep Learning Text Generator Project with Markov Chains

10 min read

Nov 03, 2020

content

Introduction to the Text Generator Project

What are Markov Chains?

Text Generation Project Implementation

1. Generate the lookup table

2. Convert frequencies to probabilities

3. Load the dataset

4. Build the Markov chains

5. Sample the text

6. Generate text

What to learn next

Continue reading about NLP and Machine Learning

Natural language processing (NLP) and deep learning are growing in popularity for their use in ML technologies like self-driving cars and speech recognition software.

As more companies begin to implement deep learning components and other machine learning practices, the demand for software developers and data scientists with proficiency in deep learning is skyrocketing.

Today, we will introduce you to a popular deep learning project, the Text Generator, to familiarize you with important, industry-standard NLP concepts, including Markov chains.

By the end of this article, you’ll understand how to build a Text Generator component for search engine systems and know how to implement Markov chains for faster predictive models.

Introduction to the Text Generator Project#

Text generation is used across many industries, especially in mobile apps and data science. Even journalism uses text generation to aid writing processes.

You’ve probably encountered text generation technology in your day-to-day life. iMessage text completion, Google search, and Google’s Smart Compose on Gmail are just a few examples. These skills are valuable for any aspiring data scientist.

Today, we are going to build a text generator using Markov chains. This will be a character-based model that takes the previous characters of the chain (the last K of them) and generates the next character in the sequence.

By training our program with sample words, our text generator will learn common patterns in character order. The text generator will then apply these patterns to the input, an incomplete word, and output the character with the highest probability to complete that word.

Let’s suppose we have a string, monke. We need to find the character that is best suited after the character e in the word monke based on our training corpus.

Our text generator would determine that y sometimes follows e and that adding it would form a complete word. In other words, we are going to generate the next character for that given string.

The text generator project relies on text generation, a subdivision of natural language processing that predicts and generates next characters based on previously observed patterns in language.

Without NLP, we’d have to create a table of all words in the English language and match the passed string to an existing word. There are two problems with this approach.

It would be very slow to search thousands of words.

The generator could only complete words that it had seen before.

NLP allows us to dramatically cut runtime and increase versatility because the generator can complete words it hasn’t even encountered before. NLP can be expanded to predict words, phrases, or sentences if needed!

For this project, we will specifically be using Markov chains to complete our text. Markov processes are the basis for many NLP projects involving written language and simulating samples from complex distributions.

Markov processes are so powerful that they can be used to generate superficially real-looking text with only a sample document.

What are Markov Chains?#

A Markov chain is a stochastic process that models a sequence of events in which the probability of each event depends only on the state reached in the previous event. The model requires a finite set of states with fixed conditional probabilities of moving from one state to another.

The probability of each shift depends only on the previous state of the model, not the entire history of events.

For example, imagine you wanted to build a Markov chain model to predict weather conditions.

We have two states in this model, sunny or rainy. There is a higher probability (70%) that it’ll be sunny tomorrow if we’ve been in the sunny state today. The same is true for rainy: if it has been rainy, it will most likely continue to rain.

However, it’s possible (30%) that the weather will shift states, so we also include that in our Markov chain model.
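To make the weather example concrete, here is a minimal Python sketch of the two-state chain. It assumes the 70%/30% split applies symmetrically to both states, and the names TRANSITIONS, next_state, simulate, and two_step_probability are illustrative rather than part of the project.

```python
import random

# Transition probabilities: TRANSITIONS[today][tomorrow].
# Assumption: the 70% "stay" / 30% "switch" split holds for both states.
TRANSITIONS = {
    "sunny": {"sunny": 0.7, "rainy": 0.3},
    "rainy": {"rainy": 0.7, "sunny": 0.3},
}


def next_state(current, rng=random):
    """Sample tomorrow's weather using only today's state."""
    states = list(TRANSITIONS[current].keys())
    weights = list(TRANSITIONS[current].values())
    return rng.choices(states, weights=weights, k=1)[0]


def simulate(start, days, seed=None):
    """Walk the chain for a number of days, starting from `start`."""
    rng = random.Random(seed)
    path = [start]
    for _ in range(days):
        path.append(next_state(path[-1], rng))
    return path


def two_step_probability(start, end):
    """Probability of being in `end` two days after `start`."""
    return sum(
        TRANSITIONS[start][mid] * TRANSITIONS[mid][end] for mid in TRANSITIONS
    )


forecast = simulate("sunny", days=7, seed=0)
print(forecast)
print(two_step_probability("sunny", "sunny"))  # 0.7*0.7 + 0.3*0.3, about 0.58
```

Each step looks only at the last entry of the path, which is the Markov property in action: the earlier history never enters the calculation.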

The Markov chain is a good fit for our text generator because our model will predict the next character using only the previous state (the last K characters). The advantage of using a Markov chain is that it’s simple, light on memory (it only stores 1 previous state), and fast to execute.

Text Generation Project Implementation#

We’ll complete our text generator project in 6 steps:

Generate the lookup table: Create a table to record how often each character follows each sequence

Convert frequency to probability: Convert our findings to a usable form

Load the dataset: Load and utilize a training set

Build the Markov chains: Use the probabilities to create chains for each word and character

Sample our data: Create a function to sample individual sections of the corpus

Generate text: Test our model

1. Generate the lookup table#

First, we’ll create a table that records the occurrences of each character state within our training corpus. For each position in the corpus, we take the previous K characters and the character that follows them (character K+1) and record the pair in a lookup table.

For example, imagine our training corpus contained, “the man was, they, then, the, the”. Then the number of occurrences by word would be:

“the” - 3

“then” - 1

“they” - 1

“man” - 1

Here’s what that would look like in a lookup table:

X	Y	Frequency
the	" "	3
the	"n"	2
the	"y"	1
the	"i"	1
man	" "	1

In the example above, we have taken K = 3. Therefore, we’ll consider 3 characters at a time and take the next character (K+1) as our output character.

In the above lookup table, we have the word (X) as the and the output character (Y) as a single space (" "). We have also calculated how many times this sequence occurs in our dataset, 3 in this case.

We’ll find this data for each word in the corpus to generate all possible pairs of X and Y within the dataset.

Here’s how we’d generate a lookup table in code:
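As a minimal sketch, the table can be held in a nested dictionary that maps each K-character sequence X to the characters Y that follow it and their counts. The function name generateTable is the one this article refers to later; the body and the sample string are assumptions for illustration.

```python
def generateTable(data, k=4):
    """Count how often each character follows each k-character sequence."""
    table = {}
    for i in range(len(data) - k):
        x = data[i:i + k]      # the k characters of context (X)
        y = data[i + k]        # the character that follows them (Y)
        row = table.setdefault(x, {})
        row[y] = row.get(y, 0) + 1
    return table


table = generateTable("hello hello helli", k=4)
print(table["hell"])   # counts of the characters seen after "hell"
```

In this sample string, hell is followed by o twice and by i once, so its row holds both counts.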


2. Convert frequencies to probabilities#

Once we have this table and the occurrences, we’ll generate the probability that Y appears after a given X. Our equation for this will be:

$P(Y \mid X) = \frac{\text{Frequency of } Y \text{ with } X}{\text{Sum of all frequencies for } X}$

Normalizing within each X makes the probabilities of all possible next characters sum to 1, which is what a Markov chain transition requires.

For example, if X = the and Y = n our equation would look like this:

Frequency that Y = n when X = the: 2

Total frequency for X = the in the table: 3 + 2 + 1 + 1 = 7

Therefore: $P = 2/7 \approx 0.286 = 28.6\%$

Here’s how we’d apply this equation to convert our lookup table to probabilities usable with Markov chains:
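A minimal sketch of this conversion, assuming the lookup table is a nested dictionary of counts keyed by X and then Y. The name convertFreqIntoProb is the one used later in this article; returning a new dictionary instead of modifying the table in place is a design choice made here, not something the article specifies.

```python
def convertFreqIntoProb(table):
    """Turn each row of counts into probabilities that sum to 1."""
    probabilities = {}
    for x, row in table.items():
        total = sum(row.values())   # all occurrences of this X
        probabilities[x] = {y: count / total for y, count in row.items()}
    return probabilities


# Counts copied from the example lookup table in step 1.
counts = {
    "the": {" ": 3, "n": 2, "y": 1, "i": 1},
    "man": {" ": 1},
}
probs = convertFreqIntoProb(counts)
print(probs["the"]["n"])   # 2 out of the 7 occurrences of "the"
print(probs["man"][" "])   # the only continuation seen after "man"
```

Each row is divided by its own total, so the probabilities for a given X always sum to 1.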

4. Build the Markov chains#

Next, we create a method to generate the Markov model. This method accepts the text corpus and the value of K, which tells the Markov model to consider K characters when predicting the next character.

Inside the method, we first generate our lookup table by passing the text corpus and K to generateTable(), which we created in step 1.

We then convert the frequencies into probabilities with convertFreqIntoProb(), which we created in step 2.
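The three steps described above can be sketched in a few lines. The wrapper name MarkovChain is hypothetical (the text does not name the method), and the compact generateTable() and convertFreqIntoProb() bodies below are stand-ins so that the block runs on its own.

```python
def MarkovChain(text, k=4):              # hypothetical name for the model builder
    T = generateTable(text, k)           # count (X, Y) occurrences
    T = convertFreqIntoProb(T)           # normalize counts into probabilities
    return T


def generateTable(data, k=4):
    """Stand-in: count how often each character follows each k-gram."""
    table = {}
    for i in range(len(data) - k):
        row = table.setdefault(data[i:i + k], {})
        row[data[i + k]] = row.get(data[i + k], 0) + 1
    return table


def convertFreqIntoProb(table):
    """Stand-in: normalize every row so its values sum to 1."""
    return {
        x: {y: count / sum(row.values()) for y, count in row.items()}
        for x, row in table.items()
    }


model = MarkovChain("hello hello helli", k=4)
print(model["hell"])   # probabilities of the characters that follow "hell"
```

The resulting model is just a dictionary: looking up a K-character context gives the distribution over the next character.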

5. Sample the text#

The function sample_next(ctx, model, k) accepts three parameters: the context, the model, and the value of K.

The ctx is the text that will be used to generate new text. However, only the last K characters of the context are used by the model to predict the next character in the sequence.

For example, if we pass commo as the context with K = 4, the model only looks at the last 4 characters, ommo, because a Markov model conditions only on the most recent state rather than the full history. You can see the value of the context variable by printing it too.

The function then looks up the possible next characters and their probability values, which are stored in our model. For the context ommo, the predicted next character is n with a probability of 1.0. This makes sense because commo is most likely to become common once the next character is generated.

Finally, the function returns a character sampled according to these probability values.
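A minimal sketch of this sampling step, using random.choices from the Python standard library to draw a character according to its probability. The signature sample_next(ctx, model, k) comes from the text above; the fallback to a space for unseen contexts, the build_model helper, the toy corpus, and the generate_text loop (one possible approach to the final "Generate text" step) are all assumptions made for illustration.

```python
import random


def build_model(text, k):
    """Hypothetical helper: k-gram counts normalized into probabilities."""
    counts = {}
    for i in range(len(text) - k):
        row = counts.setdefault(text[i:i + k], {})
        row[text[i + k]] = row.get(text[i + k], 0) + 1
    return {
        x: {y: c / sum(row.values()) for y, c in row.items()}
        for x, row in counts.items()
    }


def sample_next(ctx, model, k):
    """Sample the next character given the last k characters of ctx."""
    ctx = ctx[-k:]                      # only the most recent state matters
    if ctx not in model:
        return " "                      # assumption: unseen context yields a space
    chars = list(model[ctx].keys())
    probs = list(model[ctx].values())
    return random.choices(chars, weights=probs, k=1)[0]


def generate_text(start, model, k=4, max_len=20):
    """Hypothetical generation loop: append one sampled character at a time."""
    sentence = start
    for _ in range(max_len):
        sentence += sample_next(sentence, model, k)
    return sentence


model = build_model("common sense is common ", k=4)
print(sample_next("commo", model, 4))
print(generate_text("comm", model, k=4, max_len=3))
```

In this toy corpus, ommo is only ever followed by n, so sample_next("commo", model, 4) always returns n. With a larger corpus, contexts have several candidates and repeated calls return different characters.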

What to learn next#

Congratulations on completing this text generation project. You now have hands-on experience with Natural Language Processing and Markov chain models to use as you continue your deep learning journey.

Your next steps are to adapt the project to produce more understandable output, to learn a tool like GPT-3, or to try more machine learning projects like:

Pokemon classification system

Emoji predictor using NLP

Text decryption using recurrent neural network

To walk you through these projects and more, Educative has created Building Advanced Deep Learning and NLP Projects. This course gives you the chance to practice advanced deep learning concepts as you complete interesting and unique projects like the one we did today. By the end, you’ll have the experience to use any of the top deep learning algorithms on your own projects.

Happy learning!

Continue reading about NLP and Machine Learning#

Data Science Simplified: top 5 NLP tasks that use Hugging Face

Data Science Simplified: What is language modeling for NLP?

Crack the top 40 machine learning interview questions

Written By:
Ryan Thelin
