Project Creation: Part Two

Explore how to build a text generator using Markov chains by applying sampling methods to a real corpus. Understand how to convert character frequencies into probabilities and create a model that generates new text based on a dataset of speeches.

Load the dataset

Now it’s time to work with our real corpus. Click the download button below to get the dataset, which contains English-language speeches by the Honorable Prime Minister of India.

train_corpus.txt
Python 3.5
text_path = "train_corpus.txt"

def load_text(filename):
    # Read the whole file as UTF-8 and lowercase it so 'A' and 'a' count as the same character.
    with open(filename, encoding='utf8') as f:
        return f.read().lower()

text = load_text(text_path)
print('Loaded the dataset.')

Understand sampling

Before moving forward, we need to cover one more important concept: sampling. Put simply, sampling means drawing items from a larger collection, usually at random, so that they can be analyzed or used in place of the whole. Let’s understand sampling with the help of an example. Run the code ...
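As one way to make the idea concrete, the following is a minimal sketch of weighted random sampling. It is not necessarily the lesson's own example. The helper names `to_probabilities` and `sample` and the toy string are assumptions made for illustration, and the code uses only the standard library so that it also runs on older Python 3 versions. It turns character counts into probabilities, then draws characters so that more frequent ones come up more often.

```python
import bisect
import itertools
import random
from collections import Counter


def to_probabilities(counts):
    """Convert raw counts into probabilities that sum to 1."""
    total = sum(counts.values())
    return {item: count / total for item, count in counts.items()}


def sample(probabilities, rng=random):
    """Draw one item; items with higher probability are drawn more often."""
    items = sorted(probabilities)
    # Running totals split the range [0, total) into one slice per item,
    # each slice as wide as that item's probability.
    cumulative = list(itertools.accumulate(probabilities[i] for i in items))
    r = rng.random() * cumulative[-1]
    # Find the slice that r falls into.
    index = bisect.bisect_right(cumulative, r)
    return items[min(index, len(items) - 1)]  # guard against float rounding


# Toy data (hypothetical, not taken from train_corpus.txt).
toy_counts = Counter("aaaaaabbbc")  # a: 6, b: 3, c: 1
toy_probs = to_probabilities(toy_counts)

rng = random.Random(0)  # fixed seed so repeated runs give the same draws
draws = [sample(toy_probs, rng) for _ in range(1000)]
draw_counts = Counter(draws)
print(draw_counts.most_common())
```

Over many draws, the share of each character should come close to its probability: about 60% for 'a', 30% for 'b', and 10% for 'c'. A character-level Markov text generator can use the same kind of weighted draw to pick the next character.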