Writing a Markov Text Generator
Learn about Markov chains, text generation, and how to write a Markov text generator.
Markov chains and text generation
When it comes to automated text generation, one of the more approachable techniques is the Markov chain. A Markov chain is a mathematical model of a random system that moves from one state to the next according to probabilities. The interesting aspect of Markov chains is that the probability of the next event in a series depends only on the current state, not on the sequence of events that led up to it.
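To make that property concrete for text, here is a minimal sketch, not the lesson's own implementation. It treats each word as a state and records which words follow it; the names `buildTransitions` and `nextWord` and the short word list are hypothetical and exist only for illustration.

```php
<?php

// Hypothetical helper: map each word to the list of words that follow it.
function buildTransitions(array $words): array
{
    $transitions = [];

    for ($i = 0; $i < count($words) - 1; $i++) {
        $transitions[$words[$i]][] = $words[$i + 1];
    }

    return $transitions;
}

// Hypothetical helper: choose the next word using only the current word.
function nextWord(array $transitions, string $current): ?string
{
    if (empty($transitions[$current])) {
        return null; // No known follower, so the chain ends here.
    }

    $candidates = $transitions[$current];

    // A follower that was seen more often appears more often in the list,
    // so it is proportionally more likely to be picked.
    return $candidates[array_rand($candidates)];
}

// Illustrative input only.
$words = ['the', 'cat', 'sat', 'the', 'cat', 'ran', 'the', 'dog', 'sat'];
$transitions = buildTransitions($words);

// "the" was followed by "cat" twice and "dog" once.
echo nextWord($transitions, 'the'), PHP_EOL;
```

Notice that `nextWord` receives only the current word. Nothing about the earlier words in the sequence influences the choice, which is the defining trait of a Markov chain.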
That is a lot to take in all at once, so let’s build up to it with an example. Previously, we worked through some helper methods that split a piece of input text into a list of words and retrieve the frequency, or number of times, each term appears within that text. Let’s look at the text in the code snippet below:
The eldest cat, the youngest cat, and their group of cat friends went
down the alley to fetch the runaway ball.
If we count every word in the text, including repeats, we can see there are 21 words. Additionally, we can use our previously developed wordFrequency helper method to get each word’s frequency, ordered by how often it appears:
<?php

$text = <<<TEXT
The eldest cat, the youngest cat, and their group of cat friends went
down the alley to fetch the runaway ball.
TEXT;

str($text)->wordFrequency->sortByDesc(fn($x) => $x);
We would arrive at the results in the table below:
Example Word Frequency
Word | Occurrences | Percentage |
the | 4 | 19.05% |
cat | 3 | 14.29% |
eldest | 1 | 4.76% |
youngest | 1 | 4.76% |
and | 1 | 4.76% |
their | 1 | 4.76% |
group | 1 | 4.76% |
of | 1 | 4.76% |
friends | 1 | 4.76% |
went | 1 | 4.76% |
down | 1 | 4.76% |
alley | 1 | 4.76% |
to | 1 | 4.76% |
fetch | 1 | 4.76% |
runaway | 1 | 4.76% |
ball | 1 | 4.76% |
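The wordFrequency helper comes from an earlier lesson and is not shown here. As a rough stand-in, the sketch below reproduces the counts and percentages using only PHP built-ins. It assumes words are compared case-insensitively (so "The" and "the" count together, as the table does) and that punctuation is ignored; the real helper may work differently.

```php
<?php

$text = <<<TEXT
The eldest cat, the youngest cat, and their group of cat friends went
down the alley to fetch the runaway ball.
TEXT;

// Lowercase the text so "The" and "the" are counted together, then
// split it into a list of words, leaving punctuation behind.
$words = str_word_count(strtolower($text), 1);
$total = count($words);

// Count how often each word appears and sort by count, highest first.
$counts = array_count_values($words);
arsort($counts);

// Each percentage is the word's count divided by the total word count.
foreach ($counts as $word => $count) {
    printf("%s | %d | %.2f%%\n", $word, $count, $count / $total * 100);
}
```

Each percentage is simply the word's count divided by the 21 total words, so "the" works out to 4 / 21, or about 19.05%.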
The results in the table above are not too surprising. The word “the” appears the most often, followed by “cat,” and then the counts quickly taper off: every remaining word appears only once within the original text. This alone is a curious piece of information, but let’s look at the frequency of ...