# Text Segmentation

Explore the text segmentation problem to take it to the next level.

We'll cover the following

## Optimizing the text segmentation problem

For our next dynamic programming algorithm, let’s consider the text segmentation problem from the previous chapter. We are given a string $A[1 .. n]$ and a subroutine $IsWord$ that determines whether a given string is a word (whatever that means), and we want to know whether $A$ can be partitioned into a sequence of words.

We solved this problem by defining a function $Splittable(i)$ that returns True if and only if the $suffix\space A[i .. n]$ can be partitioned into a sequence of words. We need to compute $Splittable(1)$. This function satisfies the recurrence

$Splittable(i)=\begin{cases} & \text{ True \hspace{5.8cm}} if\space i > n \\ & \overset{n}{\underset{j=1}{\vee}} \left ( IsWord(i,j) \wedge Splittable(j+1) \right ) \hspace{1cm} otherwise \end{cases}$

where $IsWord(i, j)$ is shorthand for $IsWord(A[i .. j])$. This recurrence translates directly into a recursive backtracking algorithm that calls the $IsWord$ subroutine $O(2^n)$ times in the worst case.

But for any fixed string $A[1..n]$, there are only $n$ different ways to call the recursive function $Splittable(i)$—one for each value of $i$ between 1 and $n + 1$—and only $O(n^2)$ different ways to call $IsWord(i, j)$—one for each pair $(i, j)$ such that $1 ≤ i ≤ j ≤ n$.

## Memoization

Why are we spending exponential time computing only a polynomial amount of stuff?

Each recursive subproblem is specified by an integer between $1$ and $n + 1$, so we can memoize the function $Splittable$ into an array $SplitTable[1 .. n + 1]$. Each subproblem $Splittable(i)$ depends only on the results of subproblems $Splittable(j)$ where $j > i$, so the memoized recursive algorithm fills the array in decreasing index order. If we fill the array in this order deliberately, we obtain the dynamic programming algorithm shown in the pseudocode below. The algorithm makes $O(n^2)$ calls to $IsWord$, an exponential improvement over our earlier backtracking algorithm.