Text Segmentation

Optimizing the text segmentation problem

For our next dynamic programming algorithm, let’s consider the text segmentation problem from the previous chapter. We’re given a string A[1..n]A[1 .. n] and a subroutine IsWordIsWord that determines whether a given string is a word, and we want to know whether AA can be partitioned into a sequence of words.

We solved this problem by defining a function Splittable(i)Splittable(i) that returns TrueTrue if, and only if, the suffix A[i..n]suffix\space A[i .. n] can be partitioned into a sequence of words. We need to compute Splittable(1)Splittable(1). This function satisfies the recurrence

Splittable(i)={ True if i>nj=1n(IsWord(i,j)Splittable(j+1))otherwiseSplittable(i)=\begin{cases} & \text{ True \hspace{5.8cm}} if\space i > n \\ & \overset{n}{\underset{j=1}{\vee}} \left ( IsWord(i,j) \wedge Splittable(j+1) \right ) \hspace{1cm} otherwise \end{cases}

where IsWord(i,j)IsWord(i, j) is shorthand for IsWord(A[i..j])IsWord(A[i .. j]). This recurrence translates directly into a recursive backtracking algorithm that calls the IsWordIsWord subroutine O(2n)O(2^n) times in the worst case.

However, for any fixed string A[1..n]A[1..n], there are only nn different ways to call the recursive function Splittable(i)Splittable(i)—one for each value of ii between 11 and n+1n + 1—and only O(n2)O(n^2) different ways to call IsWord(i,j)IsWord(i, j)—one for each pair (i,j)(i, j) such that 1ijn1 ≤ i ≤ j ≤ n.

Memoization

Why are we spending exponential time computing only a polynomial amount of stuff?

Each recursive subproblem is specified by an integer between 11 and n+1n + 1, so we can memoize the function SplittableSplittable into an array SplitTable[1..n+1]SplitTable[1 .. n + 1]. Each subproblem Splittable(i)Splittable(i) depends only on the results of subproblems Splittable(j)Splittable(j) where j>ij > i, so the memoized recursive algorithm fills the array in decreasing index order. If we fill the array in this order deliberately, we obtain the dynamic programming algorithm shown below. The algorithm makes O(n2)O(n^2) calls to IsWordIsWord, an exponential improvement over our earlier backtracking algorithm.

Create a free account to access the full course.

By signing up, you agree to Educative's Terms of Service and Privacy Policy