Optimizing the text segmentation problem

For our next dynamic programming algorithm, let’s consider the text segmentation problem from the previous chapter. We are given a string A[1..n]A[1 .. n] and a subroutine IsWordIsWord that determines whether a given string is a word (whatever that means), and we want to know whether AA can be partitioned into a sequence of words.

We solved this problem by defining a function Splittable(i)Splittable(i) that returns True if and only if the suffix A[i..n]suffix\space A[i .. n] can be partitioned into a sequence of words. We need to compute Splittable(1)Splittable(1). This function satisfies the recurrence

Splittable(i)={ True if i>nj=1n(IsWord(i,j)Splittable(j+1))otherwiseSplittable(i)=\begin{cases} & \text{ True \hspace{5.8cm}} if\space i > n \\ & \overset{n}{\underset{j=1}{\vee}} \left ( IsWord(i,j) \wedge Splittable(j+1) \right ) \hspace{1cm} otherwise \end{cases}

where IsWord(i,j)IsWord(i, j) is shorthand for IsWord(A[i..j])IsWord(A[i .. j]). This recurrence translates directly into a recursive backtracking algorithm that calls the IsWordIsWord subroutine O(2n)O(2^n) times in the worst case.

But for any fixed string A[1..n]A[1..n], there are only nn different ways to call the recursive function Splittable(i)Splittable(i)—one for each value of ii between 1 and n+1n + 1—and only O(n2)O(n^2) different ways to call IsWord(i,j)IsWord(i, j)—one for each pair (i,j)(i, j) such that 1ijn1 ≤ i ≤ j ≤ n.

Memoization

Why are we spending exponential time computing only a polynomial amount of stuff?

Each recursive subproblem is specified by an integer between 11 and n+1n + 1, so we can memoize the function SplittableSplittable into an array SplitTable[1..n+1]SplitTable[1 .. n + 1]. Each subproblem Splittable(i)Splittable(i) depends only on the results of subproblems Splittable(j)Splittable(j) where j>ij > i, so the memoized recursive algorithm fills the array in decreasing index order. If we fill the array in this order deliberately, we obtain the dynamic programming algorithm shown in the pseudocode below. The algorithm makes O(n2)O(n^2) calls to IsWordIsWord, an exponential improvement over our earlier backtracking algorithm.

Create a free account to access the full course.

By signing up, you agree to Educative's Terms of Service and Privacy Policy