Longest palindromic subsequence algorithm

The longest palindromic subsequence is the longest sequence of characters in a string that spells the same forwards and backward.

A subsequence differs from a substring since characters in a subsequence are not required to be at consecutive positions in the original string.

Example

Take string "ABBCDABBC", for example. Then the longest palindromic subsequence in this string is "BBABB".

A naive approach would be to find all possible palindromic subsequences in "ABBCDABBC", and filter out the longest one.

Note: However, this approach has exponential time complexity.

A much better solution is to use Dynamic Programming. It is an optimization technique where complex problems are broken down into subproblems, with each subproblem solved only once.

Dynamic Programming solution

We can solve this problem using the solution to another similar problem - the Longest Common Subsequence (LCS) problem. The idea is:

Reverse the given sequence. Let’s call the original sequence original. Let’s call the reversed sequence reverse.
Use the LCS algorithm to find the longest common subsequence between original and reverse. Let LCS(original, reverse) be a function that returns the longest common subsequence between the pair of strings.
The answer from LCS will in fact be the longest palindromic subsequence.

More specifically, consider the illustration below:

In the case where the last characters do not match, we have to solve two subproblems:

Ignore the last character of reverse, and recur on original and updated reverse again.
Ignore the last character of original and recur on the updated original and reverse again.

Select the best result from these two subproblems.

Mathematical form

Now, to combine all the above cases into a mathematical equation:

Lets call the original sequence $X=(x_1, x_2, \cdots x_m)$ and reverse as $Y=(y_1, y_2, \cdots y_n)$ . The prefixes of $X$ are $X_{1,2,\cdots,m}$ and the prefixes of $Y$ are $Y_{1,2,\cdots,n}$ .

Let $\mathit{LCS}(X_i,Y_j)$ represent the set of longest common subsequence of prefixes $X_i$ and $Y_j$ . Then:

\small \mathit{LCS}(X_{i},Y_{j}) = \begin{cases} \emptyset & if \:i=0 \:or j=0, \\ \mathit{LCS}(X_{i-1},Y_{j-1}) \hat {} x_{i} & if \: i,j>0 \: \& \: x_{i} = y_{j}\\ \max \big\{ \mathit{LCS}(X_{i},Y_{j-1}), \mathit{LCS}(X_{i-1},Y_{j})\big\} & if \: i,j>0 \: \& \: x_{i} \ne y_{j} \end{cases}

If the last characters match, then the sequence $\mathit{LCS}(X_{i-1},Y_{j-1})$ is extended by that matching character $x_i$ . Otherwise, the best result from $\mathit{LCS}(X_{i},Y_{j-1})$ and $\mathit{LCS}(X_{i-1},Y_{j})$ is used.

Optimal substructure

The solution to this problem satisfies the overlapping subproblem property, as shown in the diagram below. The diagram is a partial LCS of "PXYM" and "PYQX":

The boxes in purple above are the same subproblem - this means we do not need to solve it twice. Using memoization, overlapping subproblems are solved only once and their result is stored to be used if the subproblem shows up again.

For this problem, a two dimensional array LCS can be made as a memo, where LCS[i][j] represents the length of the longest common subsequence between original with length i and reverse, with length j. After populating the memo using the above algorithm, the longest palindromic subsequence can be reconstructed using backtracking on the memo.