Trusted answers to developer questions

Longest palindromic subsequence algorithm

Get Started With Data Science

Learn the fundamentals of Data Science with this free course. Future-proof your career by adding Data Science skills to your toolkit — or prepare to land a job in AI, Machine Learning, or Data Analysis.

The longest palindromic subsequence is the longest sequence of characters in a string that spells the same forwards and backward.

A subsequence differs from a substring since characters in a subsequence are not required to be at consecutive positions in the original string.

Example

Take string "ABBCDABBC", for example. Then the longest palindromic subsequence in this string is "BBABB".

A naive approach would be to find all possible palindromic subsequences in "ABBCDABBC", and filter out the longest one.

Note: However, this approach has exponential time complexity.

A much better solution is to use Dynamic Programming. It is an optimization technique where complex problems are broken down into subproblems, with each subproblem solved only once.


Dynamic Programming solution

We can solve this problem using the solution to another similar problem - the Longest Common Subsequence (LCS) problem. The idea is:

  • Reverse the given sequence. Let’s call the original sequence original. Let’s call the reversed sequence reverse.
  • Use the LCS algorithm to find the longest common subsequence between original and reverse. Let LCS(original, reverse) be a function that returns the longest common subsequence between the pair of strings.
  • The answer from LCS will in fact be the longest palindromic subsequence.

More specifically, consider the illustration below:

svg viewer

The last characters match for both original and reverse in the figure above, hence we recur on both of them excluding the last character.

Next, consider the case illustrated below:

svg viewer

In the case where the last characters do not match, we have to solve two subproblems:

  1. Ignore the last character of reverse, and recur on original and updated reverse again.

  2. Ignore the last character of original and recur on the updated original and reverse again.

Select the best result from these two subproblems.


Mathematical form

Now, to combine all the above cases into a mathematical equation:

Lets call the original sequence X=(x1,x2,xm)X=(x_1, x_2, \cdots x_m) and reverse as Y=(y1,y2,yn)Y=(y_1, y_2, \cdots y_n). The prefixes of XX are X1,2,,mX_{1,2,\cdots,m} and the prefixes of YY are Y1,2,,nY_{1,2,\cdots,n}.

Let LCS(Xi,Yj)\mathit{LCS}(X_i,Y_j) represent the set of longest common subsequence of prefixes XiX_i and YjY_j. Then:

LCS(Xi,Yj)={ifi=0orj=0,LCS(Xi1,Yj1)^xiifi,j>0&xi=yjmax{LCS(Xi,Yj1),LCS(Xi1,Yj)}ifi,j>0&xiyj\small \mathit{LCS}(X_{i},Y_{j}) = \begin{cases} \emptyset & if \:i=0 \:or j=0, \\ \mathit{LCS}(X_{i-1},Y_{j-1}) \hat {} x_{i} & if \: i,j>0 \: \& \: x_{i} = y_{j}\\ \max \big\{ \mathit{LCS}(X_{i},Y_{j-1}), \mathit{LCS}(X_{i-1},Y_{j})\big\} & if \: i,j>0 \: \& \: x_{i} \ne y_{j} \end{cases}

If the last characters match, then the sequence LCS(Xi1,Yj1)\mathit{LCS}(X_{i-1},Y_{j-1}) is extended by that matching character xix_i. Otherwise, the best result from LCS(Xi,Yj1)\mathit{LCS}(X_{i},Y_{j-1}) and LCS(Xi1,Yj)\mathit{LCS}(X_{i-1},Y_{j}) is used.


Optimal substructure

The solution to this problem satisfies the overlapping subproblem property, as shown in the diagram below. The diagram is a partial LCS of "PXYM" and "PYQX":

%0 node_1 LCS("PXYM","PYQX") node_2 LCS("PXY","PYQX") node_1->node_2 node_3 LCS("PXYM","PYQ") node_1->node_3 node_1560339834426 LCS("PX","PYQX") node_2->node_1560339834426 node_1560339850843 LCS("PXY","PYQ") node_2->node_1560339850843 node_1560339976225 LCS("PXY","PYQ") node_3->node_1560339976225 node_1560339952831 LCS("PXYM","PY") node_3->node_1560339952831

The boxes in purple above are the same subproblem - this means we do not need to solve it twice. Using memoization, overlapping subproblems are solved only once and their result is stored to be used if the subproblem shows up again.

For this problem, a two dimensional array LCS can be made as a memo, where LCS[i][j] represents the length of the longest common subsequence between original with length i and reverse, with length j. After populating the memo using the above algorithm, the longest palindromic subsequence can be reconstructed using backtracking on the memo.

RELATED TAGS

lcs
dynamic
programming
Copyright ©2024 Educative, Inc. All rights reserved
Did you find this helpful?