Consider a longest common subsequence C=(c1,…,cp) specified by indices 1≤i1<i2<…<ip≤n and 1≤j1<j2<…<jp≤m (therefore, for every 1≤q≤p,aiq=bjq=cq).
Last symbols of A and B appear in C. In this case, ip=n and jp=m. Then, (c1,…,cp−1) is a longest common subsequence of (a1,…,an−1) and (b1,…,bm−1).
At least one of the last symbols of A and B doesn’t appear in C. In this case, either ip<n or jp<m. Then, (c1,…,cp−1) lies entirely in either (a1,…,an−1) or (b1,…,bm−1).
This way, we reduce the problem for the initial strings A and B to the same problem on their prefixes. This motivates the following definition: LCS(i,j) is the length of the longest common subsequence of A[1…i] and B[1…j]. The discussion above implies that this function satisfies the following recurrence relation:
The base case for this recurrence relation is i=0 or j=0:
LCS(0,j)=LCS(i,0)=0.
The resulting algorithm is shown below. Its running time is O(nm).
LCS(A[1…n],B[1…m]): table← 2d array of size (n+1)×(m+1) table[i][0]←0 and table[0][j]←0 for all i,j
for all i,j for i from 1 to n:
for j from 1 to m: table[i][j]←table[i−1][j] table[i][j]←max(table[i][j],table[i][j−1])
if A[i]=B[j]: table[i][j]←max(table[i][j],table[i−1][j−1]+1)
return table[n][m]
Stop and think: Can you reduce the Longest Common Subsequence Problem to the Edit Distance Problem?
Level up your interview prep. Join Educative to access 70+ hands-on prep courses.