Sequence Alignment is the Manhattan Tourist Problem in Disguise
Explore how comparing biological sequences can be framed as finding paths in an alignment graph. Understand the connection between sequence alignment, longest common subsequence, and dynamic programming algorithms to solve these problems efficiently.
We'll cover the following...
Sequence comparison as a graph
In the figure below, we add two arrays of integers to an alignment of ATGTTATA and
ATCGTCC. The array [0 1 2 2 3 4 5 6 7 8] holds the number of symbols of ATGTTATA
used up to a given column in the alignment. Similarly, the array [0 1 2 3 4 5 5 6 6 7]
holds the number of symbols of ATCGTCC used up to a given column. In figure below, we add a third array [↘ ↘ → ↘ ↘ ↓ ↘ ↓ ↘] recording whether each column represents a match/mismatch (↘/↘), an insertion (→), or a deletion (↓).
This third array corresponds to a path from source to sink in an rectangular grid, shown in the figure below (left). The i-th node of this path is made up of the i-th element of [0 1 2 2 3 4 5 6 7 8] and the i-th element of [0 1 2 3 4 5 5 6 6 7]: ...