Search⌘ K

String Reconstruction With Overlap Graph: From String To Graph

Understand how to assemble genomes by transforming collections of k-mers into directed overlap graphs that reveal genome structure. Learn to connect DNA fragments based on shared prefixes and suffixes, and explore the process of reconstructing strings from genome paths. This lesson equips beginners with the foundational skills to apply graph-based algorithms to genome assembly challenges.

From a string to a graph

Repeats in a genome necessitate some way of looking ahead to see the correct assembly in advance. Returning to our previous example, you may have already found that TAATGCCATGGGATGTT is a solution to the String Reconstruction Problem for the collection of fifteen 3-mers in the last section, as illustrated below.

Note that we use a different color for each interval of the string between occurrences of ATG.

STOP and Think: Is this the only solution to the String Reconstruction Problem for this collection of 3-mers?

In the figure above, consecutive 3-mers in TAATGCCATGGGATGTT are linked together to form the genome path.

String spelled by a genome path

String Spelled by a Genome Path Problem

Problem overview:

Reconstruct a string from its genome path.

Input: A sequence of k-mers Pattern1_{1}, . . . , Patternn_{n} ...