
Charging Station: Conversions between Patterns and Numbers

Explore recursive methods for converting DNA k-mers into numbers and back again. Understand the algorithmic steps that map DNA strings to integers using lexicographic ordering, which enables analysis and comparison in genomics.


Our approach to computing PatternToNumber(Pattern) is based on a simple observation. If we remove the final symbol from all lexicographically ordered k-mers, the resulting list is still ordered lexicographically (think about removing the final letter from every word in a dictionary). In the case of DNA strings, every (k − 1)-mer in the resulting list is repeated four times.
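The observation is easy to check for small k. The following Python sketch lists all 3-mers in lexicographic order, drops the final symbol from each, and confirms that the result is the ordered list of 2-mers with every entry repeated four times. The helper name `all_kmers` is a hypothetical one introduced here for illustration.

```python
from itertools import product

def all_kmers(k):
    """Return all DNA k-mers in lexicographic order (hypothetical helper)."""
    # product over the sorted alphabet "ACGT" yields tuples in lexicographic order
    return ["".join(symbols) for symbols in product("ACGT", repeat=k)]

three_mers = all_kmers(3)
prefixes = [kmer[:-1] for kmer in three_mers]  # remove the final symbol

# Each 2-mer, in order, repeated four times in a row
expected = [two_mer for two_mer in all_kmers(2) for _ in range(4)]

print(prefixes == sorted(prefixes))  # the shortened list is still ordered
print(prefixes == expected)          # every 2-mer appears four times
```

Both checks should print `True`, which is what makes the recursive counting argument work.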

Thus, the number of 3-mers occurring before AGT is equal to four times the number of 2-mers occurring before AG plus the number of 1-mers occurring before T. Therefore,

PatternToNumber(AGT) = 4 · PatternToNumber(AG) + SymbolToNumber(T) = 8 + 3 = 11,

where SymbolToNumber(symbol) is the function transforming symbols A, ...
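The recurrence above can be sketched as a short recursive function. This Python version is a minimal illustration under two assumptions: the symbols A, C, G, T map to 0, 1, 2, 3 (consistent with SymbolToNumber(T) = 3 and lexicographic order), and the empty pattern maps to 0 as the base case. The snake_case names are ours, not the text's.

```python
# Assumed mapping: lexicographic rank of each nucleotide
SYMBOL_TO_NUMBER = {"A": 0, "C": 1, "G": 2, "T": 3}

def symbol_to_number(symbol):
    """Return the rank of a single nucleotide (assumed A=0, C=1, G=2, T=3)."""
    return SYMBOL_TO_NUMBER[symbol]

def pattern_to_number(pattern):
    """Return the lexicographic index of a DNA string among all strings of its length."""
    if not pattern:
        return 0  # assumed base case: the empty pattern has index 0
    prefix, symbol = pattern[:-1], pattern[-1]
    # Four times the index of the prefix, plus the rank of the final symbol
    return 4 * pattern_to_number(prefix) + symbol_to_number(symbol)

print(pattern_to_number("AG"))   # 2
print(pattern_to_number("AGT"))  # 11
```

The call on `"AGT"` reproduces the worked example: 4 · 2 + 3 = 11.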