In this shot, we’ll take a look at how we can convert patterns to numbers, and vice-versa.
The implementation we will be looking at is based on the observation that a lexicographically ordered list remains lexicographically ordered if we remove the last symbol of all the elements in that list. Applying this observation to DNA strings, we notice that every
(k-1)-mer is repeated 4 times.
If we have a lexicographically ordered DNA of
3-mers, removing the last symbol will leave a lexicographic order of
2-mers, where they are repeated 4 times.
Let’s take a look at an example. We have symbols
T that correspond to the numbers
3, respectively. If we convert
ACT to numbers, we would get
7. Let’s take a look at the code for this function.
## Change the symbol to the corresponding number def symbol_to_number(symbol): if (symbol == 'A'): return 0 elif (symbol == 'C'): return 1 elif (symbol == 'G'): return 2 elif (symbol == 'T'): return 3 def pattern_to_number(pattern): ## return if pattern is empty if not pattern: return 0 ## Convert last symbol to number last_symbol = symbol_to_number(pattern[len(pattern) - 1]) ## recursively convert the remaining pattern to number prefix_value = pattern_to_number(pattern[:-1]) ## return the number corresponding to the pattern return (4*prefix_value + last_symbol) print(pattern_to_number("ACT"))
symbol_to_numberfunction, which converts the symbol to the corresponding number as defined in the requirements.
pattern_to_numberfunction, which takes
patternis empty, the function will return
patternand convert it to the corresponding number using the
patternwithout the final character as an argument.
To compute the pattern from a number, we need to inverse the function written above. To do this, we will reverse engineer the process of the “Pattern to number” section. Here are the steps:
kbecomes equal to
Let’s take the same example from above to get a better understanding. Here, the numbers
3 correspond to
T, respectively. If we input the
3 we will get the pattern
## Convert numebr to it's corresponding symbols def number_to_symbol(number): if (number == 0): return 'A' elif (number == 1): return 'C' elif (number == 2): return 'G' elif (number == 3): return 'T' def number_to_pattern(index, k): ## return error if conversion is ## not possible for the provided length if index >= 4**k: return "Index value in too large for given k" ## base case when length(k) is 1 if k == 1: return number_to_symbol(index) ## Get prefix index prefix_index = int(index/4) ## Get the index of the last character remainder = index % 4 ## Get last symbol last_symbol = number_to_symbol(remainder) ## recursively get the pattern without the last symbol prefix_pattern = number_to_pattern(prefix_index, k-1) return prefix_pattern + last_symbol print(number_to_pattern(7, 3))
number_to_symbolfunction, which converts the numbers to the corresponding symbol as defined in the requirements.
number_to_patternfunction, which takes
indexcannot exceed the largest possible value of
1, it will return the number corresponding to the symbol it was provided with.
prefix_index, which is the quotient when we divide
prefix_index, with length
last_symboland return the complete pattern.
View all Courses