In this shot, we’ll take a look at how we can convert patterns to numbers, and vice-versa.
The implementation we will be looking at is based on the observation that a lexicographically ordered list remains lexicographically ordered if we remove the last symbol of all the elements in that list. Applying this observation to DNA strings, we notice that every (k-1)-mer
is repeated 4 times.
If we have a lexicographically ordered DNA of 3-mers
, removing the last symbol will leave a lexicographic order of 2-mers
, where they are repeated 4 times.
Let’s take a look at an example. We have symbols A
, C
, G
, and T
that correspond to the numbers 0
, 1
, 2
, and 3
, respectively. If we convert ACT
to numbers, we would get 7
. Let’s take a look at the code for this function.
## Change the symbol to the corresponding number def symbol_to_number(symbol): if (symbol == 'A'): return 0 elif (symbol == 'C'): return 1 elif (symbol == 'G'): return 2 elif (symbol == 'T'): return 3 def pattern_to_number(pattern): ## return if pattern is empty if not pattern: return 0 ## Convert last symbol to number last_symbol = symbol_to_number(pattern[len(pattern) - 1]) ## recursively convert the remaining pattern to number prefix_value = pattern_to_number(pattern[:-1]) ## return the number corresponding to the pattern return (4*prefix_value + last_symbol) print(pattern_to_number("ACT"))
symbol_to_number
function, which converts the symbol to the corresponding number as defined in the requirements.pattern_to_number
function, which takes pattern
as input.pattern
is empty, the function will return 0
.pattern
and convert it to the corresponding number using the symbol_to_number
function.pattern
without the final character as an argument.To compute the pattern from a number, we need to inverse the function written above. To do this, we will reverse engineer the process of the “Pattern to number” section. Here are the steps:
k
becomes equal to 1
).Let’s take the same example from above to get a better understanding. Here, the numbers 0
, 1
, 2
, and 3
correspond to A
, C
, G
, and T
, respectively. If we input the index
as 7
and k
(the k
in k-mer
) as 3
we will get the pattern ACT
.
## Convert numebr to it's corresponding symbols def number_to_symbol(number): if (number == 0): return 'A' elif (number == 1): return 'C' elif (number == 2): return 'G' elif (number == 3): return 'T' def number_to_pattern(index, k): ## return error if conversion is ## not possible for the provided length if index >= 4**k: return "Index value in too large for given k" ## base case when length(k) is 1 if k == 1: return number_to_symbol(index) ## Get prefix index prefix_index = int(index/4) ## Get the index of the last character remainder = index % 4 ## Get last symbol last_symbol = number_to_symbol(remainder) ## recursively get the pattern without the last symbol prefix_pattern = number_to_pattern(prefix_index, k-1) return prefix_pattern + last_symbol print(number_to_pattern(7, 3))
number_to_symbol
function, which converts the numbers to the corresponding symbol as defined in the requirements.number_to_pattern
function, which takes index
and k
as inputs.index
cannot exceed the largest possible value of k
.k
is 1
, it will return the number corresponding to the symbol it was provided with.prefix_index
, which is the quotient when we divide index
by 4.index
with 4.prefix_index
, with length k-1
.prefix_pattern
and the last_symbol
and return the complete pattern.RELATED TAGS
CONTRIBUTOR
View all Courses