Trusted answers to developer questions

Ibrahim Khan

In this shot, we’ll take a look at how we can convert patterns to numbers, and vice-versa.

The implementation we will be looking at is based on the observation that a lexicographically ordered list remains lexicographically ordered if we remove the last symbol of all the elements in that list. Applying this observation to DNA strings, we notice that every `(k-1)-mer`

is repeated 4 times.

If we have a lexicographically ordered DNA of `3-mers`

, removing the last symbol will leave a lexicographic order of `2-mers`

, where they are repeated 4 times.

Let’s take a look at an example. We have symbols `A`

, `C`

, `G`

, and `T`

that correspond to the numbers `0`

, `1`

, `2`

, and `3`

, respectively. If we convert `ACT`

to numbers, we would get `7`

. Let’s take a look at the code for this function.

## Change the symbol to the corresponding number def symbol_to_number(symbol): if (symbol == 'A'): return 0 elif (symbol == 'C'): return 1 elif (symbol == 'G'): return 2 elif (symbol == 'T'): return 3 def pattern_to_number(pattern): ## return if pattern is empty if not pattern: return 0 ## Convert last symbol to number last_symbol = symbol_to_number(pattern[len(pattern) - 1]) ## recursively convert the remaining pattern to number prefix_value = pattern_to_number(pattern[:-1]) ## return the number corresponding to the pattern return (4*prefix_value + last_symbol) print(pattern_to_number("ACT"))

- In line 2, we define the
`symbol_to_number`

function, which converts the symbol to the corresponding number as defined in the requirements. - In line 12, we define the
`pattern_to_number`

function, which takes`pattern`

as input. - In lines 14-15, we define the base case. If
`pattern`

is empty, the function will return`0`

. - In line 18, we take out the last symbol from
`pattern`

and convert it to the corresponding number using the`symbol_to_number`

function. - In line 20, we add the recursive call to the function, in which we pass
`pattern`

without the final character as an argument. - In line 22, we return the result of the function.

To compute the pattern from a number, we need to inverse the function written above. To do this, we will reverse engineer the process of the “Pattern to number” section. Here are the steps:

- Divide the index by 4 since we multiplied by 4 above.
- Convert the remainder number to the corresponding symbol.
- Repeat the above steps on the quotient until the base case gets met (the value of
`k`

becomes equal to`1`

).

Let’s take the same example from above to get a better understanding. Here, the numbers `0`

, `1`

, `2`

, and `3`

correspond to `A`

, `C`

, `G`

, and `T`

, respectively. If we input the `index`

as `7`

and `k`

(the `k`

in `k-mer`

) as `3`

we will get the pattern `ACT`

.

## Convert numebr to it's corresponding symbols def number_to_symbol(number): if (number == 0): return 'A' elif (number == 1): return 'C' elif (number == 2): return 'G' elif (number == 3): return 'T' def number_to_pattern(index, k): ## return error if conversion is ## not possible for the provided length if index >= 4**k: return "Index value in too large for given k" ## base case when length(k) is 1 if k == 1: return number_to_symbol(index) ## Get prefix index prefix_index = int(index/4) ## Get the index of the last character remainder = index % 4 ## Get last symbol last_symbol = number_to_symbol(remainder) ## recursively get the pattern without the last symbol prefix_pattern = number_to_pattern(prefix_index, k-1) return prefix_pattern + last_symbol print(number_to_pattern(7, 3))

- In line 2, we define the
`number_to_symbol`

function, which converts the numbers to the corresponding symbol as defined in the requirements. - In line 13, we define the
`number_to_pattern`

function, which takes`index`

and`k`

as inputs. - In lines 16-17, we insert a constraint such that
`index`

cannot exceed the largest possible value of`k`

. - In lines 20-21, we have the base case. If the value of
`k`

is`1`

, it will return the number corresponding to the symbol it was provided with. - In line 24, we calculate the
`prefix_index`

, which is the quotient when we divide`index`

by 4. - In line 26, we get the remainder from the division of
`index`

with 4. - In line 28, we get the last symbol by converting the remainder to its corresponding character.
- In line 30, we add the recursive call for the
`prefix_index`

, with length`k-1`

. - In line 32, we concatenate the
`prefix_pattern`

and the`last_symbol`

and return the complete pattern.

RELATED TAGS

dna

CONTRIBUTOR

Ibrahim Khan

Copyright ©2022 Educative, Inc. All rights reserved

RELATED COURSES

View all Courses

Keep Exploring

Related Courses