Trusted answers to developer questions
Trusted Answers to Developer Questions

Related Tags

dna

How to convert DNA patterns to numbers and vice-versa

Ibrahim Khan

In this shot, we’ll take a look at how we can convert patterns to numbers, and vice-versa.

Pattern to number

The implementation we will be looking at is based on the observation that a lexicographically ordered list remains lexicographically ordered if we remove the last symbol of all the elements in that list. Applying this observation to DNA strings, we notice that every (k-1)-mer is repeated 4 times.

If we have a lexicographically ordered DNA of 3-mers, removing the last symbol will leave a lexicographic order of 2-mers, where they are repeated 4 times.

Let’s take a look at an example. We have symbols A, C, G, and T that correspond to the numbers 0, 1, 2, and 3, respectively. If we convert ACT to numbers, we would get 7. Let’s take a look at the code for this function.

Code

## Change the symbol to the corresponding number
def symbol_to_number(symbol):
  if (symbol == 'A'):
    return 0
  elif (symbol == 'C'):
    return 1
  elif (symbol == 'G'):
    return 2
  elif (symbol == 'T'):
    return 3

def pattern_to_number(pattern):
  ## return if pattern is empty
  if not pattern:
    return 0

  ## Convert last symbol to number
  last_symbol = symbol_to_number(pattern[len(pattern) - 1])
  ## recursively convert the remaining pattern to number
  prefix_value = pattern_to_number(pattern[:-1])
  ## return the number corresponding to the pattern
  return (4*prefix_value + last_symbol)

print(pattern_to_number("ACT"))

Explanation

  • In line 2, we define the symbol_to_number function, which converts the symbol to the corresponding number as defined in the requirements.
  • In line 12, we define the pattern_to_number function, which takes pattern as input.
  • In lines 14-15, we define the base case. If pattern is empty, the function will return 0.
  • In line 18, we take out the last symbol from pattern and convert it to the corresponding number using the symbol_to_number function.
  • In line 20, we add the recursive call to the function, in which we pass pattern without the final character as an argument.
  • In line 22, we return the result of the function.

Number to pattern

To compute the pattern from a number, we need to inverse the function written above. To do this, we will reverse engineer the process of the “Pattern to number” section. Here are the steps:

  1. Divide the index by 4 since we multiplied by 4 above.
  2. Convert the remainder number to the corresponding symbol.
  3. Repeat the above steps on the quotient until the base case gets met (the value of k becomes equal to 1).

Let’s take the same example from above to get a better understanding. Here, the numbers 0, 1, 2, and 3 correspond to A, C, G, and T, respectively. If we input the index as 7 and k (the k in k-mer) as 3 we will get the pattern ACT.

Code

## Convert numebr to it's corresponding symbols
def number_to_symbol(number):
  if (number == 0):
    return 'A'
  elif (number == 1):
    return 'C'
  elif (number == 2):
    return 'G'
  elif (number == 3):
    return 'T'


def number_to_pattern(index, k):
  ## return error if conversion is 
  ## not possible for the provided length
  if index >= 4**k:
    return "Index value in too large for given k"
  
  ## base case when length(k) is 1
  if k == 1:
    return number_to_symbol(index)
    
  ## Get prefix index
  prefix_index = int(index/4)
  ## Get the index of the last character
  remainder = index % 4
  ## Get last symbol
  last_symbol = number_to_symbol(remainder)
  ## recursively get the pattern without the last symbol
  prefix_pattern = number_to_pattern(prefix_index, k-1)
  
  return prefix_pattern + last_symbol 

print(number_to_pattern(7, 3))

Explanation

  • In line 2, we define the number_to_symbol function, which converts the numbers to the corresponding symbol as defined in the requirements.
  • In line 13, we define the number_to_pattern function, which takes index and k as inputs.
  • In lines 16-17, we insert a constraint such that index cannot exceed the largest possible value of k.
  • In lines 20-21, we have the base case. If the value of k is 1, it will return the number corresponding to the symbol it was provided with.
  • In line 24, we calculate the prefix_index, which is the quotient when we divide index by 4.
  • In line 26, we get the remainder from the division of index with 4.
  • In line 28, we get the last symbol by converting the remainder to its corresponding character.
  • In line 30, we add the recursive call for the prefix_index, with length k-1.
  • In line 32, we concatenate the prefix_pattern and the last_symbol and return the complete pattern.

RELATED TAGS

dna

CONTRIBUTOR

Ibrahim Khan
Copyright ©2022 Educative, Inc. All rights reserved
RELATED COURSES

View all Courses

Keep Exploring