Project Description for Computational Biology

Get a brief introduction to "Computational Biology" and learn which features we'll be building in this project.


Introduction

Since the discovery of deoxyribonucleic acid (DNA) and ribonucleic acid (RNA), computer science has found many applications in biology. An entire field, computational biology, is devoted to applying computer science to biological problems.

The scenario and problems discussed in this chapter also involve the DNA and protein sequences that computational biology deals with.

Statement

Assume you are a developer in a biology lab. You are tasked with creating simulations of DNA transformation across different species. The lab wants you to write a suite of programs for processing DNA and protein sequences.

We’ll start by analyzing whether it is possible to convert the DNA sequence of one species into that of another by changing or replacing genes. During these mutations, we’ll observe some virus sequences, which we’ll then isolate. After this, we’ll explore different proteins that can provide immunity against the identified viruses.

Features

We’ll need to implement the following features to provide the functionality described above:

  • Feature #1: Determine if an unknown DNA sequence differs from a known DNA sequence by only a single gene replacement.

  • Feature #2: A new virus is known to infect species by inserting long sequences of k unique nucleotides. Given a chromosome, determine whether it is potentially infected by finding the longest substring (contiguous run of nucleotides) with k unique nucleotides.

  • Feature #3: Locate palindromic structures in chromosomes to identify potential proteins.

  • Feature #4: Proteins are known to have palindromic sequences. Determine if a given string could be a protein.

  • Feature #5: Rearrange the nucleotides within a DNA sequence to work out the next strongest mutation and deduce the most dominant variation of the virus.

  • Feature #6: Find the unique identifier of a virus by determining the longest stretch of a DNA sequence in which no nucleotide repeats.

  • Feature #7: Validate whether a sequence of nucleotides could be a protein by checking whether any permutation of it forms a palindrome.

  • Feature #8: Compare two strands of DNA and figure out the minimum number of edit operations needed to make them identical.

  • Feature #9: The species on a planet have n distinct genes numbered 1…n. Find the kth missing gene in a given DNA sequence.
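A minimal Python sketch for Feature #1, assuming each character stands for one gene and that a "gene replacement" substitutes exactly one gene without changing the sequence length. The function name `differs_by_one_replacement` is hypothetical. It scans both sequences once and stops as soon as a second mismatch appears.

```python
def differs_by_one_replacement(known: str, unknown: str) -> bool:
    """Return True if `unknown` equals `known` except at exactly one position.

    Assumption: each character is one gene, and a replacement substitutes
    a single gene without changing the sequence length.
    """
    if len(known) != len(unknown):
        return False  # a pure substitution cannot change the length
    mismatches = 0
    for a, b in zip(known, unknown):
        if a != b:
            mismatches += 1
            if mismatches > 1:
                return False  # more than one replacement would be needed
    return mismatches == 1  # identical sequences need zero replacements
```

If inserting or deleting a gene should also count as a single change, the check would additionally need to handle sequences whose lengths differ by one.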
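For Feature #2, a common approach is a sliding window. The sketch below assumes that "k unique nucleotides" means at most k distinct nucleotides and that the inserted region is contiguous; `longest_with_k_unique` is a hypothetical name. Each nucleotide enters and leaves the window at most once, so the scan runs in linear time.

```python
from collections import defaultdict


def longest_with_k_unique(chromosome: str, k: int) -> str:
    """Return the longest substring of `chromosome` with at most k distinct nucleotides.

    Sliding window: extend the right edge one nucleotide at a time and shrink
    the left edge while the window holds more than k distinct nucleotides.
    """
    counts = defaultdict(int)  # nucleotide -> occurrences inside the window
    left = 0
    best_start, best_len = 0, 0
    for right, base in enumerate(chromosome):
        counts[base] += 1
        while len(counts) > k:
            left_base = chromosome[left]
            counts[left_base] -= 1
            if counts[left_base] == 0:
                del counts[left_base]  # nucleotide no longer in the window
            left += 1
        # Keep the earliest window on ties (strictly greater only).
        if right - left + 1 > best_len:
            best_start, best_len = left, right - left + 1
    return chromosome[best_start:best_start + best_len]
```

For example, `longest_with_k_unique("AACCGGT", 2)` returns `"AACC"`; the equally long `"CCGG"` is not reported because ties keep the earliest window.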
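Feature #3 does not say which palindromic structure to report. The sketch below assumes it is the longest palindromic substring and finds it by expanding around every possible center, which takes quadratic time. `longest_palindrome` is a hypothetical name, and "palindrome" is taken in the plain string sense; in molecular biology, a palindromic DNA sequence usually means one that equals its reverse complement.

```python
def longest_palindrome(chromosome: str) -> str:
    """Return the longest palindromic substring using expand-around-center."""
    best_start, best_len = 0, 0
    for center in range(len(chromosome)):
        # Try an odd-length palindrome centered on `center` and an
        # even-length one centered between `center` and `center + 1`.
        for left, right in ((center, center), (center, center + 1)):
            while (left >= 0 and right < len(chromosome)
                   and chromosome[left] == chromosome[right]):
                left -= 1
                right += 1
            # The loop overshoots by one position on each side.
            length = right - left - 1
            if length > best_len:
                best_start, best_len = left + 1, length
    return chromosome[best_start:best_start + best_len]
```

For example, `longest_palindrome("GATTACA")` returns `"ATTA"`.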
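Feature #5 reads like the classic next-permutation problem. Assuming that the "next strongest mutation" is the next lexicographically greater arrangement of the nucleotides, and that the largest arrangement wraps around to the sorted one (the usual convention, but an assumption here), a sketch with the hypothetical name `next_mutation` could look like this:

```python
def next_mutation(dna: str) -> str:
    """Return the next lexicographically greater rearrangement of `dna`.

    Assumption: if `dna` is already the largest arrangement, wrap around
    to the smallest (sorted) one.
    """
    seq = list(dna)
    # 1. Find the rightmost position i with seq[i] < seq[i + 1].
    i = len(seq) - 2
    while i >= 0 and seq[i] >= seq[i + 1]:
        i -= 1
    if i >= 0:
        # 2. Find the rightmost element greater than seq[i] and swap them.
        j = len(seq) - 1
        while seq[j] <= seq[i]:
            j -= 1
        seq[i], seq[j] = seq[j], seq[i]
    # 3. The suffix after i is non-increasing; reverse it to make it the smallest tail.
    seq[i + 1:] = reversed(seq[i + 1:])
    return "".join(seq)
```

For example, `next_mutation("ACGT")` returns `"ACTG"`, and `next_mutation("TGCA")` wraps around to `"ACGT"`. Repeated nucleotides are handled because the comparisons use `>=` and `<=`.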
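For Feature #6, a sliding window that remembers the last index of each nucleotide finds the longest stretch without repeats in a single pass. The sketch assumes the identifier is the stretch itself, returning the first one on ties; `longest_unique_stretch` is a hypothetical name.

```python
def longest_unique_stretch(dna: str) -> str:
    """Return the longest substring of `dna` in which no nucleotide repeats."""
    last_seen = {}  # nucleotide -> most recent index
    start = 0       # left edge of the current repeat-free window
    best_start, best_len = 0, 0
    for i, base in enumerate(dna):
        # If `base` already occurs inside the window, move the start past it.
        if base in last_seen and last_seen[base] >= start:
            start = last_seen[base] + 1
        last_seen[base] = i
        if i - start + 1 > best_len:
            best_start, best_len = start, i - start + 1
    return dna[best_start:best_start + best_len]
```

For example, `longest_unique_stretch("AACGTT")` returns `"ACGT"`.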
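Feature #7 does not actually require generating permutations. A string can be rearranged into a palindrome exactly when at most one symbol occurs an odd number of times, since only the middle position can hold an unpaired symbol. A sketch with the hypothetical name `could_be_protein`:

```python
from collections import Counter


def could_be_protein(sequence: str) -> bool:
    """Return True if some permutation of `sequence` is a palindrome.

    Counts each symbol; a palindrome allows at most one odd count.
    """
    odd_counts = sum(1 for count in Counter(sequence).values() if count % 2 == 1)
    return odd_counts <= 1
```

For example, `could_be_protein("AACGC")` returns `True` (it can be rearranged as `"ACGCA"`), while `could_be_protein("ACGT")` returns `False`. This runs in linear time instead of checking all n! orderings.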
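Feature #8 matches the edit (Levenshtein) distance, assuming the allowed operations are inserting, deleting, or substituting a single nucleotide, each at a cost of one. The dynamic-programming sketch below keeps only two rows of the table, so for strands of lengths m and n it takes O(m*n) time and O(n) extra space; `min_edits` is a hypothetical name.

```python
def min_edits(strand_a: str, strand_b: str) -> int:
    """Minimum insertions, deletions, and substitutions turning strand_a into strand_b."""
    m, n = len(strand_a), len(strand_b)
    # prev is the previous DP row: prev[j] is the distance between the first
    # i - 1 nucleotides of strand_a and the first j of strand_b.
    # Row 0 compares an empty prefix, so it needs j insertions.
    prev = list(range(n + 1))
    for i in range(1, m + 1):
        curr = [i] + [0] * n  # turning i nucleotides into nothing takes i deletions
        for j in range(1, n + 1):
            if strand_a[i - 1] == strand_b[j - 1]:
                curr[j] = prev[j - 1]  # nucleotides match, no edit needed
            else:
                curr[j] = 1 + min(
                    prev[j],      # delete strand_a[i - 1]
                    curr[j - 1],  # insert strand_b[j - 1]
                    prev[j - 1],  # substitute strand_a[i - 1] with strand_b[j - 1]
                )
        prev = curr
    return prev[n]
```

For example, `min_edits("AGT", "ACGT")` returns `1` (insert `C`).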
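For Feature #9, assume the DNA sequence is given as a strictly increasing list of gene numbers in 1..n. Then the number of missing genes below `genes[i]` is `genes[i] - (i + 1)`, and that count never decreases, so binary search can locate the k-th missing gene in logarithmic time. The function name `kth_missing_gene` and the `None` return when fewer than k genes are missing are assumptions of this sketch.

```python
from typing import List, Optional


def kth_missing_gene(genes: List[int], n: int, k: int) -> Optional[int]:
    """Return the k-th smallest gene number in 1..n that is absent from `genes`.

    Assumes `genes` is strictly increasing with values in 1..n and k >= 1.
    Returns None if fewer than k genes are missing.
    """
    # genes[i] - (i + 1) gene numbers below genes[i] are missing.
    # Find the first index whose missing count reaches k.
    lo, hi = 0, len(genes)
    while lo < hi:
        mid = (lo + hi) // 2
        if genes[mid] - (mid + 1) < k:
            lo = mid + 1
        else:
            hi = mid
    # Exactly `lo` present genes lie below the answer, so it is lo + k.
    answer = lo + k
    return answer if answer <= n else None
```

For example, `kth_missing_gene([2, 3, 4, 7, 11], 12, 5)` returns `9`, because the missing genes are 1, 5, 6, 8, 9, 10, and 12.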

Understanding these feature requests and designing their solutions will help us implement the requested functionality into the computational biology software suite.

In the next few lessons, we’ll discuss the recommended implementations of these features. These solutions also apply to other common coding interview questions.
