Project 1: Fun with DNA (REGEX Lookaround)!

In this project, we find open reading frames (ORFs) in DNA sequences with the help of Python regex.

DNA is a sequence of bases, each one of A, C, G, or T. When DNA is translated into protein, the bases are read three at a time, and each three-base group is called a codon. There is a special start codon, ATG, and three stop codons: TGA, TAG, and TAA. Example:

cgcgcATGcATGcgTGAcTAAcgTAGcgcgcgcgc

An open reading frame (ORF) consists of a start codon, followed by some more codons, and ends with a stop codon. The example above contains two overlapping ORFs:

  • ATGcATGcgTGA and
  • ATGcgTGAcTAA.
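To see why both strings qualify as ORFs, it can help to split them into codons. The snippet below is a minimal sketch, not part of the original lesson: the helper names `split_codons` and `is_orf` are hypothetical, and the check that no stop codon appears before the end is an assumption based on the usual ORF convention.

```python
STOP_CODONS = {"TGA", "TAG", "TAA"}

def split_codons(seq):
    """Split a DNA string into consecutive 3-base chunks (uppercased)."""
    seq = seq.upper()
    return [seq[i:i + 3] for i in range(0, len(seq), 3)]

def is_orf(seq):
    """Return True if seq is a start codon, more codons, then a stop codon."""
    codons = split_codons(seq)
    return (
        len(seq) % 3 == 0                      # whole codons only
        and len(codons) >= 2                   # at least a start and a stop
        and codons[0] == "ATG"                 # begins with the start codon
        and codons[-1] in STOP_CODONS          # ends with a stop codon
        # Assumption: no stop codon may occur before the final one.
        and not any(c in STOP_CODONS for c in codons[1:-1])
    )

for candidate in ("ATGcATGcgTGA", "ATGcgTGAcTAA"):
    print(candidate, split_codons(candidate), is_orf(candidate))
```

The first ORF splits into ATG, CAT, GCG, TGA and the second into ATG, CGT, GAC, TAA. The second ORF starts four bases after the first, so the two share bases but are read in different frames.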

The following pattern finds only the first ORF ('atgcatgcgtga'). Because the match consumes the first ORF, it also consumes the start codon of the second ORF, so the second ORF is never found.
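The behavior described above can be reproduced with a short sketch. The pattern here is an assumption, one plausible way to write "start codon, some codons, stop codon", and not necessarily the lesson's own pattern. The second half shows one common workaround in keeping with the lookaround theme of the project: wrapping the pattern in a capturing group inside a lookahead, so that a match consumes no characters.

```python
import re

# The example sequence from above, lowercased so the pattern can use lowercase bases.
dna = "cgcgcATGcATGcgTGAcTAAcgTAGcgcgcgcgc".lower()

# Assumed ORF pattern: start codon, as few codons as possible, then a stop codon.
orf = r"atg(?:[acgt]{3})*?(?:tga|tag|taa)"

# A plain search consumes the first ORF, including the start of the second one.
consuming_matches = re.findall(orf, dna)
print(consuming_matches)   # ['atgcatgcgtga']

# Inside a lookahead the match is zero-width, so the scan resumes one
# character later and can still see the overlapping second ORF.
lookahead_matches = re.findall("(?=(" + orf + "))", dna)
print(lookahead_matches)   # ['atgcatgcgtga', 'atgcgtgactaa']
```

With a single capturing group, `re.findall` returns the text of that group for each match, which is why the lookahead version still yields the ORF strings even though each match itself is empty.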
