What is the Needleman-Wunsch algorithm?

In bioinformatics, understanding how genetic sequences align is crucial. One essential tool for this is the Needleman-Wunsch algorithm, developed by Saul B. Needleman and Christian D. Wunsch in 1970. It is used to align protein or nucleotide sequences.

Let’s look into what sequence alignment is and the different ways to align protein sequences.

Sequence alignment

Sequence alignment involves arranging genetic sequences to identify regions of similarity or divergence. This process aids in inferring evolutionary relationships, unraveling structural and functional motifs (recurring patterns of functional, structural, or evolutionary significance), and detecting genetic mutations.

Types of sequence alignment

  1. Global alignment: Global alignment seeks to align entire sequences from start to end, emphasizing similarities across their entire length. It is akin to fitting puzzle pieces together, ensuring that every element of both sequences is accounted for in the alignment.

Example of global alignment

  2. Local alignment: Local alignment focuses on identifying short, conserved regions within sequences. It is particularly useful for identifying functional domains, evolutionary motifs, or regions of similarity amidst larger sequences.

Example of local alignment

Along with aligning our sequences, we need to define a scoring system that tells us whether the alignment is optimal or not.

Scoring system

In sequence alignment, a scoring system is employed to quantify the quality of alignments. Typically, matches are assigned positive scores, mismatches are assigned negative scores, and gaps incur penalties. The choice of scoring scheme depends on the biological context and the nature of the sequences being aligned.
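To make the scoring idea concrete, here is a minimal sketch that scores two sequences that have already been aligned. The function name `score_alignment` and the default values (match = 1, mismatch = -1, gap = -2) are assumptions chosen for illustration; they are not a fixed standard.

```python
def score_alignment(aligned_a, aligned_b, match=1, mismatch=-1, gap=-2):
    """Sum the per-column scores of two already-aligned, equal-length strings."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    total = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == "-" or y == "-":
            total += gap        # a gap in either sequence incurs the penalty
        elif x == y:
            total += match      # identical characters
        else:
            total += mismatch   # different characters
    return total


# Columns: A/A (+1), T/- (-2), C/C (+1), G/G (+1)
print(score_alignment("ATCG", "A-CG"))  # 1
```

A higher total means a better alignment under the chosen scheme; changing the three values changes which alignment counts as best.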

Needleman-Wunsch algorithm

The Needleman-Wunsch algorithm addresses the fundamental problem of global sequence alignment. Its purpose is to find the optimal alignment between two sequences: the one with the highest total score, which rewards matched characters and penalizes gaps and mismatches.

Algorithm

The steps of the algorithm are as follows:

Initialization:

  1. Create an alignment matrix (a table used to visualize the alignment of two sequences, showing the similarities and differences between them) with dimensions (n+1) x (m+1), where n and m are the lengths of the two sequences to be aligned.

  2. Initialize the first row and column of the matrix with cumulative gap penalties (the costs assigned for inserting or deleting nucleotides in a sequence alignment): the cell at position i along the first row or column holds i times the gap penalty. The value in each cell represents the score of the best alignment up to that point.
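The initialization steps above can be sketched as follows. This is an illustrative sketch, not a reference implementation: the function name `init_matrix`, the gap penalty of -2, and the choice to put the first sequence on the rows are all assumptions.

```python
def init_matrix(seq1, seq2, gap=-2):
    """Build an (n+1) x (m+1) matrix with the first row and column pre-filled."""
    n, m = len(seq1), len(seq2)
    matrix = [[0] * (m + 1) for _ in range(n + 1)]
    # First column: the first i characters of seq1 aligned against gaps only.
    for i in range(1, n + 1):
        matrix[i][0] = i * gap
    # First row: the first j characters of seq2 aligned against gaps only.
    for j in range(1, m + 1):
        matrix[0][j] = j * gap
    return matrix


for row in init_matrix("ATCG", "ACG"):
    print(row)
```

The remaining cells are left at 0 here only as placeholders; they are overwritten during matrix filling.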

Scoring scheme:

  1. Define a scoring scheme that assigns scores to matches, mismatches, and gaps. Typically, matches receive positive scores, mismatches receive negative scores, and gaps incur penalties.

Matrix filling:

  1. Iterate through each cell of the alignment matrix, starting from the top-left corner.

  2. Compute the score of each cell based on three possible moves: diagonal (match/mismatch), horizontal (gap in the first sequence), and vertical (gap in the second sequence).

  3. Set the cell's score to the maximum of the three candidates: the diagonal neighbor's score plus the match or mismatch score, and the horizontal or vertical neighbor's score plus the gap penalty.
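The matrix-filling steps can be sketched in a few lines of Python. The function name `fill_matrix` and the scoring values (match = 1, mismatch = -1, gap = -2) are assumptions for illustration; the first sequence is placed on the rows and the second on the columns.

```python
def fill_matrix(seq1, seq2, match=1, mismatch=-1, gap=-2):
    """Return the filled Needleman-Wunsch score matrix for seq1 (rows) and seq2 (columns)."""
    n, m = len(seq1), len(seq2)
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = i * gap
    for j in range(1, m + 1):
        F[0][j] = j * gap

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            pair = match if seq1[i - 1] == seq2[j - 1] else mismatch
            diagonal = F[i - 1][j - 1] + pair   # align seq1[i-1] with seq2[j-1]
            vertical = F[i - 1][j] + gap        # gap in the second sequence
            horizontal = F[i][j - 1] + gap      # gap in the first sequence
            F[i][j] = max(diagonal, vertical, horizontal)
    return F


for row in fill_matrix("ATCG", "ACG"):
    print(row)
# [0, -2, -4, -6]
# [-2, 1, -1, -3]
# [-4, -1, 0, -2]
# [-6, -3, 0, -1]
# [-8, -5, -2, 1]
```

The bottom-right cell (1 in this case) is the score of the optimal global alignment under the assumed scoring scheme.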

Traceback:

  1. Once the matrix is filled, trace back through the matrix to determine the optimal alignment path (the highest-scoring path through the alignment matrix, representing the most likely alignment between the two sequences).

  2. Start from the bottom-right corner of the matrix and follow the highest-scoring path back to the top-left corner.

  3. At each step, decide the direction of the move (diagonal, horizontal, or vertical) based on the scores of neighboring cells.

  4. Record the aligned characters or gaps as you backtrack through the matrix.

Final alignment:

  1. The path traced during the traceback process represents the optimal alignment between the two sequences.

  2. Extract the aligned characters or gaps from the traceback path to obtain the final aligned sequences.
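The traceback and final-alignment steps can be sketched as below. This is a minimal sketch under stated assumptions: the function name `traceback` is hypothetical, the scoring values (match = 1, mismatch = -1, gap = -2) are illustrative, and the hard-coded matrix is the filled score matrix for "ATCG" (rows) and "ACG" (columns) under that scheme. When several moves are equally good, this sketch prefers the diagonal move, then the vertical one; other tie-breaking rules give other, equally optimal alignments.

```python
def traceback(matrix, seq1, seq2, match=1, mismatch=-1, gap=-2):
    """Walk from the bottom-right cell to the top-left and rebuild the alignment."""
    i, j = len(seq1), len(seq2)
    out1, out2 = [], []
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            pair = match if seq1[i - 1] == seq2[j - 1] else mismatch
            if matrix[i][j] == matrix[i - 1][j - 1] + pair:
                # Diagonal move: both characters are aligned with each other.
                out1.append(seq1[i - 1])
                out2.append(seq2[j - 1])
                i -= 1
                j -= 1
                continue
        if i > 0 and matrix[i][j] == matrix[i - 1][j] + gap:
            # Vertical move: gap in the second sequence.
            out1.append(seq1[i - 1])
            out2.append("-")
            i -= 1
        else:
            # Horizontal move: gap in the first sequence.
            out1.append("-")
            out2.append(seq2[j - 1])
            j -= 1
    # Characters were collected back to front, so reverse them.
    return "".join(reversed(out1)), "".join(reversed(out2))


# Filled score matrix for "ATCG" (rows) vs "ACG" (columns).
filled = [
    [0, -2, -4, -6],
    [-2, 1, -1, -3],
    [-4, -1, 0, -2],
    [-6, -3, 0, -1],
    [-8, -5, -2, 1],
]
aligned1, aligned2 = traceback(filled, "ATCG", "ACG")
print(aligned1)  # ATCG
print(aligned2)  # A-CG
```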

Let's look at how the algorithm can be applied with the help of an example!

Example (slideshow of 7 steps)

Python code

To implement the Needleman-Wunsch algorithm in Python, we will install a module called minineedle.

pip install minineedle
Command to install minineedle module

After installing, all we need to do is import the module into our code, as well as the necessary classes.

from minineedle import needle, smith, core
Importing the library

The needle module contains the implementation of the Needleman-Wunsch algorithm for global sequence alignment, the smith module contains the implementation of the Smith-Waterman algorithm for local sequence alignment, and the core module contains shared helpers such as the ScoreMatrix class.

Let's use the library to align the sequences in our example:

from minineedle import needle, smith, core

sequence1 = "ATCG"
sequence2 = "ACG"

alignment = needle.NeedlemanWunsch(sequence1, sequence2)

alignment.change_matrix(core.ScoreMatrix(match=1, miss=-1, gap=-2))
alignment.align()  # compute the alignment with the scoring scheme set above

print(alignment)
print("Score: ", alignment.get_score())
print("Alignment matrix: ", alignment.get_almatrix())

Let's see what each line of code above is doing:

  • Line 1: We import the necessary modules (needle, smith, core) from the minineedle package. These modules contain classes and functions related to sequence alignment algorithms and scoring matrices.

  • Lines 3–4: We define the two sequences that need to be aligned.

  • Line 6: We create an instance of the NeedlemanWunsch class from the needle module. This class performs the Needleman-Wunsch algorithm for global sequence alignment and takes the two sequences (sequence1 and sequence2) as arguments.

  • Line 8: We specify the scoring scheme to be used for the alignment by replacing the scoring matrix of the alignment object with a ScoreMatrix object from the core module. The ScoreMatrix object is initialized with the specified match score (1), mismatch score (-1), and gap penalty (-2).

  • Line 9: We call align() to compute the alignment using the updated scoring scheme.

  • Lines 11–13: We print the aligned sequences, the alignment score, and the alignment matrix.

Test your knowledge!

Do the short quiz below to test the concepts learned above.

Choose the correct answer.

1. In the Needleman-Wunsch algorithm, what does each cell in the alignment matrix represent?

A) The optimal alignment score up to that point.

B) The score of a specific alignment move.

C) The difference between two characters in the sequences.

D) None of the above.

Copyright ©2024 Educative, Inc. All rights reserved