Detour: Scoring Multiple Alignments

Explore a few more practical approaches for scoring alignments.


Entropy of a column

The choice of scoring function can drastically affect the quality of a multiple alignment. In the main lesson, we described a way to score t-way alignments by using a t-dimensional scoring matrix. Below, we describe more practical approaches to scoring alignments.

The columns of a t-way alignment describe a path in a t-dimensional alignment graph whose edge weights are defined by the scoring function. Using the statistically motivated entropy score, the score of a multiple alignment is defined as the sum of the entropies of its columns. Recall from Chapter 2 that the entropy of a column is equal to:

$$-\sum_{x} p_x \cdot \log_2(p_x),$$

where the sum is taken over all symbols x present in the column, and $p_x$ is the frequency of symbol x in the column.
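As a small illustration of the formula above, here is a minimal Python sketch that computes the entropy (in bits) of a single column. The function name `column_entropy` is hypothetical, and treating the gap symbol `-` as an ordinary symbol is an assumption of this sketch rather than something the text specifies.

```python
from collections import Counter
import math


def column_entropy(column: str) -> float:
    """Entropy in bits of one alignment column, e.g. "AACT".

    Assumption: the gap symbol '-' is counted like any other symbol.
    """
    n = len(column)
    counts = Counter(column)
    # Sum over symbols x present in the column of -p_x * log2(p_x),
    # where p_x is the frequency of x in the column.
    return sum(-(c / n) * math.log2(c / n) for c in counts.values())


print(column_entropy("AACC"))  # two symbols, each with frequency 1/2 -> 1.0
print(column_entropy("ACGT"))  # four symbols, each with frequency 1/4 -> 2.0
```

A perfectly conserved column such as `"AAAA"` has entropy 0, and the entropy grows as the symbols in the column become more evenly mixed.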

Previously, we saw that more highly conserved columns have lower entropy. Because we wish to maximize the alignment score, we score each column by the negative of its entropy, so that more highly conserved columns receive higher scores. Finding a longest path in the t-dimensional alignment graph, with each edge weighted by the negative entropy of its column, therefore corresponds to finding a multiple alignment with minimal total entropy.
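To make the sign convention concrete, the sketch below scores a whole alignment as the sum of negative column entropies, which is the weight of the corresponding path when each column is one edge. It assumes the alignment is given as a list of equal-length strings with `-` for gaps; the names `column_entropy` and `entropy_score` are hypothetical helpers, not part of the text.

```python
from collections import Counter
import math


def column_entropy(column: str) -> float:
    """Entropy in bits of one column (gap '-' counted as a symbol, by assumption)."""
    n = len(column)
    return sum(-(c / n) * math.log2(c / n) for c in Counter(column).values())


def entropy_score(rows: list[str]) -> float:
    """Sum of NEGATIVE column entropies of a multiple alignment.

    `rows` holds one equal-length string per sequence. Higher is better;
    the maximum, 0, is reached when every column is fully conserved.
    """
    if len({len(r) for r in rows}) != 1:
        raise ValueError("alignment rows must be non-empty and of equal length")
    # zip(*rows) walks the alignment column by column (one edge per column).
    return sum(-column_entropy("".join(col)) for col in zip(*rows))


conserved = ["ACGT", "ACGT", "ACGA"]  # only the last column varies
variable = ["ACGT", "CGTA", "GTAC"]   # every column holds three different symbols
print(entropy_score(conserved), entropy_score(variable))
```

The more conserved alignment receives the higher (less negative) score, which is why maximizing this score favors alignments with minimal total entropy.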


Another popular scoring approach is the Sum-of-Pairs score (SP-score). For each pair of indices i < j, a multiple alignment Alignment of t sequences induces a pairwise alignment between the i-th and j-th sequences (keep rows i and j and remove any column in which both rows contain a gap); denote the score of this induced alignment by s(Alignment, i, j). The SP-score of a multiple alignment simply adds the scores of all induced pairwise alignments:

$$\text{SP-Score}(Alignment) = \sum_{1 \leq i < j \leq t} s(Alignment, i, j)$$
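The sketch below computes the SP-score by building each induced pairwise alignment and summing the pairwise scores over all pairs i < j. The text does not fix a pairwise scoring function, so this sketch assumes a hypothetical linear scheme (match +1, mismatch -1, indel -1) that can be overridden by keyword arguments; the helper names `induced_pairwise`, `pairwise_score`, and `sp_score` are likewise hypothetical.

```python
from itertools import combinations


def induced_pairwise(rows: list[str], i: int, j: int) -> tuple[str, str]:
    """Pairwise alignment of rows i and j induced by the multiple alignment:
    keep the two rows and drop every column where both contain a gap."""
    kept = [(a, b) for a, b in zip(rows[i], rows[j]) if not (a == "-" and b == "-")]
    return "".join(a for a, _ in kept), "".join(b for _, b in kept)


def pairwise_score(top: str, bottom: str, match=1, mismatch=-1, indel=-1) -> int:
    """Hypothetical linear scoring scheme applied column by column."""
    score = 0
    for a, b in zip(top, bottom):
        if a == "-" or b == "-":
            score += indel
        elif a == b:
            score += match
        else:
            score += mismatch
    return score


def sp_score(rows: list[str], **scoring) -> int:
    """Sum-of-Pairs score: adds s(Alignment, i, j) over all pairs i < j."""
    return sum(
        pairwise_score(*induced_pairwise(rows, i, j), **scoring)
        for i, j in combinations(range(len(rows)), 2)  # yields only i < j
    )


rows = ["AC-T", "A--T", "ACGT"]
print(induced_pairwise(rows, 0, 1))  # ('ACT', 'A-T'): the all-gap column is dropped
print(sp_score(rows))                # 1 + 2 + 0 = 3 under the default scheme
```

Using `itertools.combinations` enumerates each unordered pair exactly once, matching the i < j condition in the formula; including i = j would add each sequence's alignment with itself, which is not part of the SP-score.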

Exercise Break: Compute the entropy score and SP-score of Marahiel’s 3-way alignment, reproduced below.
