Andrew Viterbi proposed the Viterbi algorithm in 1967. The Viterbi algorithm decodes convolution codes over noisy digital communication links and is used in various fields, including information theory, bioinformatics, speech processing, computational linguistics, and communication systems.
Given a sequence of the observed states
Here‘s a quick recap of some of the terminologies used in HMM models:
The start state represents the initial state of the model.
Observed or emitted states are each hidden state’s measurable or observable outcomes.
The hidden state represents a system’s unobservable or latent state.
Start probabilities represent the initial probabilities of being in each hidden state at the start of the sequence.
Transition probabilities describe the likelihood of transitioning from one hidden state to another.
The transition matrix (A) holds all transition probabilities between hidden states.
Emission probabilities define the probability distribution of observing a particular state given the current hidden state.
The emission matrix (B) captures all emission probabilities of each hidden state.
Suppose there are three markets: Food street, clothes market, and electronics market. Initially, we can move to any market with some initial probability. For this example, we assume these initial probabilities are equal. Assume the following transition matrix from one market to another market:
Transition Matrix (A) | Food Street | Clothes Market | Electronics Market |
Initial State | 0.33 | 0.33 | 0.33 |
Food Street | 0.1 | 0.6 | 0.3 |
Clothes Market | 0.45 | 0.15 | 0.4 |
Electronics Market | 0.4 | 0.4 | 0.2 |
Each market has different shops in it. Let’s consider the shop ratio for each market is as follows:
Food street: 70% food shops, 20% clothing shops, and 10% electronics shops
Clothes market: 65% clothing shops, 25% food shops, and 10% electronics shops
Electronics market: 75% electronic shops, 15% food shops, and 10% clothing shops
This builds the following emission matrix:
Emission Matrix (B) | Food | Clothes | Electronics |
Food Street | 0.7 | 0.2 | 0.1 |
Clothes Market | 0.25 | 0.65 | 0.1 |
Electronics Market | 0.15 | 0.1 | 0.75 |
By now, we’ve constructed a simple HMM that we can see in the following illustration:
The Viterbi algorithm uses the transition and emission probabilities from the HMM to find the maximal probabilities matrix (
Consider the observed state sequence as
In this step, we initialize and fill the first columns of matrices
For each possible state
For matrix
In the backward pass, we’ll use the matrix
Let’s understand this with the following slides:
For
Let’s implement the Viterbi algorithm in the following playground:
import numpy as npdef ViterbiAlgorithm(hmm, observations):# Initialization stepmatrixC = np.zeros((len(observations), len(hmm['states'])))obs_ind = hmm['emissions'].index(observations[0])matrixC[0] = hmm['initial_probs'] * hmm['matrixB'][:, obs_ind]matrixD = np.ones((len(observations), len(hmm['states']))) * -1# Forward passfor curr_obs in range(1, len(observations)):obs_ind = hmm['emissions'].index(observations[curr_obs])for curr_stat in range(len(hmm['states'])):curr_stat_probs = matrixC[curr_obs-1] * hmm['matrixA'][:, curr_stat] * hmm['matrixB'][curr_stat, obs_ind]matrixC[curr_obs, curr_stat] = max(curr_stat_probs)matrixD[curr_obs, curr_stat] = np.argmax(curr_stat_probs)# Backward passoutput = []state_ind = np.argmax(matrixC[-1])for curr_col in range(len(observations)-1, -1, -1):output.insert(0, hmm['states'][state_ind])state_ind = int(matrixD[curr_col][state_ind])return outputhmm = {'states': ['Food Street', 'Clothes Market', 'Electronics Market'],'initial_probs': np.array([0.33, 0.33, 0.33]),'matrixA': np.array([[0.1, 0.6, 0.3],[0.45, 0.15, 0.4],[0.4, 0.4, 0.2]]),'emissions': ['Food', 'Clothes', 'Electronics'],'matrixB': np.array([[0.7, 0.2, 0.1],[0.25, 0.65, 0.1],[0.15, 0.1, 0.75]])}observations = ['Clothes', 'Food', 'Clothes', 'Electronics']output_seq = ViterbiAlgorithm(hmm, observations)print('Observations --> Most likely state')print('----------------------------------')for obs, state in zip(observations, output_seq):print(obs, ' --> ', state)
Let’s explain the code above in detail:
Lines 3–25: We define the ViterbiAlgorithm()
function that receives two parameters. The hmm
dictionary has states, initial probabilities, transition matrix, emissions states, and emission matrix. The observation
list contains the observed state sequences. Inside this function:
Lines 5–8: We initialize the matrixC
as a NumPy array of dimensions of observations and states to store maximal probabilities. We store the index of the first observation in the emissions
list in obs_ind
. We multiply the initial probabilities with the emission probabilities of the first observation state and updated the first column of matrixC
. We also initialize the matrixD
to store the state transitions.
Lines 11–16: We implement the forward pass to fill the matrixC
and matrixD
. We start the outer loop over observations. For each observation, we use a nested loop to calculate the maximal probability of the current observation for each state. To calculate the maximal probability, we multiply the probabilities of the previous column of matrixC
, the transition probability from the previous states to the current state, and the emission probability of the current observation on the current state. We store all these probabilities in curr_stat_probs
and call the max()
function to update the maximal probability of the current state in matrixC
. Lastly, we call the np.argmax()
method to store the state transition in matrixD
.
Lines 19–23: We define the output
list to store the most likely sequence of states. We store the index of the maximum probability of the last column of matrixC
in the state_ind
. We started the backward loop over observations
. Inside this function, we add the state to the start of the output
list and access the current state cell of the current observation in matrixD
to update the state_ind
for the previous observation.
Line 25: Lastly, we return the output
list.
Lines 28–39: We define a hmm
dictionary with the information required for the Viterbi algorithm.
Line 40: We define the observation
list with the observation states.
Line 41: We call the ViterbiAlgorithm()
function and store the output in the output_seq
list.
Lines 42–45: We use the for
loop to show the output in the format where for each observation, the most likely state is printed in front of the observation.
For each hidden state for one observation, we need to calculate the probability of every path coming from the hidden states of the previous observation. If there are
We can utilize the Viterbi algorithm in various domains. Let’s have a look over some real-life applications of the Viterbi algorithm.
Handwriting recognition: We can use the Viterbi algorithm to decode the sequences of observed pixel values into the most likely sequence of characters to recognize the handwriting. We can use different characters as states and pixel values as observations.
DNA sequence analysis: We can identify the most probable sequence of the nucleotides to observe the DNA sequence using the Viterbi algorithm. The nucleotides (A, T, C, G) can be treated as states, and the DNA sequence can be the observations.
Part-of-speech tagging: The Viterbi algorithm can be utilized to predict the POS for each word of the sentence. The POS categories (e.g., noun, verb, and others) are the states, and the words of the sentence are the observations.
Gesture recognition: We can employ the Viterbi algorithm to recognize the most likely sequence of gestures given the observed sensor data. The gesture types (e.g., tap, pinch, and swipe) are the states, and the sequence of sensor data is the observations.
Electrocardiogram (ECG) analysis: The Viterbi algorithm can assist in diagnosing cardiac conditions by identifying the ECG signal. The ECG signal can be used as observation, and the cardiac conditions are the states of the model.
The Viterbi algorithm is a powerful tool for different kinds of applications, but still, we need to consider the following limitations before using it:
Computational complexity: The time complexity of this algorithm is dependent on the number of hidden states and the length of the observation sequence. For larger sequences or a higher number of hidden states, it requires more computational powers.
Memory usage: Storing matrixes
Limited to one most likely path: The Viterbi algorithm only finds one most likely sequence of hidden states. There might be some other possible sequences with a lower probability but also considerable. If we are exploring more than one path in our problem, this can be a limitation.
Unknown elements: The Viterbi algorithm only deals with the known elements. For unknown elements, the behavior of the Viterbi algorithm is unexpected.
Consider the following matrices:
What is the most likely state sequence output for the above matrices $C$ and $D$ of observation [Food, Food, Electronics, Clothes]
?
[Food Street, Food Street, Electronics Market, Clothes Market]
[Food Street, Clothes Market, Electronics Market, Clothes Market]
[Food Street, Clothes Market, Clothes Market, Electronics Market]
[Electronics Market, Food Street, Electronics Market, Clothes Market]