Rules for the Boyer-Moore search algorithm

The Boyer-Moore algorithm is a string-searching algorithm. It was developed in 1977 by Robert S. Boyer and J. Strother Moore.

It relies on rules that determine how many alignments of the pattern can be skipped during the search. We'll discuss these rules in this Answer.

Rules for searching

The Boyer-Moore algorithm uses the following two rules for searching:

The bad character rule
The good suffix rule

Let $T$ be the original text string and $P$ be the pattern we want to search for. The length of the text is $n$ and the length of the pattern is $m$. At each alignment, $P$ is compared with $T$ from right to left.

The bad character rule

In the bad character rule, we find a character in $T$ that fails to match the corresponding character of $P$. We then look for the nearest occurrence of that text character in $P$ to the left of the mismatch position, and propose a shift that brings this occurrence in line with the mismatched character in $T$. If the mismatched character doesn't occur to the left of the mismatch position in $P$, we propose a shift that moves $P$ entirely past the point of mismatch.

In a nutshell, if we mismatch a character, we use the knowledge of the mismatched text character to skip alignments.
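The bad character rule can be sketched in Python as follows. This is a minimal illustration, not a tuned implementation: the function names `bad_character_shift` and `search_bad_character` are hypothetical, and the shift is found by scanning the pattern at each mismatch, whereas practical implementations usually precompute a lookup table.

```python
def bad_character_shift(pattern, j, text_char):
    """Shift proposed by the bad character rule.

    pattern[j] failed to match text_char. Look for the nearest
    occurrence of text_char to the left of position j in the pattern.
    """
    for k in range(j - 1, -1, -1):
        if pattern[k] == text_char:
            # Align pattern[k] with the mismatched text character.
            return j - k
    # No occurrence to the left: move the pattern past the mismatch.
    return j + 1


def search_bad_character(text, pattern):
    """Return all start indexes of pattern in text, using only the
    bad character rule to skip alignments (illustrative sketch)."""
    n, m = len(text), len(pattern)
    matches = []
    if m == 0 or m > n:
        return matches
    s = 0  # current alignment: pattern[0] sits under text[s]
    while s <= n - m:
        j = m - 1
        # Compare right to left.
        while j >= 0 and pattern[j] == text[s + j]:
            j -= 1
        if j < 0:
            matches.append(s)
            s += 1  # this sketch simply steps by one after a full match
        else:
            s += bad_character_shift(pattern, j, text[s + j])
    return matches


if __name__ == "__main__":
    print(search_bad_character("GCTTCTGCTACCTTTTGCGCGCGCGCGGAA", "CCTTTTGC"))
```

With the pattern `CCTTTTGC`, a mismatch at index 4 against the text character `C` proposes a shift of 3, because the nearest `C` to the left sits at index 1.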

The good suffix rule

Suppose a substring $t$ of $T$ matches a suffix of $P$, but a mismatch occurs at the next comparison to the left.

We find the rightmost copy $t'$ of $t$ in $P$ such that $t'$ is not a suffix of $P$ and the character to the left of $t'$ in $P$ differs from the character to the left of the matched suffix in $P$. $P$ is then shifted to the right so that $t'$ in $P$ aligns with $t$ in $T$.

If $t'$ doesn't exist, we shift the left end of $P$ past the left end of $t$ in $T$ by the least amount such that a prefix of the shifted pattern matches a suffix of $t$ in $T$.

If such a shift isn't possible, $P$ is shifted $m$ places to the right. If an occurrence of $P$ is found, $P$ is shifted by the least amount such that a proper prefix of the shifted $P$ matches a suffix of the occurrence of $P$ in $T$. If that shift isn't possible either, $P$ is shifted $m$ places so that it moves past the occurrence.

In a nutshell, if we match some characters, we use knowledge of the matched characters to skip alignments.
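The good suffix rule is usually applied through a table computed from $P$ before the search starts. The sketch below uses a commonly taught border-based preprocessing; the names `build_good_suffix_shifts` and `search_good_suffix` are hypothetical, and the search applies only the good suffix rule so that its effect is visible in isolation.

```python
def build_good_suffix_shifts(pattern):
    """Return a list `shift` of length m + 1.

    shift[j + 1] is the shift to apply after a mismatch at pattern[j]
    (the suffix pattern[j + 1:] matched); shift[0] is the shift to
    apply after a full match.
    """
    m = len(pattern)
    shift = [0] * (m + 1)
    border = [0] * (m + 1)  # border[i]: start of widest border of pattern[i:]

    # Case 1: the matched suffix reoccurs in the pattern, preceded by
    # a different character.
    i, j = m, m + 1
    border[i] = j
    while i > 0:
        while j <= m and pattern[i - 1] != pattern[j - 1]:
            if shift[j] == 0:
                shift[j] = j - i
            j = border[j]
        i -= 1
        j -= 1
        border[i] = j

    # Case 2: only a prefix of the pattern matches a suffix of the
    # matched text (or nothing matches, giving a shift of m).
    j = border[0]
    for i in range(m + 1):
        if shift[i] == 0:
            shift[i] = j
        if i == j:
            j = border[j]
    return shift


def search_good_suffix(text, pattern):
    """Return all start indexes of pattern in text, using only the
    good suffix rule to skip alignments (illustrative sketch)."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []
    shift = build_good_suffix_shifts(pattern)
    matches = []
    s = 0
    while s <= n - m:
        j = m - 1
        while j >= 0 and pattern[j] == text[s + j]:
            j -= 1
        if j < 0:
            matches.append(s)
            s += shift[0]
        else:
            s += shift[j + 1]
    return matches


if __name__ == "__main__":
    print(build_good_suffix_shifts("ABAB"))
    print(search_good_suffix("ABABABAB", "ABAB"))
```

For the pattern `ABAB`, the table is `[2, 2, 2, 4, 1]`. For example, after matching only the final `B` and mismatching at index 2, no qualifying copy of `B` exists and no prefix of the pattern matches, so the pattern shifts by its full length of 4.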