Rules for Boyer-Moore search algorithm

The Boyer-Moore algorithm is a string-searching algorithm. It was developed in 1977 by Robert S. Boyer and J. Strother Moore.

It consists of rules that determine the skipping of characters while matching alignments during the search process. We'll discuss these rules in this Answer.

Rules for searching

These are the following two rules for searching in the Boyer-Moore algorithm:

  • The bad character rule

  • The good suffix rule

Let TT be the original text string and PP be the pattern we want to search. The length of the original text is nn and the length of the pattern is mm.

The bad character rule

In the bad character rule, we find a character in TT that fails to match PP. The successive occurrence on the left of PP is found. Another shift is proposed that brings the occurrence in line with the mismatched occurrence present in TT. If the mismatched occurrence of the character is not found on the left of PP, a shift is proposed that shifts PP past the point of mismatch.

In a nutshell, if we mismatch a character, we use the knowledge of the mismatched text character to skip alignments.

The bad character rule
1 of 5

The good suffix rule

Suppose we have a substring tt of TT. Let tt match a suffix of PP but a mismatch occurs.

The rightmost copy t′t' of tt in PP is found such that t′t' is not a suffix of PP. The character to the left of t′t' in PP differs from the characters to the left of tt in PP. PP is shifted to the right such that t′t' in PP aligns with tt in TT.

If t′t' doesn't exist, then shift the search window towards the left end of PP, past the left end of tt in TT. This occurs such that the prefix of the shifted pattern matches the suffix of tt in TT.

If such a shift isn't possible, then PP is shifted by mm indexes to the right. If PP is found, then it shifts PP by at least an amount so that a proper prefix of the shifted PP matches a suffix of the occurrence of PP in TT. If the aforementioned shift isn't possible, then it shifts PP by mm places so that it's shift past tt.

In a nutshell, if we match some characters, we use knowledge of the matched characters to skip alignments.

The good suffix rule
1 of 4

Conclusion

The bad character rule has the runtime O(n/m)O(n/m) in the best case and O(nm)O(nm) in the worst case. The worst case occurs when the character of the original string and the string to be found are the same. The best case occurs when all the characters of the original string and the string to be matched are different.

Copyright ©2024 Educative, Inc. All rights reserved