What is the Boyer-Moore algorithm?
Overview
In bioinformatics, pattern searching is an essential problem. We search for strings in different types of sequences using such algorithms.
The Boyer-Moore algorithm is a string-searching algorithm used for finding DNA sequences from a string of DNA. It was developed in 1977 by Robert S. Boyer and J. Strother Moore.
Example
Let's take an example of a sample text and a pattern that is to be detected.
Algorithm
Let's assume the original string is
The algorithm begins the alignment at
The strings are compared from the end of
Note: To learn more about the rules, visit this Answer.
Comparisons are then performed at the new alignment, which is repeated until the alignment is shifted past the end, indicating no further matches will occur.
Code sample
This code sample uses the bad character rule.
number_of_characters = 256def bad_character_rule(string, size):#set all occurences as -1bad_character = [-1]*number_of_characters#set the value of the last occurrencefor i in range(size):bad_character[ord(string[i])] = i;return bad_characterdef search(original_string, search_string):n = len(original_string)m = len(search_string)bad_character = bad_character_rule(search_string, m)s = 0while(s <= n-m):j = m-1#reduce index of search_string while the serch_string and original_string matchwhile j>=0 and search_string[j] == original_string[s+j]:j -= 1#if search_string is found at s, set j as -1if j<0:print("search_string present at index = {}".format(s))s += (m-bad_character[ord(original_string[s+m])] if s+m<n else 1)else:s += max(1, j-bad_character[ord(original_string[s+j])])def main():original_string = "ACTGGATGT"search_string = "TG"search(original_string, search_string)if __name__ == '__main__':main()
Code explanation
In main.py:
Lines 3–11: We use the
bad_character_rulefunction to preprocess the bad character rule.Lines 13–32: We use the
searchfunction for searching the pattern that makes use of the bad character for the givensearch_string.
Conclusion
The bad character rule has the runtime
Free Resources