Trusted answers to developer questions

Asmat Batool

Multiple algorithms are used in fuzzy search systems. Some of the most common algorithms are the following:

- Hamming distance
- Levenshtein/edit distance
- Damerau–Levenshtein distance

**Hamming distance** is an algorithm to find the distance of the actual user query (string) from strings of length equal to the actual query length. It is measured by the number of substitute operations required to convert the actual query to another string of equal length.

For example, to convert "burn" into "turn", we need to substitute 't' for 'b'. Other characters in both strings are the same. As we substituted only one character, the hamming distance here is 1. Similarly, the hamming distance between "help" and "heal" is calculated to be 2.

**Levenshtein/edit distance** is used to find the distance between two strings of unequal length. It is measured by the number of insertion, deletion, or substitution operations required to convert the actual query (string) to another string.

For example, the edit distance between "close" and "open" is 4 . There are 2 remove operations for 'c' and 'l', one substitute operation where we put 'p' in place of 's', and one insertion operation to add 'n' at the end of the string.

**Damerau–Levenshtein distance** is an extended form of Levenshtein distance, where we also include transposition operation. It is measured by the number of insertions, deletions, substitutions, and transpositions needed to convert the actual query to another string.

For example, to convert "educative" to "activate", we need to do the following:

- Remove the first three characters 'e', 'd', and 'u' from "educative". After removal, the string is now "cative".
- Swap 'a' and 'c', and the string becomes "active".
- Substitute 'a' in place of 'e'. The string becomes "activa".
- Insert two characters 't' and 'e' at the end of the string to finally get the "activate".

We performed 3 delete operations, 1 transposition, 1 substitution, and 2 insertions. The total number of operations is 7, which is the Damerau–Levenshtein distance between "educative" and "activate".

RELATED TAGS

search

fuzzy search

fuzzy algorithms

CONTRIBUTOR

Asmat Batool

RELATED COURSES

View all Courses

Keep Exploring

Related Courses