Distance Calculations in Fuzzy
Understand the inner workings of the FuzzyWuzzy Python package and how its methods can accomplish different string-matching goals.
What is FuzzyWuzzy?
The FuzzyWuzzy (now TheFuzz) Python package is a string-matching library that uses Levenshtein distance to compare strings and find approximate (fuzzy) matches.
Instead of returning an edit distance, FuzzyWuzzy returns a similarity ratio, built from a combination of the edit distance and the lengths of the strings themselves. It exposes this through several functions that compare two strings, including ratio(), partial_ratio(), token_sort_ratio(), token_set_ratio(), and others. These functions use slightly different techniques to compare strings, and each returns a similarity score between 0 and 100 indicating how closely the strings match, with 100 being a perfect match.
Character ratios
Character ratios, like simple ratio and partial ratio, compare strings character by character and are meant for individual sequences of characters, such as single words.
Simple ratio
The Fuzzy simple ratio is calculated by taking the number of matching characters between two strings (using the edit distance formula), multiplying that number by 2, and dividing by the sum of the lengths of both strings. The result is then scaled to the 0 to 100 range.
Here is an example of how to calculate the Fuzzy simple ratio between two strings, "apple" and "banana":
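The calculation described above can be sketched roughly as follows. This is a minimal illustration that uses Python's standard-library difflib.SequenceMatcher to count matching characters, not a call into FuzzyWuzzy itself. The helper name simple_ratio is hypothetical, and returning 0 when both strings are empty is an assumption of this sketch rather than documented library behavior.

```python
from difflib import SequenceMatcher


def simple_ratio(s1: str, s2: str) -> int:
    """Hypothetical helper: 2 * matches / (len(s1) + len(s2)), scaled to 0-100."""
    total_length = len(s1) + len(s2)
    if total_length == 0:
        # Assumption of this sketch: avoid dividing by zero for two empty strings.
        return 0

    # Sum the sizes of all matching blocks to get the number of matching characters.
    matcher = SequenceMatcher(None, s1, s2)
    matches = sum(block.size for block in matcher.get_matching_blocks())

    # Multiply matches by 2, divide by the combined length, then scale to 0-100.
    return round(100 * 2 * matches / total_length)


print(simple_ratio("apple", "banana"))  # 18
```

Here "apple" and "banana" share only one matching character, "a", so the ratio is 2 × 1 / (5 + 6) ≈ 0.18, which gives a score of 18.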