What Is Edit Distance?
Learn how edit distance is used to measure string similarity and facilitates tasks like spell checking.
We'll cover the following
Edit distance (also known as Levenshtein distance) measures the similarity between two strings of characters. It is a widely used concept in natural language processing and computational linguistics to quantify the amount of difference between two words.
Edit distance, at a high level, is calculated by counting the minimum number of operations required to transform one string into another. These operations can be an insertion, deletion, or substitution of a single character in either string. For example, the edit distance between "cat" and "pat" is 1, because the only operation required to transform "cat" into "pat" is to substitute the letter "c" with the letter "p".
Similarly, the distance between "am" and "aim" is 1 with an insertion operation between "a" and "m".
How is edit distance relevant to spell check?
Edit distance plays a vital role in a spellchecker. The spellchecker compares a misspelled word with a dictionary of correct words, allowing the spellchecker to suggest words with the lowest edit distance value. Because of this, in NLP applications, we always look for the minimum edit distance.
In the example below, we can see a theoretical edit distance graph from "Monk" to "Money". First, we can perform an addition of "Y", and then finally, we can perform a substitution of "K" to "E". Because we need 2 operations, the edit distance of the diagram is 2.
Other applications of edit distance in NLP
Edit distance is also used in machine translation, where it can be used to align sentences in the source and target languages (e.g., English as a source and Spanish as a target). In this case, the edit distance between two sentences can help determine the best translation for a given sentence. By identifying the sentence in the target language with the lowest edit distance to the source sentence, the machine translation system can generate more accurate translations.
Quick quiz on edit distance
What is the minimum edit distance Between “Monkey” and “Money”?
1
2
3