Trusted answers to developer questions
Trusted Answers to Developer Questions

Related Tags

strings
python
levenshtein
communitycreator

How to find the edit distance between two strings

Eman Kashif

The edit distancealso known as the Levenshtein distance is the measure of how dissimilar two strings are. It counts the minimum number of operations required to turn one string into another. The following operations are taken into consideration:

  1. Insertion
  2. Deletion
  3. Substitution

For example, the edit distance between “life” and “love” would be 2 since it requires two operations of substitution.

To start with the algorithm, we first initialize an empty matrix of zeros with the width and height equal to the lengths of the two strings respectively. The first row and column are filled with numbered values to represent the placement of each character.

We run two for loops to traverse through every element of the matrix. If two letters are found to be the same, the new value at position [i, j] is set as the minimum value between position [i-1, j] + 1, position [i-1, j-1], and position [i, j-1] + 1.

If the two letters are not the same, the new value at position [i, j] is set as the minimum value between position [i-1, j] + 1, position [i-1, j-1] + 1, and position [i, j-1] + 1.

The edit distance will be the value at the lower right corner of the matrix.

import numpy as np

#initialize two strings
str1 = "lying"
print(str1)
str2 = "layout"
print(str2)

def lev(str1,str2):

    #create matrix of zeros
    len1=len(str1)+1
    len2=len(str2)+1
    matrix = np.zeros((len1,len2))

    #number x and y indices in matrix
    for i in range(len1):
        matrix[i,0] = i
    for j in range(len2):
        matrix[0,j] = j
    
    #two for loops to compare strings letter by letter
    for i in range(1, len1):
        for j in range(1, len2):
            if str1[i-1]==str2[j-1]:
                matrix[i,j] = min(
                matrix[i-1,j] + 1,
                matrix[i,j-1] + 1,
                matrix[i-1,j-1])
            else:
                matrix[i,j] = min(
                matrix[i-1,j] + 1,
                matrix[i-1,j-1] + 1,
                matrix[i,j-1] + 1)
    print(matrix)
    return(matrix[len1-1,len2-1])

m=lev(str1,str2)   
print('Levenshtein distance is:',m)

RELATED TAGS

strings
python
levenshtein
communitycreator
RELATED COURSES

View all Courses

Keep Exploring