a shot of dev knowledge

RELATED TAGS

# What is the Knuth-Morris-Pratt algorithm?

The Knuth-Morris-Pratt (KMP) algorithm is an algorithm that is used to search for a substring (W), in a given string (S), in $O(m+n)$ time (where $m$ and $n$ are the lengths of W and S).

## Key idea

The key idea used to achieve this time complexity is to minimize the amount of backtracking when a character of W does not match with that of S. This can only be done if we know two things:

1. Whether or not a proper prefix of W occurs more than once in S ​after at least one character has been correctly found; if it does, it can be skipped when resuming the process of matching after a mismatch.
2. Length of the proper prefix.
1 character matched
1 of 7

## Using an array

Before starting the actual algorithm, a one-dimensional array is initialized with the number of characters that can be skipped after a mismatch. arr[i] represents the number of characters that can be skipped when W[i+1] does not match with a character in S.

In the example above, W[4] (i.e., a) did not match with the character c in S (slide 5), so, the matching was resumed from W[1] instead of W[0] (slide 6). This means that character 1 was skipped; therefore, arr[4-1] = 1.

## Failure function

arr is initialized using the failure function $f(i)$. This function is based on the fact that:

When a mismatch occurs, all the previous characters match correctly; ​this implies that if a prefix of W occurred in this set of matching characters, then that prefix is also a suffix of W.

In other words, arr[i] will represent the length of the longest proper prefix of W, which is also a proper suffix of W (considering W till the ith index only).

The following is the code used to initialize arr:

m = len(W)
i = 0
j = 1

# No proper prefix for string of length 1:
arr[0] = 0

while j < m:
if W[i] == W[j]:
i++
arr[j] = i
j++
# The first character didn't match:
else if i == 0:
arr[j] = 0
j++
# Mismatch after at least one matching character:
else:
i = arr[i - 1]

Initialization
1 of 16

## Algorithm

The following is the KMP algorithm pseudocode used for searching substring W, in string S, in $O(m+n)$:

# Initialising variables:
i = 0
j = 0
m = len(W)
n = len(S)

# Start search:
while (i < m && j < n){
# Last character matches -> Substring found:
if (W[i] == S[j] && i == m - 1):
return true

# Character matches:
else if (W[i] == S[j]):
i++
j++

# Character didn't match -> Backtrack:
else:
i = arr[i - 1]
if i == 0:
j++
}

return false


## Implementation

The algorithm above (given in the form of a pseudocode) along with the failure function, has been implemented in Python:

# Initialise array based on the failure function:
def init_arr(w):
m = len(w)
i = 0
j = 1

# No proper prefix for string of length 1:
global arr
arr[0] = 0

while j < m:
if w[i] == w[j]:
i += 1
arr[j] = i
j += 1

# The first character didn't match:
elif i == 0:
arr[j] = 0
j += 1

# Mismatch after at least one matching character:
else:
i = arr[i - 1]

def kmp_search(w, s):
# Initialise array:
init_arr(w)

# Initialising variables:
i = 0
j = 0
m = len(w)
n = len(s)

# Start search:
while i < m and j < n:
# Last character matches -> Substring found:
if w[i] == s[j] and i == m - 1:
return True

# Character matches:
elif w[i] == s[j]:
i += 1
j += 1

# Character didn't match -> Backtrack:
else:
if i != 0:
i = arr[i-1]
else:
j += 1

return False

text = "hayhello"
substring = "hell"

# create array:
arr = [None] * len(substring)

if kmp_search(substring, text):
print("Substring found!")
else:
print("Could not find substring.")

RELATED TAGS

RELATED COURSES

View all Courses

Keep Exploring

Learn in-demand tech skills in half the time