Related Tags

kmp
substring
search
algorithm
prefix

# What is the Knuth-Morris-Pratt algorithm?

Ace your System Design Interview and take your career to the next level. Learn to handle the design of applications like Netflix, Quora, Facebook, Uber, and many more in a 45-min interview. Learn the RESHADED framework for architecting web-scale applications by determining requirements, constraints, and assumptions before diving into a step-by-step design process.

The Knuth-Morris-Pratt (KMP) algorithm is an algorithm that is used to search for a substring (W), in a given string (S), in $O(m+n)$ time (where $m$ and $n$ are the lengths of W and S).

## Key idea

The key idea used to achieve this time complexity is to minimize the amount of backtracking when a character of W does not match with that of S. This can only be done if we know two things:

1. Whether or not a proper prefix of W occurs more than once in S ​after at least one character has been correctly found; if it does, it can be skipped when resuming the process of matching after a mismatch.
2. Length of the proper prefix.
1 character matched
1 of 7

## Using an array

Before starting the actual algorithm, a one-dimensional array is initialized with the number of characters that can be skipped after a mismatch. arr[i] represents the number of characters that can be skipped when W[i+1] does not match with a character in S.

In the example above, W[4] (i.e., a) did not match with the character c in S (slide 5), so, the matching was resumed from W[1] instead of W[0] (slide 6). This means that character 1 was skipped; therefore, arr[4-1] = 1.

## Failure function

arr is initialized using the failure function $f(i)$. This function is based on the fact that:

When a mismatch occurs, all the previous characters match correctly; ​this implies that if a prefix of W occurred in this set of matching characters, then that prefix is also a suffix of W.

In other words, arr[i] will represent the length of the longest proper prefix of W, which is also a proper suffix of W (considering W till the ith index only).

The following is the code used to initialize arr:

m = len(W)
i = 0
j = 1

# No proper prefix for string of length 1:
arr[0] = 0

while j < m:
if W[i] == W[j]:
i++
arr[j] = i
j++
# The first character didn't match:
else if i == 0:
arr[j] = 0
j++
# Mismatch after at least one matching character:
else:
i = arr[i - 1]

Initialization
1 of 16

## Algorithm

The following is the KMP algorithm pseudocode used for searching substring W, in string S, in $O(m+n)$:

# Initialising variables:
i = 0
j = 0
m = len(W)
n = len(S)

# Start search:
while (i < m && j < n){
# Last character matches -> Substring found:
if (W[i] == S[j] && i == m - 1):
return true

# Character matches:
else if (W[i] == S[j]):
i++
j++

# Character didn't match -> Backtrack:
else:
i = arr[i - 1]
if i == 0:
j++
}

return false


## Implementation

The algorithm above (given in the form of a pseudocode) along with the failure function, has been implemented in Python:

# Initialise array based on the failure function:def init_arr(w):    m = len(w)    i = 0    j = 1    # No proper prefix for string of length 1:    global arr    arr[0] = 0    while j < m:        if w[i] == w[j]:            i += 1            arr[j] = i            j += 1        # The first character didn't match:        elif i == 0:            arr[j] = 0            j += 1        # Mismatch after at least one matching character:        else:            i = arr[i - 1]def kmp_search(w, s):    # Initialise array:    init_arr(w)    # Initialising variables:    i = 0    j = 0    m = len(w)    n = len(s)    # Start search:    while i < m and j < n:        # Last character matches -> Substring found:        if w[i] == s[j] and i == m - 1:            return True        # Character matches:        elif w[i] == s[j]:            i += 1            j += 1        # Character didn't match -> Backtrack:        else:            if i != 0:                i = arr[i-1]             else:                j += 1    # Substring not found:    return Falsetext = "hayhello"substring = "hell"# create array:arr = [None] * len(substring)if kmp_search(substring, text):    print("Substring found!")else:    print("Could not find substring.")

RELATED TAGS

kmp
substring
search
algorithm
prefix