Trusted answers to developer questions
Trusted Answers to Developer Questions

Related Tags

kmp
substring
search
algorithm
prefix

# What is the Knuth-Morris-Pratt algorithm? Educative Answers Team

Grokking Modern System Design Interview for Engineers & Managers

Ace your System Design Interview and take your career to the next level. Learn to handle the design of applications like Netflix, Quora, Facebook, Uber, and many more in a 45-min interview. Learn the RESHADED framework for architecting web-scale applications by determining requirements, constraints, and assumptions before diving into a step-by-step design process.

The Knuth-Morris-Pratt (KMP) algorithm is an algorithm that is used to search for a substring (W), in a given string (S), in $O(m+n)$ time (where $m$ and $n$ are the lengths of W and S).

## Key idea

The key idea used to achieve this time complexity is to minimize the amount of backtracking when a character of W does not match with that of S. This can only be done if we know two things:

1. Whether or not a proper prefix of W occurs more than once in S ​after at least one character has been correctly found; if it does, it can be skipped when resuming the process of matching after a mismatch.
2. Length of the proper prefix. 1 character matched
1 of 7

## Using an array

Before starting the actual algorithm, a one-dimensional array is initialized with the number of characters that can be skipped after a mismatch. arr[i] represents the number of characters that can be skipped when W[i+1] does not match with a character in S.

In the example above, W (i.e., a) did not match with the character c in S (slide 5), so, the matching was resumed from W instead of W (slide 6). This means that character 1 was skipped; therefore, arr[4-1] = 1.

## Failure function

arr is initialized using the failure function $f(i)$. This function is based on the fact that:

When a mismatch occurs, all the previous characters match correctly; ​this implies that if a prefix of W occurred in this set of matching characters, then that prefix is also a suffix of W.

In other words, arr[i] will represent the length of the longest proper prefix of W, which is also a proper suffix of W (considering W till the ith index only).

The following is the code used to initialize arr:

m = len(W)
i = 0
j = 1

# No proper prefix for string of length 1:
arr = 0

while j < m:
if W[i] == W[j]:
i++
arr[j] = i
j++
# The first character didn't match:
else if i == 0:
arr[j] = 0
j++
# Mismatch after at least one matching character:
else:
i = arr[i - 1] Initialization
1 of 16

## Algorithm

The following is the KMP algorithm pseudocode used for searching substring W, in string S, in $O(m+n)$:

# Initialising variables:
i = 0
j = 0
m = len(W)
n = len(S)

# Start search:
while (i < m && j < n){
# Last character matches -> Substring found:
if (W[i] == S[j] && i == m - 1):
return true

# Character matches:
else if (W[i] == S[j]):
i++
j++

# Character didn't match -> Backtrack:
else:
i = arr[i - 1]
if i == 0:
j++
}

return false


## Implementation

The algorithm above (given in the form of a pseudocode) along with the failure function, has been implemented in Python:

# Initialise array based on the failure function:def init_arr(w):    m = len(w)    i = 0    j = 1    # No proper prefix for string of length 1:    global arr    arr = 0    while j < m:        if w[i] == w[j]:            i += 1            arr[j] = i            j += 1        # The first character didn't match:        elif i == 0:            arr[j] = 0            j += 1        # Mismatch after at least one matching character:        else:            i = arr[i - 1]def kmp_search(w, s):    # Initialise array:    init_arr(w)    # Initialising variables:    i = 0    j = 0    m = len(w)    n = len(s)    # Start search:    while i < m and j < n:        # Last character matches -> Substring found:        if w[i] == s[j] and i == m - 1:            return True        # Character matches:        elif w[i] == s[j]:            i += 1            j += 1        # Character didn't match -> Backtrack:        else:            if i != 0:                i = arr[i-1]             else:                j += 1    # Substring not found:    return Falsetext = "hayhello"substring = "hell"# create array:arr = [None] * len(substring)if kmp_search(substring, text):    print("Substring found!")else:    print("Could not find substring.")

RELATED TAGS

kmp
substring
search
algorithm
prefix 