Trusted answers to developer questions

Educative Answers Team

Grokking Modern System Design Interview for Engineers & Managers

Ace your System Design Interview and take your career to the next level. Learn to handle the design of applications like Netflix, Quora, Facebook, Uber, and many more in a 45-min interview. Learn the RESHADED framework for architecting web-scale applications by determining requirements, constraints, and assumptions before diving into a step-by-step design process.

The **Knuth-Morris-Pratt (KMP) algorithm** is an algorithm that is used to search for a substring (`W`

), in a given string (`S`

), in $O(m+n)$ time (where $m$ and $n$ are the lengths of `W`

and `S`

).

The key idea used to achieve this time complexity is to minimize the amount of backtracking when a character of `W`

does not match with that of `S`

. This can only be done if we know two things:

- Whether or not a proper prefix of
`W`

occurs more than once in`S`

â€‹after at least one character has been correctly found; if it does, it can be skipped when resuming the process of matching after a mismatch. - Length of the proper prefix.

1 character matched

Before starting the actual algorithm, a one-dimensional array is initialized with the number of characters that can be skipped after a mismatch. `arr[i]`

represents the number of characters that can be skipped when `W[i+1]`

does not match with a character in `S`

.

In the example above, `W[4]`

(i.e., `a`

) did not match with the character `c`

in `S`

(slide 5), so, the matching was resumed from `W[1]`

instead of `W[0]`

(slide 6). This means that character *1* was skipped; therefore, * arr[4-1] = 1*.

`arr`

is initialized using the failure function $f(i)$. This function is based on the fact that:

*When a mismatch occurs, all the previous characters match correctly; â€‹this implies that if a prefix of W occurred in this set of matching characters, then that prefix is also a suffix of W*.

In other words, `arr[i]`

will represent the length of the longest proper prefix of `W`

, which is also a proper suffix of `W`

(considering `W`

till the `i`

th index only).

The following is the code used to initialize `arr`

:

```
m = len(W)
i = 0
j = 1
# No proper prefix for string of length 1:
arr[0] = 0
while j < m:
if W[i] == W[j]:
i++
arr[j] = i
j++
# The first character didn't match:
else if i == 0:
arr[j] = 0
j++
# Mismatch after at least one matching character:
else:
i = arr[i - 1]
```

Initialization

The following is the KMP algorithm pseudocode used for searching substring `W`

, in string `S`

, in $O(m+n)$:

```
# Initialising variables:
i = 0
j = 0
m = len(W)
n = len(S)
# Start search:
while (i < m && j < n){
# Last character matches -> Substring found:
if (W[i] == S[j] && i == m - 1):
return true
# Character matches:
else if (W[i] == S[j]):
i++
j++
# Character didn't match -> Backtrack:
else:
i = arr[i - 1]
if i == 0:
j++
}
# Substring not found:
return false
```

The algorithm above (given in the form of a pseudocode) along with the failure function, has been implemented in Python:

# Initialise array based on the failure function:def init_arr(w):m = len(w)i = 0j = 1# No proper prefix for string of length 1:global arrarr[0] = 0while j < m:if w[i] == w[j]:i += 1arr[j] = ij += 1# The first character didn't match:elif i == 0:arr[j] = 0j += 1# Mismatch after at least one matching character:else:i = arr[i - 1]def kmp_search(w, s):# Initialise array:init_arr(w)# Initialising variables:i = 0j = 0m = len(w)n = len(s)# Start search:while i < m and j < n:# Last character matches -> Substring found:if w[i] == s[j] and i == m - 1:return True# Character matches:elif w[i] == s[j]:i += 1j += 1# Character didn't match -> Backtrack:else:if i != 0:i = arr[i-1]else:j += 1# Substring not found:return Falsetext = "hayhello"substring = "hell"# create array:arr = [None] * len(substring)if kmp_search(substring, text):print("Substring found!")else:print("Could not find substring.")

RELATED TAGS

kmp

substring

search

algorithm

prefix

Copyright Â©2022 Educative, Inc. All rights reserved

Grokking Modern System Design Interview for Engineers & Managers

Ace your System Design Interview and take your career to the next level. Learn to handle the design of applications like Netflix, Quora, Facebook, Uber, and many more in a 45-min interview. Learn the RESHADED framework for architecting web-scale applications by determining requirements, constraints, and assumptions before diving into a step-by-step design process.

Keep Exploring

Related Courses