Introduction to Prefix Searching Using Tries

What is a prefix?

As we saw before, a prefix string is a substring of a string or word present at the beginning.
For example, in the word holiday, the valid prefixes are h, ho, hol, holi, holid, holida, and holiday.

What is prefix searching?

Searching for the presence of a prefix in a string is called prefix searching. Problems requiring a prefix search are very common.

For a general use case, one can traverse the string to check if the prefix is present. For example, to check if holi is a valid prefix for the word holiday, one can traverse the word holiday and match each character one by one to validate if the prefix is present.

But this can be problematic if there's a need to search for prefixes in multiple words. One has to go through all the strings one by one. In cases like these, tries can be very helpful. With a trie, we can search for prefixes of multiple words in a single traversal.

%0 node_1 root node_2 b node_1->node_2 node_1671243296960 a node_2->node_1671243296960 node_1671243302687 t node_1671243296960->node_1671243302687 node_1671243325215 l node_1671243296960->node_1671243325215 node_1671243338474 n node_1671243296960->node_1671243338474 node_1671243352241 i node_1671243296960->node_1671243352241 node_1671243329057 l node_1671243325215->node_1671243329057 node_1671243360346 t node_1671243352241->node_1671243360346
A trie with words "bat", "ball", "ban", and "bait"

Trie traversal

Trie is an N-ary tree and is generally traversed downward from the root node to the leaf node. The tree is traversed for each character in the provided word for prefix-related searches.

Instead of comparing each word separately for the prefix, we create a trie of all the words. For example, in the visualization below, we're traversing for the prefix bak in the trie. All the words with the matching prefixes are identified in a single traversal in the case of a trie instead of performing multiple word traversals and comparisons, as in the brute-force approach.

End of word

While searching for a word in a trie, we need to determine if the word is present as a complete word or as a prefix of another word or words. We can make this distinction by adding a new boolean parameter, isEndOfWord, in each trie node. This flag's value is updated as true for the trie nodes associated with every word's last character.

%0 node_1 root node_2 b (isEndOfWord = false) node_1->node_2 node_1671243296960 a (isEndOfWord = false) node_2->node_1671243296960 node_1671243302687 t (isEndOfWord = true) node_1671243296960->node_1671243302687 node_1671243325215 l (isEndOfWord = false) node_1671243296960->node_1671243325215 node_1671243338474 n (isEndOfWord = true) node_1671243296960->node_1671243338474 node_1671243352241 i (isEndOfWord = false) node_1671243296960->node_1671243352241 node_1671243329057 l (isEndOfWord = true) node_1671243325215->node_1671243329057 node_1671243360346 t (isEndOfWord = true) node_1671243352241->node_1671243360346
Trie with isEndOfWord flag

Pseudocode

Trie node

A trie node would contain the following parameters:

  • An array of pointers to the child nodes.

  • A boolean parameter denoting the end of the word.

Press + to interact
// Trie node class
class TrieNode
{
// An array of pointer to the child nodes.
// The maximum number of children for a trie node
// can be equal to the language's alphabet size.
TrieNode children[ALPHABET_SIZE]
// isEndOfWord is true if the node represents the end of a word
boolean isEndOfWord
}

Insertion of a word in a trie

  • Every character of the input key/word is an individual trie node. The child of a trie node is an array of pointers (or references) to next-level trie nodes.

  • If the input key is new or an extension of the existing key, we need to construct nonexisting nodes of the key and mark the end of the word for the last node as true.

  • If the input key is a prefix of the pre-existing key in the trie, we reuse it and move ahead in the trie path until we mark the value of isEndOfWord parameter associated with the last node of the word as true. The length of the longest word determines the depth of the trie.

void insert(string word)
{
// point the current node to the root of the trie
current_node = root
// iterate through all the characters of the given word
for(every char in string word)
{
// if there is no child node associated with current character
// create a new child trie node
if(child_node belonging to current char is null)
{
child_node = new TrieNode
}
// move the current node pointer to the child node
current_node = child_node
}
}
Pseudocode for the insertion of a word in a trie

Searching a word or a prefix in a trie

  • Searching is similar to an insert operation. However, we only compare the characters and move down. In addition, the search can terminate upon reaching the end of the query string or due to the lack of the key in the trie.

  • In the first case, if the value of the  isEndOfWord parameter of the last node encountered while searching for the key in the trie is true, it means the key exists in the trie, and the search function returns a true.

  • In the second case, the search terminates without examining all the characters of the key. This is because the trie path exhausts before traversing and comparing all the characters of the key, implying that the key is not present in the trie.

boolean search(string word)
{
// point the current node to the root of the trie
current_node = root
// iterate through all the characters of the given word
for (every char in string s)
{
//the search terminates without examining all the characters of the key.
//This is because the trie path exhausts before traversing and comparing
// all the characters of the key, implying that the key is not present in the trie.
if (child_node belonging to current char is null)
{
return false;
}
// move the current node pointer to the child node
current_node = child_node
}
// if the value of isEndOfWord for the node associated with the last character of the
// word is true, it means the complete word is present in the trie.
// Hence true is returned, else false is returned.
if isEndOfWord is true,
return true
else
return false
}
Pseudocode for searching a word in a trie

Searching for a prefix is similar to searching for a word operation. However, in this case, it's not necessary to reach the last node where isEndOfWord is true.

Press + to interact
boolean isPrefix(string pref)
{
// point the current node to the root of the trie
current_node = root
// iterate through all the characters of the given word
for (every char in String s)
{
//the search terminates without examining all the characters of the key.
//This is because the trie path exhausts before traversing and comparing
// all the characters of the key, implying that the key is not present in the trie.
if (child_node belonging to current char is null)
{
return false
}
// move the current node pointer to the child node
current_node = child_node
}
// if all the letters of the prefix query string were successfully traversed
// without the search getting terminated in between, it means the prefix string
// is present in the trie. Hence true is returned.
return true
}

Problem identification

If the problem description contains any of the following properties, it may be solved using tries and prefix searching:

  • When detecting full or partial matches in the strings.

  • When finding shared prefixes between multiple strings.

  • When finding or counting the number of words containing a prefix.

  • When replacing matching words with another phrase.

  • When the input size is enormous, and storing all strings separately in memory as a list or a hashmap leads to high space complexity.

  • When pairwise comparison of strings leads to higher time complexity.