Introduction to Prefix Searching Using Tries

What is a prefix?

As we saw before, a prefix string is a substring of a string or word present at the beginning.
For example, in the word holiday, the valid prefixes are h, ho, hol, holi, holid, holida, and holiday.

What is prefix searching?

Searching for the presence of a prefix in a string is called prefix searching. Problems requiring a prefix search are very common.

For a general use case, one can traverse the string to check if the prefix is present. For example, to check if holi is a valid prefix for the word holiday, one can traverse the word holiday and match each character one by one to validate if the prefix is present.

But this can be problematic if there's a need to search for prefixes in multiple words. One has to go through all the strings one by one. In cases like these, tries can be very helpful. With a trie, we can search for prefixes of multiple words in a single traversal.

Trie traversal

Trie is an N-ary tree and is generally traversed downward from the root node to the leaf node. The tree is traversed for each character in the provided word for prefix-related searches.

Instead of comparing each word separately for the prefix, we create a trie of all the words. For example, in the visualization below, we're traversing for the prefix bak in the trie. All the words with the matching prefixes are identified in a single traversal in the case of a trie instead of performing multiple word traversals and comparisons, as in the brute-force approach.

Press + to interact

Insertion of a word in a trie

Every character of the input key/word is an individual trie node. The child of a trie node is an array of pointers (or references) to next-level trie nodes.
If the input key is new or an extension of the existing key, we need to construct nonexisting nodes of the key and mark the end of the word for the last node as true.
If the input key is a prefix of the pre-existing key in the trie, we reuse it and move ahead in the trie path until we mark the value of isEndOfWord parameter associated with the last node of the word as true. The length of the longest word determines the depth of the trie.


void insert(string word)
{
    // point the current node to the root of the trie
    current_node = root 
    // iterate through all the characters of the given word
    for(every char in string word)
    {
        // if there is no child node associated with current character
        // create a new child trie node
        if(child_node belonging to current char is null)
        {
            child_node = new TrieNode
        }
        // move the current node pointer to the child node
        current_node = child_node
    }
}

Pseudocode for the insertion of a word in a trie

Searching a word or a prefix in a trie

Searching is similar to an insert operation. However, we only compare the characters and move down. In addition, the search can terminate upon reaching the end of the query string or due to the lack of the key in the trie.
In the first case, if the value of the isEndOfWord parameter of the last node encountered while searching for the key in the trie is true, it means the key exists in the trie, and the search function returns a true.
In the second case, the search terminates without examining all the characters of the key. This is because the trie path exhausts before traversing and comparing all the characters of the key, implying that the key is not present in the trie.


boolean search(string word)
{
   	// point the current node to the root of the trie
    current_node = root
   	// iterate through all the characters of the given word
    for (every char in string s)
    {
       	//the search terminates without examining all the characters of the key. 
       	//This is because the trie path exhausts before traversing and comparing 
       	// all the characters of the key, implying that the key is not present in the trie.
        if (child_node belonging to current char is null)
        {
            return false;
        }
       	// move the current node pointer to the child node
        current_node = child_node
    }
   	// if the value of isEndOfWord for the node associated with the last character of the
   	// word is true, it means the complete word is present in the trie. 
   	// Hence true is returned, else false is returned. 
    if isEndOfWord is true,
        return true
    else
        return false
}

Pseudocode for searching a word in a trie

Press + to interact


boolean isPrefix(string pref)
{
   	// point the current node to the root of the trie
    current_node = root
   	// iterate through all the characters of the given word
    for (every char in String s)
    {
       	//the search terminates without examining all the characters of the key. 
       	//This is because the trie path exhausts before traversing and comparing 
       	// all the characters of the key, implying that the key is not present in the trie.
        if (child_node belonging to current char is null)
        {
            return false
        }
       	// move the current node pointer to the child node
        current_node = child_node
    }
   	// if all the letters of the prefix query string were successfully traversed
   	// without the search getting terminated in between, it means the prefix string 
   	// is present in the trie. Hence true is returned. 
    return true
}

Problem identification

If the problem description contains any of the following properties, it may be solved using tries and prefix searching:

When detecting full or partial matches in the strings.
When finding shared prefixes between multiple strings.
When finding or counting the number of words containing a prefix.
When replacing matching words with another phrase.
When the input size is enormous, and storing all strings separately in memory as a list or a hashmap leads to high space complexity.
When pairwise comparison of strings leads to higher time complexity.

Introduction to Tries

Prefix Search

Suffix Search

Bitwise Tries

Pattern Matching

File Systems

Trie Traversal

Search Engine

Miscellaneous

Conclusion

What is a prefix?

What is prefix searching?

Trie traversal

End of word

Pseudocode

Trie node

Insertion of a word in a trie

Searching a word or a prefix in a trie

Problem identification