Introduction to Prefix Searching Using Tries
Get an introduction to prefix searching using tries.
What is a prefix?
As we saw before, a prefix string is a substring of a string or word present at the beginning.
For example, in the word holiday,
the valid prefixes are h
, ho
, hol
, holi
, holid
, holida
, and holiday
.
What is prefix searching?
Searching for the presence of a prefix in a string is called prefix searching. Problems requiring a prefix search are very common.
For a general use case, one can traverse the string to check if the prefix is present. For example, to check if holi
is a valid prefix for the word holiday
, one can traverse the word holiday
and match each character one by one to validate if the prefix is present.
But this can be problematic if there's a need to search for prefixes in multiple words. One has to go through all the strings one by one. In cases like these, tries can be very helpful. With a trie, we can search for prefixes of multiple words in a single traversal.
Trie traversal
Trie is an N-ary tree and is generally traversed downward from the root node to the leaf node. The tree is traversed for each character in the provided word for prefix-related searches.
Instead of comparing each word separately for the prefix, we create a trie of all the words. For example, in the visualization below, we're traversing for the prefix bak
in the trie. All the words with the matching prefixes are identified in a single traversal in the case of a trie instead of performing multiple word traversals and comparisons, as in the brute-force approach.
End of word
While searching for a word in a trie, we need to determine if the word is present as a complete word or as a prefix of another word or words. We can make this distinction by adding a new boolean parameter, isEndOfWord
, in each trie node. This flag's value is updated as true
for the trie nodes associated with every word's last character.
Pseudocode
Trie node
A trie node would contain the following parameters:
An array of pointers to the child nodes.
A boolean parameter denoting the end of the word.
// Trie node classclass TrieNode{// An array of pointer to the child nodes.// The maximum number of children for a trie node// can be equal to the language's alphabet size.TrieNode children[ALPHABET_SIZE]// isEndOfWord is true if the node represents the end of a wordboolean isEndOfWord}
Insertion of a word in a trie
Every character of the input key/word is an individual trie node. The child of a trie node is an array of pointers (or references) to next-level trie nodes.
If the input key is new or an extension of the existing key, we need to construct nonexisting nodes of the key and mark the end of the word for the last node as true.
If the input key is a prefix of the pre-existing key in the trie, we reuse it and move ahead in the trie path until we mark the value of
isEndOfWord
parameter associated with the last node of the word as true. The length of the longest word determines the depth of the trie.
void insert(string word){// point the current node to the root of the triecurrent_node = root// iterate through all the characters of the given wordfor(every char in string word){// if there is no child node associated with current character// create a new child trie nodeif(child_node belonging to current char is null){child_node = new TrieNode}// move the current node pointer to the child nodecurrent_node = child_node}}
Searching a word or a prefix in a trie
Searching is similar to an insert operation. However, we only compare the characters and move down. In addition, the search can terminate upon reaching the end of the query string or due to the lack of the key in the trie.
In the first case, if the value of the
isEndOfWord
parameter of the last node encountered while searching for the key in the trie is true, it means the key exists in the trie, and the search function returns atrue
.In the second case, the search terminates without examining all the characters of the key. This is because the trie path exhausts before traversing and comparing all the characters of the key, implying that the key is not present in the trie.
boolean search(string word){// point the current node to the root of the triecurrent_node = root// iterate through all the characters of the given wordfor (every char in string s){//the search terminates without examining all the characters of the key.//This is because the trie path exhausts before traversing and comparing// all the characters of the key, implying that the key is not present in the trie.if (child_node belonging to current char is null){return false;}// move the current node pointer to the child nodecurrent_node = child_node}// if the value of isEndOfWord for the node associated with the last character of the// word is true, it means the complete word is present in the trie.// Hence true is returned, else false is returned.if isEndOfWord is true,return trueelsereturn false}
Searching for a prefix is similar to searching for a word operation. However, in this case, it's not necessary to reach the last node where isEndOfWord
is true.
boolean isPrefix(string pref){// point the current node to the root of the triecurrent_node = root// iterate through all the characters of the given wordfor (every char in String s){//the search terminates without examining all the characters of the key.//This is because the trie path exhausts before traversing and comparing// all the characters of the key, implying that the key is not present in the trie.if (child_node belonging to current char is null){return false}// move the current node pointer to the child nodecurrent_node = child_node}// if all the letters of the prefix query string were successfully traversed// without the search getting terminated in between, it means the prefix string// is present in the trie. Hence true is returned.return true}
Problem identification
If the problem description contains any of the following properties, it may be solved using tries and prefix searching:
When detecting full or partial matches in the strings.
When finding shared prefixes between multiple strings.
When finding or counting the number of words containing a prefix.
When replacing matching words with another phrase.
When the input size is enormous, and storing all strings separately in memory as a list or a hashmap leads to high space complexity.
When pairwise comparison of strings leads to higher time complexity.