As we've seen before, a trie is a tree-based, ordered data structure used to store associative data whose keys are usually strings.

There are many definitions of tries available on the internet, but here we'll focus on the intuition behind the data structure, its usage, and its practical applications.

Intuition

In this lesson, we'll learn about the intuition behind the idea of tries and how other data structures could be used to solve similar problems. If you know the basics of a trie, you can jump to the next chapter.

Understanding prefixes

A prefix is a substring that occurs at the beginning of a string. A substring is a contiguous sequence of characters within a string. For example, the word "brick" has the following prefixes: "b", "br", "bri", "bric", and "brick".
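A quick way to see this in code is to slice the word one character longer each time; this short Python snippet (the variable names are illustrative) prints every prefix of "brick":

word = "brick"
# every prefix is just the first i characters of the word, for i = 1..len(word)
prefixes = [word[:i] for i in range(1, len(word) + 1)]
print(prefixes)  # ['b', 'br', 'bri', 'bric', 'brick']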

Problem

Let's try to build a basic search suggestion feature. When a user types in a word, the software must show all the words that can be constructed using the current input as a prefix. The back-end server maintains a dictionary of valid words. The prefix string is sent to the back-end server, and the server returns the list of all the words that have the current input as a prefix.

Note: If a user types in "bri", the server returns a list of words containing "bri" as a prefix as suggestions, such as "brick", "bright", "brisket", "bride", and more.

To analyze the performance of our solutions, let's define a few variables.

  • Length of the word typed by the user for searching = W.

  • Total number of words present in the dictionary = N.

In the section below, we've explained the possible ways to build this feature.

Possible solutions

Below we describe some possible solutions using well-known data structures and algorithms.

List-based solution

The simplest solution is to store all the words in a list. Then, whenever the server receives a prefix string from the user, it searches the entire list and returns all the words with a matching prefix. However, this implementation is computationally heavy and inefficient.

def get_suggestions(words_list, prefix):
    suggestions = []
    for word in words_list:
        # word.startswith(prefix) plays the role of isPrefix(p, s):
        # it returns True if prefix is a prefix of word, else False
        if word.startswith(prefix):
            suggestions.append(word)
    return suggestions
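For example, calling this function on a small dictionary (the sample words below are illustrative) returns every word that starts with the typed prefix:

words_list = ["brick", "bright", "brisket", "bride", "apple"]
print(get_suggestions(words_list, "bri"))  # ['brick', 'bright', 'brisket', 'bride']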

Time complexity

We iterate through a list of length N and perform prefix matching for each dictionary word. Prefix matching takes time equivalent to the length of the prefix string, which is W. Therefore, the total time complexity is O(NW).

Space complexity

Since we're storing N words of size W in memory, the space complexity becomes O(NW).


Sorted list-based solution

Let's now consider an example where the prefix entered by the user is "bri", and the list of words stored in the server's memory is:

["apple", "ant", "bin", "back", "ball", "big", "brick", "bribe", "broke", "bride", "brag", "bridge", "bright", "brisket", "yellow", "wood"]

In this case, it's certain that the words not starting with the letter "b" are invalid suggestions. This means sorting the list of words in alphabetical order can ease the searching procedure.

After sorting the list, we can search for all the words starting with "b" using binary search (applying lower- and upper-bound functions). Now our search space is reduced to the list given below:

["bin", "back", "ball", "big", "brick", "bribe", "broke", "bride", "brag", "bridge", "bright", "brisket"]

To search for words with the prefix "br", we can apply binary search again to get the upper and lower bounds of all words with the second character being "r". The final output would be as given below: 

["brag", "bribe", "brick", "bride", "bridge", "bright", "brisket", "broke"]

And finally, with another search of the third character as "i", the output would look like the below list:

["bribe", "brick", "bride", "bridge", "bright", "brisket"]

The pseudocode below summarizes the steps involved in this approach:

from bisect import bisect_left

def get_sorted_suggestions(words_list, prefix):
    # sort the list
    sorted_list = sorted(words_list)
    for i in range(1, len(prefix) + 1):
        prefix_to_search = prefix[:i]
        # binary search to find the start and end indices of the matching range;
        # bisect_left acts as lower_bound, and searching for the prefix followed
        # by a character larger than any letter acts as upper_bound
        start_index = bisect_left(sorted_list, prefix_to_search)
        end_index = bisect_left(sorted_list, prefix_to_search + "\uffff")
        # keep reducing the search space
        sorted_list = sorted_list[start_index:end_index]
    return sorted_list

Time complexity

The time complexity for sorting the list of strings using a standard sorting algorithm like merge sort is O(N log N). We iterate through the query string character by character and perform a binary search for each character addition. Since there are W letters in the query string, the time complexity for searching is O(W log N). Assuming N >> W, the overall time complexity becomes O(N log N).

Space complexity

A sorting algorithm like merge sort would incur a space complexity of O(N). Also, since we're storing N words of size W in memory, it would add a space complexity of O(NW). Hence, the total space complexity becomes O(N + NW) ≈ O(NW).

Hashmap containing all prefixes

Another possible solution is to generate all the possible prefixes of all the words and store them in a hashmap. Let's say we have the words "brick", "bright", and "bride". We create a hashmap of all the prefixes generated from these words. A prefix string P is the key of the hashmap, and the list of words containing P as a prefix is the value. The hashmap would look something like this:

{
  "b": ["brick", "bright", "bride"],
  "br": ["brick", "bright", "bride"],
  "bri": ["brick", "bright", "bride"],
  "bric": ["brick"],
  "brick": ["brick"],
  "brid": ["bride"],
  "bride": ["bride"],
  "brig": ["bright"],
  "brigh": ["bright"],
  "bright": ["bright"]
}

Now, whenever a user types any string, we'll search for that string in our hashmap and return all the suitable matches for the prefix.
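Here is a minimal sketch of this idea in Python; the helper name build_prefix_map and the sample words are illustrative. It builds the map once and answers each query with a single dictionary lookup:

from collections import defaultdict

def build_prefix_map(words_list):
    # map every prefix of every word to the list of words that start with it
    prefix_map = defaultdict(list)
    for word in words_list:
        for i in range(1, len(word) + 1):
            prefix_map[word[:i]].append(word)
    return prefix_map

prefix_map = build_prefix_map(["brick", "bright", "bride"])
print(prefix_map["bri"])  # ['brick', 'bright', 'bride']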

Time complexity

The average time complexity of inserting a key-value pair into a hashmap is O(1). The time required for generating the hash of a string is proportional to its length, so the time complexity for searching a key in the hashmap is of the order of the size of the query string, which is O(W).

Space complexity

For every possible prefix, multiple words are stored as values in a list. Although this seems like a fast solution due to the involvement of a hashmap, it's a nightmare in terms of memory utilization. A word of length W has W prefixes, so it appears in up to W value lists, and its prefix keys alone take up to W(W + 1)/2 characters, which amounts to roughly O(NW^2) space in the worst case. Moreover, adding a new word creates key-value pairs for all of its prefixes, and already existing value lists must be checked and modified, which adds further memory and time overhead.


Tree-based solution

A somewhat different approach is to store the words in a tree or linked-list fashion, where a node represents a single character of a string and its child nodes represent the characters that can follow it. Words share the same root and their common prefixes, and hence form a tree structure. For example, for the word list ["brick", "bribe", "bride", "bridge", "bright", "brisket"], the tree structure would look something like the one below:

A trie for the list of words ["brick", "bribe", "bride", "bridge", "bright", "brisket"]

When we get a prefix to search, we traverse down the tree, following the string character by character. Once all the characters in the prefix string are exhausted, all the words stored in the subtree below the current node are returned as the answer.
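Below is a minimal sketch of this tree in Python. The class and function names (TrieNode, insert, collect, suggestions) are illustrative; each node keeps a dictionary of children keyed by character, plus a flag marking the end of a word:

class TrieNode:
    def __init__(self):
        self.children = {}           # maps a character to the child TrieNode
        self.is_end_of_word = False  # True if a word ends at this node

def insert(root, word):
    # walk down the tree, creating missing nodes along the way
    node = root
    for ch in word:
        node = node.children.setdefault(ch, TrieNode())
    node.is_end_of_word = True

def collect(node, prefix, results):
    # gather every complete word stored in the subtree below this node
    if node.is_end_of_word:
        results.append(prefix)
    for ch, child in node.children.items():
        collect(child, prefix + ch, results)

def suggestions(root, prefix):
    # follow the prefix character by character, then collect everything below
    node = root
    for ch in prefix:
        if ch not in node.children:
            return []  # no word has this prefix
        node = node.children[ch]
    results = []
    collect(node, prefix, results)
    return results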

This approach is very efficient when you consider that search queries are typed incrementally. Suppose a user types the search query "bri" and the server returns the list of words containing "bri" as a prefix. If the user keeps typing to search for the string "bric", we don't need to start again from the tree's root. Instead, we can keep traversing the tree from the last visited node, as sketched below.
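Using the sketch above, the incremental search could keep a pointer to the last visited node and advance it by one child per newly typed character instead of restarting from the root:

root = TrieNode()
for word in ["brick", "bribe", "bride", "bridge", "bright", "brisket"]:
    insert(root, word)

node = root
for ch in "bri":              # the user has typed "bri"
    node = node.children[ch]  # advance one level per character

node = node.children["c"]     # the user types one more character, 'c'
results = []
collect(node, "bric", results)
print(results)                # ['brick']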

Other benefits of this approach include the ability to maintain extra details on each node, such as how often a prefix path is searched. Such information is generally used to rank and optimize the search suggestions.

Time complexity

The time complexity for the creation of a trie is O(NW) since all N words of size W are inserted as nodes in the tree. Searching for a prefix is O(W) since we only need to traverse as many nodes as there are characters in the prefix, which is at most W.

Space complexity

We create new nodes for every character in every word. In the worst case, N × W new nodes are created. Hence, the space complexity becomes O(NW).


Revisiting the definition of a trie

Let's revisit the definition of a trie and check if it makes more sense to us than before. The idea of the trie is very similar to the tree-based solution above. If we put all the string characters in a tree, with each node representing one character, the tree is called a trie.

There are multiple applications of tries apart from prefix searching. Also, tries can be implemented differently depending on the use case.