Data Structure for Storing Prefixes

Learn about the efficient data structure to store the suggestions.

Trie data structure

Before moving into the discussion of the detailed design of the typeahead suggestion system, let’s choose an efficient data structure to store the prefixes (group of characters a user types).

The issue we are attempting to tackle is that we have many strings that we need to store in a way that allows users to search for them using any prefix. Our service will suggest the next words that match the provided prefix.

Suppose our database contains the phrases UNITED, UNIQUE, UNIVERSAL, and UNIVERSITY, for example. Our system should suggest UNIVERSAL and UNIVERSITY when the user types UNIV.

There should be a method that can efficiently store our data so that it can be searched rapidly because we have to handle a lot of requests with minimal latency. We can not rely on a database for this; as providing suggestions from the database takes longer as compared to reading from RAMRecall from the Back-of-the-Envelope Calculation chapter that reading 1MB of data sequentially from RAM takes 9 microseconds while from persistent storage like SSD takes 200 microseconds.. Therefore, we need to store our index in memory in an efficient data structure. However, for durability and availability, this data is stored in the database.

The Trie (pronounced “try”) is one of the most suited data structures for our needs. A trie is a tree-like data structure for storing phrases, with each tree node storing a character in the phrase in order. If we needed to store UNITED, UNIQUE, UNIVERSAL, and UNIVERSITY in the trie, for example, it would look like this:

Create a free account to access the full course.

By signing up, you agree to Educative's Terms of Service and Privacy Policy