Data Structure for Storing Prefixes

Define the optimal data structure for storing prefixes in a high-performance typeahead system. Learn how to use the trie to store search frequencies for ranking suggestions. Explore methods for partitioning and updating the Trie offline to achieve massive scalability and maintain low-latency reads in System Design.

The trie data structure

Use a data structure optimized for prefix lookups. The service must match user input against a large set of strings to return query completions. For example, given the terms “UNITED”, “UNIQUE”, “UNIVERSAL”, and “UNIVERSITY”, typing “UNIV” should return “UNIVERSAL” and “UNIVERSITY” as prefix matches.

To handle high request volumes with minimal latency, we cannot rely on standard database queries. Reading from a database is much slower than reading from RAMRecall from the Back-of-the-Envelope Calculation chapter that reading 1MB of data sequentially from RAM takes 9 microseconds while from persistent storage like SSD takes 200 microseconds.. Therefore, we must store the index in memory. However, the data persists in the database for durability.

A trie (pronounced “try”) is commonly used for this purpose. A trie is a tree structure where each node stores a character of a string. The example words are stored in the trie as follows: