A Memory-efficient Store for SILT: Part I

Learn how to design a memory-efficient key-value store.

Preamble

Memory efficiency is one of the core design goals for our key-value store. In previous lessons, we enabled clients to write keys quickly (using our write-friendly store) while keeping the per-key memory consumption of our key-value store low. However, those stores primarily served other goals: fast sequential writing in the write-friendly store, and accumulating intermediary stores for a bulk merge into a memory-efficient store. In this lesson, we will make our most significant contribution to keeping memory consumption low: the last-level store. Our goal is to hold the largest proportion of entries (around 80%) in this, our most memory-efficient store, so that our design uses low memory per key on average.

We need our last-level store to be a memory-efficient store. Here’s a list of requirements for our last-level store:

  1. Read-friendliness: This is essential because the last-level store will hold a large proportion of keys in our design. Chances are that most of the lookups will happen in this store.

    1. O(log n) time lookups: Our representation will allow us to achieve logarithmic time complexity in the average case.

  2. Low memory consumption per key: This will help us store most entries in our store and will keep our design's overall per-key memory consumption low.

    1. Compact representation: Our physical representation of keys in memory will be crucial in keeping per-key memory consumption low.

  3. Easy creation: Over time, new data (from the intermediary store) will arrive in the last-level store for incorporation. Incorporating this data should be efficient because the operation will be fairly frequent. We will soon see that keeping the intermediary store sorted has already moved us toward efficient incorporation.

    1. Immutability: We do not require this store to change in real time—we have the write-friendly store for this—hence, we will give up the need to do so to make this store's creation easier.

Immutability and sorting data in storage

Our memory-efficient store will sort keys by key order in storage. For example, for 3-bit keys, 000 will be stored first, then 001, 010, and so on. This will help us achieve the following:

  • Efficient bulk merging of hash-sorted intermediary stores into the key-sorted memory-efficient store

  • Compact representation of keys in memory

A key is stored only if it exists in the store. In the example above, if key 001 does not exist in the store, then 000 will be followed directly by 010. We do not need to reserve space for key 001 because this store does not change in real time. By doing so, we achieve the following:

  • Our in-memory index for this store will have to keep track of fewer keys, resulting in lower per-key memory consumption.

  • It allows us to keep our memory-efficient store immutable. Storing only existing keys and immutability go hand in hand: in our design, neither is practical without the other.
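As an illustrative sketch of both points, the snippet below stores only the keys that actually exist, in sorted order, and answers lookups with a binary search. The 3-bit keys and values are hypothetical, and this flat-array layout is only a stand-in for our eventual compact representation:

```python
import bisect

# Only present keys are stored, in key order; no slot is reserved for 0b001.
stored_keys = [0b000, 0b010, 0b011, 0b110]
values = ["a", "b", "c", "d"]  # values[i] belongs to stored_keys[i]

def lookup(key):
    """Binary search over the sorted keys: O(log n) comparisons."""
    i = bisect.bisect_left(stored_keys, key)
    if i < len(stored_keys) and stored_keys[i] == key:
        return values[i]
    return None  # absent keys cost no storage at all

print(lookup(0b010), lookup(0b001))  # → b None
```

Because absent keys occupy no slots, the index tracks exactly as many entries as there are stored keys, which is what keeps per-key memory consumption low.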

Keeping our store immutable might initially seem like a disadvantage, but it is not. We will soon see that we can configure our compact index only when the entire set of keys to be stored is known. To keep per-key memory consumption low, we must ensure that our in-memory index tracks only the necessary keys: the stored keys.

If we were to update this store in real time, we would have two options:

  1. We can reserve storage for all possible keys to avoid recreating our compact index. This results in higher per-key memory consumption: although our in-memory index remains compact, it has to track far more keys than necessary.

  2. We can store only the keys present in the store and reconfigure the in-memory index every time a new key is added. Even if recreating the index were efficient, rearranging entries in storage on every insertion would result in very high write amplification.

Since we store only the necessary keys and do not need the store to change, we will keep it immutable. We will configure our compact in-memory index only once, at the time this store is created. Therefore, we can store keys compactly on storage. The multi-store approach gives us the freedom to keep only one store changing in real time while lazily updating the others. So we will keep the last-level store immutable and recreate it only when there is a significant number of keys to update (indicated by the number of accumulated intermediary stores), thereby keeping write amplification low.
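As a rough sketch of what such a lazy recreation might look like, the function below merges the old last-level store and the accumulated intermediary stores, all already sorted, in a single sequential pass. The store contents and the `rebuild` helper are hypothetical, not our final implementation:

```python
import heapq

def rebuild(stores):
    """stores: key-sorted (key, value) lists, ordered oldest first.
    One sequential pass over already-sorted input; on duplicate keys,
    the entry from the newest store wins."""
    def tag(store, rank):
        # Attach an age rank so ties on equal keys sort newest first.
        for key, value in store:
            yield (key, rank, value)

    tagged = [tag(s, -i) for i, s in enumerate(stores)]
    out = []
    for key, _, value in heapq.merge(*tagged):
        if out and out[-1][0] == key:
            continue  # a newer entry for this key was already emitted
        out.append((key, value))
    return out  # the new sorted, immutable last-level store

# Hypothetical example: old last-level store plus two intermediary stores.
old_store = [(0b001, "old"), (0b101, "stale")]
newer = [(0b000, "x"), (0b101, "fresh")]
newest = [(0b010, "p"), (0b111, "q")]
print(rebuild([old_store, newer, newest]))
# → [(0, 'x'), (1, 'old'), (2, 'p'), (5, 'fresh'), (7, 'q')]
```

Because every input is sorted, the merge reads and writes each store sequentially exactly once, which is what keeps the recreation cheap enough to run only occasionally.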

Indexing using tries

Our memory-efficient store will store most of our entries. It is essential to design it to use both storage and memory efficiently. We will index values in our memory-efficient store using tries.

A trie, also called a prefix tree, stores keys such that the path from the root to a leaf node is unique. An internal node represents the longest common prefix (the path from the root to that node) shared by all its descendant nodes.
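To make the definition concrete, here is a minimal binary trie over the 3-bit example keys used earlier. The node layout is an illustrative sketch, not our final compact representation:

```python
class TrieNode:
    def __init__(self):
        self.children = {}   # next bit ('0' or '1') -> child node
        self.is_key = False  # True if the root-to-here path spells a stored key

def insert(root, bits):
    """Insert a key given as a bit string, e.g. '010'."""
    node = root
    for b in bits:
        node = node.children.setdefault(b, TrieNode())
    node.is_key = True

def contains(root, bits):
    """Follow the key's bits from the root; fail on the first missing edge."""
    node = root
    for b in bits:
        node = node.children.get(b)
        if node is None:
            return False
    return node.is_key

# The path from the root to each stored key is unique; an internal node's
# path is the longest common prefix of all keys beneath it.
root = TrieNode()
for key in ["000", "010", "011", "110"]:
    insert(root, key)
print(contains(root, "010"), contains(root, "001"))  # → True False
```

Here, for example, the internal node reached by the path 01 is the longest common prefix shared by the stored keys 010 and 011.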
