System Design Deep Dive: Real-World Distributed Systems/

...

High-level Design of SILT

Learn how the multi-store approach improves the chances of designing a better key-value store.

We'll cover the following...

Design approach
Single-store approach
- HashCache
- Log-based key-value stores
The problem with a single-store approach
The big idea: A multi-store approach
Our model
- API
- Design overview

Design approach

There are a number of different approaches to making key-value stores, each of which can result in a different system.

For example, some are solely in the RAM and not backed by persistent storage. While these can provide top-notch performance, they are costly and primarily support fast reads only. The key-value cache will need to be rebuilt on any server failure from the source store.

On the other hand, for some key-value stores, everything is stored in a persistent disk; this is economical but low in performance due to inherently slow IO operations. However, there have been suggestions to improve performance in such stores. Ideally, a key-value store should use memory and storage to provide acceptable dollar cost and performance. However, flexibility in using these two major resources depends on our approach to designing a key-value store.

Our experience with key-value store designs shows that it is challenging to get the best of both worlds (economic and fast reads and writes) using a single key-value store. In our design, we will use a neat idea where we collectively use many kinds of stores, each optimized for a specific goal, and providing a good trade-off point. This approach is called a multi-store approach. In the following section, we will elaborate on why single-store approaches limit us from achieving our goals and then show how multi-store design copes with these limits.

Single-store approach

The single-store approach uses only one key-value store with features to improve performance. Generally, a single-store key-value pair has three main components:

A filterThis filters requests and only allows those requests to pass through for which keys are stored in the key-value store, stored in memory, reduces disk reads by checking if a key is stored or not. Read requests that pass the filter result in disk reads.
An in-memory index to provide the address of a key in storage.
A storage data layout that contains all of our key-value entries.

A single-store approach makes it hard to simultaneously achieve all our design goals (high-performance with efficient use of resources). Let's look at some single-store key-value stores.

HashCache

HashCache is a single-store key-value store. We should consider the following two features of HashCache:

It has no in-memory index. Instead, it has a hash table in storage. This reduces the use of expensive memory.
HashCache keeps track of its cached objects via a special in-memory table. It shifts the workload of confirming cache misses to memory and is optimized to use low memory.

Note: The ...

Prologue

File Systems

Google File System (GFS)

Google Colossus File System

Facebook's Tectonic File System

Databases

Google Bigtable

Google Megastore

Google Spanner

Key-value Stores

Many-core Key-value Store

Scaling Memcache

SILT

Amazon DynamoDB

Concurrency Management

Two-phase Locking (2PL)

Google Chubby Locking Service

ZooKeeper

Big Data Processing: Batch to Stream Processing

MapReduce

Spark

Kafka

Consensus

Understanding Consensus: Two Generals, FLP, & Byzantine Generals

Two-phase Commit

State Machine Replication

Paxos

Raft

Epilogue

High-level Design of SILT

Design approach

Single-store approach

HashCache