Design a Key-Value Store
Learn to design a key-value store.
Introduction to key-value stores
Key-value stores are non-relational databases that store data as a collection of key-value pairs, where each unique key identifies its associated value. Their simple interface makes them easy to partition and scale across many machines, which is why they underpin many large distributed systems.
Requirements
Let’s list the requirements for designing a key-value store that overcomes the limitations of traditional databases.
Functional requirements
The functional requirements are as follows:
Configurable service: Some applications are willing to trade strong consistency for higher availability. We need to provide a configurable service so that different applications can choose from a range of consistency models. This gives us tight control over the trade-offs between availability, consistency, cost-effectiveness, and performance.
Ability to always write: Applications should always be able to write to the key-value store.
Hardware heterogeneity: The system shouldn’t have specially distinguished nodes; each node should be functionally capable of performing any task.
Non-functional requirements
The non-functional requirements are as follows:
Scalable: Key-value stores should run on tens of thousands of servers distributed across the globe. Incremental scalability is highly desirable, and the system should be able to handle a large number of users.
Available: We need to provide continuous service, so availability is very important. As noted in the functional requirements, the desired level of availability is configurable.
Fault tolerant: The key-value store should operate uninterrupted despite failures in servers or their components.
Now that you understand the functional and non-functional requirements of a key-value store, what do you think are the key differences between key-value stores and traditional databases?
In what scenarios are key-value stores particularly advantageous?
API design
Key-value stores, like ordinary hash tables, provide two primary functions: get and put.

The get function: The API call to retrieve a value is get(key). The key is used to retrieve the specific value associated with it.

The put function: The API call to insert or update a value is put(key, value). The value is stored against the given key.

A minimal sketch of this interface is shown below.
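The following is a minimal, single-node sketch of this interface in Python; the class name KeyValueStore and the in-memory dictionary are illustrative stand-ins for the distributed store designed in this lesson:

```python
# Minimal in-memory sketch of the get/put API. A real deployment would
# route these calls to storage nodes; here a dictionary stands in.

class KeyValueStore:
    def __init__(self):
        self._data = {}

    def get(self, key):
        """Return the value associated with key, or None if absent."""
        return self._data.get(key)

    def put(self, key, value):
        """Insert or update the value associated with key."""
        self._data[key] = value


store = KeyValueStore()
store.put("user:42", {"name": "Alice"})
print(store.get("user:42"))  # {'name': 'Alice'}
```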
Note: This lesson is based on Dynamo: Amazon’s Highly Available Key-value Store (https://assets.amazon.science/ac/1d/eb50c4064c538c8ac440ce6a1d91/dynamo-amazons-highly-available-key-value-store.pdf), which is an influential work in the domain of key-value stores.
Let’s start by adding scalability to the key-value store, which is one of the core design requirements, in the following section.
Scalability
We store key-value data on storage nodes. As demand changes, we might need to add or remove storage nodes, which means we need to partition data over the nodes in the system to distribute the load across all of them. To achieve scalability, we'll use consistent hashing because of its ability to provide a balanced load distribution.

Consistent hashing is an effective way to manage load over a set of nodes. In consistent hashing, we consider a conceptual ring of hashes ranging from 0 to n−1, where n is the total number of possible hash values. Each node is placed on the ring at the hash of its identifier, and each key is assigned to the first node encountered while moving clockwise from the key's position on the ring. A sketch of this lookup follows.
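Here is a minimal Python sketch of consistent hashing, assuming MD5 as the hash function; node names such as node-A and the HashRing class are illustrative, not part of the lesson's design:

```python
import bisect
import hashlib

def ring_hash(key: str) -> int:
    # Map any string onto the ring of hash values.
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class HashRing:
    def __init__(self, nodes):
        # Place each node on the ring at the hash of its identifier.
        self._ring = sorted((ring_hash(n), n) for n in nodes)
        self._hashes = [h for h, _ in self._ring]

    def lookup(self, key: str) -> str:
        # A key is served by the first node clockwise from its hash.
        point = ring_hash(key)
        idx = bisect.bisect_right(self._hashes, point) % len(self._ring)
        return self._ring[idx][1]

ring = HashRing(["node-A", "node-B", "node-C"])
print(ring.lookup("user:42"))  # one of the three nodes
```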
Point to Ponder
What is the drawback of using consistent hashing?

A naive implementation can distribute load non-uniformly: nodes may end up owning arcs of the ring with very different sizes, and when a node fails or leaves, its entire load shifts to the single next node on the ring, creating hotspots.
Use virtual nodes
We’ll use virtual nodes to ensure a more evenly distributed load across the nodes. Instead of placing each node on the ring once, we’ll apply multiple hash functions to the same node identifier so that each node occupies several positions on the ring.
Let’s take an example. Suppose we have three hash functions. For each node, we calculate three hashes and place them on the ring. For a request, we use only one hash function; wherever the request lands on the ring, it's processed by the next node found while moving in the clockwise direction. Since each server has three positions, the load of requests is more uniform. Moreover, if a node has more hardware capacity than the others, we can give it more virtual nodes by using additional hash functions. This way, it'll have more positions on the ring and serve more requests, as sketched below.
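Below is a sketch that extends the ring above with virtual nodes; hashing suffixed identifiers such as node-A#0, node-A#1, and so on stands in for applying multiple hash functions, and the per-node weights are illustrative:

```python
import bisect
import hashlib

def ring_hash(key: str) -> int:
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class VirtualNodeRing:
    def __init__(self, node_weights):
        # node_weights maps a node name to its number of virtual nodes;
        # a node with more hardware capacity gets more ring positions.
        self._ring = sorted(
            (ring_hash(f"{name}#{i}"), name)
            for name, weight in node_weights.items()
            for i in range(weight)
        )
        self._hashes = [h for h, _ in self._ring]

    def lookup(self, key: str) -> str:
        # Requests still use a single hash of the key.
        idx = bisect.bisect_right(self._hashes, ring_hash(key)) % len(self._ring)
        return self._ring[idx][1]

# node-C has twice the capacity, so it gets twice the virtual nodes.
ring = VirtualNodeRing({"node-A": 3, "node-B": 3, "node-C": 6})
print(ring.lookup("user:42"))
```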
We’ve made the proposed design of key-value storage scalable. The next task is to make our system highly available.
Data replication
We have various methods to replicate the storage. It can be either a primary-secondary relationship, where data is copied from a primary node to its secondary nodes, or a peer-to-peer relationship, where all replicas are equal.
The primary-secondary model becomes a single point of failure if the primary node goes down. Therefore, to avoid a single point of failure, we’ll use a peer-to-peer relationship for replication. We’ll replicate the data on multiple hosts for durability and high availability. Each data item will be replicated at n hosts, where n is a parameter configured per instance of the key-value store. These n nodes form the key's preference list, as sketched after this paragraph.
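Building on the ring sketches above, the following sketch shows one way the n replicas for a key might be chosen: walk clockwise from the key's position and collect the first n distinct physical nodes, skipping extra virtual nodes of the same server. The function name preference_list and the default n=3 are illustrative assumptions:

```python
import bisect

def preference_list(ring, key, n=3):
    # `ring` is a VirtualNodeRing from the earlier sketch, and ring_hash
    # is the same hash helper. Walk clockwise from the key's hash and
    # collect n distinct physical nodes, skipping additional virtual
    # nodes that belong to an already-chosen server.
    point = ring_hash(key)
    idx = bisect.bisect_right(ring._hashes, point)
    replicas = []
    for i in range(len(ring._ring)):
        node = ring._ring[(idx + i) % len(ring._ring)][1]
        if node not in replicas:
            replicas.append(node)
        if len(replicas) == n:
            break
    return replicas

ring = VirtualNodeRing({"node-A": 3, "node-B": 3, "node-C": 6})
print(preference_list(ring, "user:42"))  # e.g., ['node-C', 'node-A', 'node-B']
```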