Design Key-value Store
Learn to design a key-value store.
Introduction to key-value stores
Key-value stores are distributed hash tables that store data as key-value pairs: each unique key maps to an arbitrary value, such as a blob, string, or serialized object.
Requirements
Let’s list the requirements for a key-value store that overcomes the problems of traditional databases.
Functional requirements
The functional requirements are as follows:
- Configurable service: Some applications may trade strong consistency for higher availability. We need to provide a configurable service so that different applications can use a range of consistency models. We need tight control over the trade-offs between availability, consistency, cost-effectiveness, and performance.
- Ability to always write: The applications should always have the ability to write into the key-value storage. If the user wants strong consistency, this requirement might not always be fulfilled due to the implications of the CAP theorem.
- Hardware heterogeneity: The system shouldn’t have any distinguished nodes; each node should be able to perform any task. Servers may still differ in capacity, since newer hardware is often more capable than older hardware.
Non-functional requirements
The non-functional requirements are as follows:
- Scalable: Key-value stores should run on tens of thousands of servers distributed across the globe. Incremental scalability is highly desirable: we should be able to add or remove servers as needed with minimal to no disruption to service availability. Moreover, our system should handle an enormous number of users of the key-value store.
- Available: We need to provide continuous service, so availability is very important. This property is configurable: if the user wants strong consistency, we’ll have less availability, and vice versa.
- Fault tolerance: The key-value store should operate uninterrupted despite failures in servers or their components.
API design
Key-value stores, like ordinary hash tables, provide two primary functions: get and put.

- get function: The API call to get a value is get(key).
- put function: The API call to store a value is put(key, context, value), where context holds encoded metadata about the object.
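To make the interface concrete, here is a minimal in-memory sketch of the get and put calls described above. The class name and the shape of the context argument are illustrative assumptions, not part of any specific key-value store’s API; a real store would persist and replicate the data.

```python
class KeyValueStore:
    """Illustrative in-memory sketch of the get/put interface."""

    def __init__(self):
        self._data = {}  # key -> value
        self._meta = {}  # key -> context metadata for that object

    def get(self, key):
        """Return the value stored under `key`, or None if absent."""
        return self._data.get(key)

    def put(self, key, context, value):
        """Store `value` under `key`. `context` carries encoded metadata
        about the object (e.g., version information); this sketch simply
        records it alongside the value."""
        self._meta[key] = context
        self._data[key] = value
```

A caller might then write `store.put("user:1", {"version": 1}, profile)` and later read it back with `store.get("user:1")`.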
Note: This lesson is based on Dynamo: Amazon’s Highly Available Key-value Store (https://assets.amazon.science/ac/1d/eb50c4064c538c8ac440ce6a1d91/dynamo-amazons-highly-available-key-value-store.pdf), which is an influential work in the domain of key-value stores.
Add scalability
Let’s start with one of the core design requirements: scalability. We will use consistent hashing to achieve it.
Consistent hashing is an effective way to manage the load over a set of nodes. In consistent hashing, we consider a conceptual ring of hashes from 0 to 2^n − 1, where n is the number of bits in the output of the hash function. Both nodes and keys are mapped onto this ring, and each key is served by the first node encountered while moving clockwise from the key’s position. When a node joins or leaves, only the keys adjacent to it on the ring move, rather than nearly all keys, as would happen with modulo-based hashing.
Point to Ponder
What is the drawback of using consistent hashing?
Use virtual nodes
We’ll use virtual nodes to ensure a more evenly distributed load across the nodes. Instead of applying a single hash function, we’ll apply multiple hash functions onto the same key.
Let’s take an example. Suppose we have three hash functions. For each node, we calculate three hashes and place them into the ring. For the request, we use only one hash function. Wherever the request lands on the ring, it’s processed by the next node found while moving in the clockwise direction. Each server has three positions, so the load of requests is more uniform. Moreover, if a node has more hardware capacity than others, we can add more virtual nodes by using additional hash functions. This way, it’ll have more positions in the ring and serve more requests.
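The example above can be sketched in code. This is a minimal illustration, not a production implementation: hashing the string "node#i" for each virtual-node index i has the same effect as applying several hash functions, and the class and function names are assumptions for this sketch.

```python
import bisect
import hashlib

def ring_hash(s: str) -> int:
    """Map a string onto the ring using MD5 (any uniformly
    distributed hash function works here)."""
    return int(hashlib.md5(s.encode()).hexdigest(), 16)

class ConsistentHashRing:
    """Consistent hashing with virtual nodes (illustrative sketch)."""

    def __init__(self, vnodes: int = 3):
        self._hashes = []   # sorted ring positions
        self._nodes = []    # node owning the position at the same index
        self._vnodes = vnodes

    def add_node(self, node: str, vnodes: int = None) -> None:
        # Hashing "node#i" for i = 0..k-1 gives the node k positions
        # on the ring. A more capable server can be given a larger
        # `vnodes`, so it owns more positions and serves more requests.
        for i in range(vnodes or self._vnodes):
            h = ring_hash(f"{node}#{i}")
            idx = bisect.bisect(self._hashes, h)
            self._hashes.insert(idx, h)
            self._nodes.insert(idx, node)

    def get_node(self, key: str) -> str:
        # The request is served by the next node clockwise from the
        # key's position; the modulo wraps around the end of the ring.
        idx = bisect.bisect(self._hashes, ring_hash(key)) % len(self._hashes)
        return self._nodes[idx]
```

For instance, after adding nodes "A", "B", and "C", every call to `get_node("user:1")` deterministically returns the same node, and adding a fourth node only remaps the keys that fall between its new ring positions and their predecessors.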
We’ve made the proposed design of key-value storage scalable. The next task is to make our system highly available.
Data replication
We have various methods to replicate the storage. It can be either a primary-secondary relationship or a peer-to-peer relationship.
We’ll use a peer-to-peer relationship for replication. We’ll replicate the data on multiple hosts for durability and high availability. Each data item will be replicated at n hosts, where n is a parameter configured per instance of the key-value store.
Each node will replicate its data to other nodes. We’ll call the node that handles a read or write operation the coordinator; it’s directly responsible for the keys involved. A coordinator node ...
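The replication scheme above can be sketched as follows. In the Dynamo paper, the set of nodes responsible for a key is called its preference list: the coordinator plus the next distinct physical nodes encountered clockwise on the ring, skipping virtual nodes that belong to an already-chosen server. The helper below and its names are illustrative assumptions for this sketch.

```python
import bisect
import hashlib

def ring_hash(s: str) -> int:
    """Map a string onto the ring using MD5."""
    return int(hashlib.md5(s.encode()).hexdigest(), 16)

def preference_list(ring, key, n):
    """Return the n distinct physical nodes that should hold replicas
    of `key`: the coordinator plus the following nodes clockwise,
    skipping virtual nodes that map to an already-chosen server.
    `ring` is a sorted list of (position, node) pairs."""
    positions = [pos for pos, _ in ring]
    distinct = len({node for _, node in ring})
    start = bisect.bisect(positions, ring_hash(key)) % len(ring)
    replicas = []
    i = start
    while len(replicas) < min(n, distinct):
        node = ring[i % len(ring)][1]
        if node not in replicas:
            replicas.append(node)
        i += 1
    return replicas
```

With four physical nodes each holding three virtual nodes, `preference_list(ring, "user:42", 3)` returns three distinct servers, so the item survives the failure of any two of them.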