Motivation

Key-Value stores are distributed hash tables (DHT). They are useful in many situations such as storing user sessions in a web application and for building No-SQL databases. In a distributed environment, it has proved very challenging to scale traditional databases with strong consistency and high availability. A key-value store only binds a key to a specific value and does not assume anything about the structure of the key or the value (except the keys should be unique). These simpler semantics help a key-value store to scale at the global level with high availability and a spectrum of options for data consistency. In the context of the CAP theorem, key-value stores can either be consistent or available when there are network partitions. At configuration time, a key-value store can be instantiated to favor of consistency and availability.

Many real-world services only need primary-key access to a data store instead of traditional OLTP (Online Transaction Processing) databases. Examples include bestseller lists, shopping carts, customer preferences, session management, sales rank, and product catalog.

(ToDo: Above picture copied from https://hazelcast.com/glossary/key-value-store/ We need our own illustration)

Key-Value stores use many servers for the storage and retrieval of data. A single-node-based hashtable falls short due to one or more of the following reasons:

No matter how big a server we could get, data storage and query requirements can not be met by this server.
Failure of this one mega-server will mean service downtime for everyone.

While designing a key-value store we will exercise many of the distributed systems concepts such as:

consistent hashing
logical clocks
data consistency
sloppy quorums
gossip-based distributed failure detection and membership protocol

We will also see how good engineering often goes hand-in-hand with a solid underpinning for a practical system. We will draw on Amazon’s Dynamo Key-Value storeDynamo: amazon’s highly available key-value store. In Proceedings of twenty-first ACM SIGOPS symposium on Operating systems principles (SOSP '07). Association for Computing Machinery, New York, NY, USA, 205–220. for this purpose due to its influence on subsequent key-value stores (like Casandra and DynamoDB).

Rich programming model provided by traditional DBMS might not be required by many applications. Using RDBMS for such applications is often expensive both in terms of dollars and performance.

User-facing API

Key-Value stores, like ordinary hash tables, provide two primary functions:

Get: given a key, provide the associated value.

When data is replicated, the operation of locating the object replica that is associated with a specific key is hidden from the end-user and is done by the system. If the store was configured with a weaker data consistency model (for example eventual consistency, there might be more than one value returned against a key, as we will see later).

Put: store the value associated with the key

The system automatically determines where data should be placed. Additionally system often keeps metadata about the stored object as well. Such metadata can include the version of the object.

Datatype

In a key-Value store, the key is often some kind of primary key, while value can be any arbitrary binary data. One can even use a fast compressor (like Google’s Snappy) before storing the value to reduce data over the network and in storage.

Dynamo uses MD5 hashes on the key to generate a 128-bit identifier. These identifiers help the system to determine which server node will be responsible for this specific key.

Three-way replication of data is a popular choice. For some critical data, it might be replicated five times at different places (in different failure domains such as physically disparate data centers.)

Requirements from the key-value store

We have many requirements from our system.

Key-Value store will run on tens of thousands of servers distributed across the globe. We should be able to add or remove the servers as needed with minimal to no disruption to the service availability.
One or more servers or components such as disks, memories, and networks can fail. Service should be able to tolerate many such faults.
Key-Value stores are also used for applications that need always-on experience (such as shopping carts on websites like amazon.com). Availability is of paramount importance.
Some applications might be willing

...

Create a free account to access the full course.

By signing up, you agree to Educative's Terms of Service and Privacy Policy