clone 2 [sir's key-value store]
Like a hash-table but distributed and for much larger scale.
We'll cover the following
- Motivation
- User-facing API
- Datatype
- Requirements from the key-value store
- Assumptions
- Programming model
- ToDo
- What happens under failures
- Lessons Learnt
- Design
- Gossip protocol
- inconsistencies and faults
- (section 6.3) Divergent versions
- HInted hand-off
- anti-entropy protocol for replicate synchronization
- Failure detection
- Adding/removing storage nodes (4.9)
- Things to pounder
- Evaluation?
- Some quizzes
- Sources
- Implementation details
- Data partitioning
- Data replication
- Data versioning and conflict resolution
- Engineering
- uniform load distribution
- client or server driven coordination
- (6.5) Interference
- Interesting facts
- Further exploration
Motivation
Key-Value stores are distributed hash tables (DHT). They are useful in many situations such as storing user sessions in a web application and for building No-SQL databases. In a distributed environment, it has proved very challenging to scale traditional databases with strong consistency and high availability. A key-value store only binds a key to a specific value and does not assume anything about the structure of the key or the value (except the keys should be unique). These simpler semantics help a key-value store to scale at the global level with high availability and a spectrum of options for data consistency. In the context of the CAP theorem, key-value stores can either be consistent or available when there are network partitions. At configuration time, a key-value store can be instantiated to favor of consistency and availability.
Many real-world services only need primary-key access to a data store instead of traditional OLTP (Online Transaction Processing) databases. Examples include bestseller lists, shopping carts, customer preferences, session management, sales rank, and product catalog.
(ToDo: Above picture copied from https://hazelcast.com/glossary/key-value-store/ We need our own illustration)
Key-Value stores use many servers for the storage and retrieval of data. A single-node-based hashtable falls short due to one or more of the following reasons:
- No matter how big a server we could get, data storage and query requirements can not be met by this server.
- Failure of this one mega-server will mean service downtime for everyone.
While designing a key-value store we will exercise many of the distributed systems concepts such as:
- consistent hashing
- logical clocks
- data consistency
- sloppy quorums
- gossip-based distributed failure detection and membership protocol
We will also see how good engineering often goes hand-in-hand with a solid underpinning for a practical system. We will draw on
Rich programming model provided by traditional DBMS might not be required by many applications. Using RDBMS for such applications is often expensive both in terms of dollars and performance.
User-facing API
Key-Value stores, like ordinary hash tables, provide two primary functions:
Get
: given a key, provide the associated value.
When data is replicated, the operation of locating the object replica that is associated with a specific key is hidden from the end-user and is done by the system. If the store was configured with a weaker data consistency model (for example eventual consistency, there might be more than one value returned against a key, as we will see later).
Put
: store the value associated with the key
The system automatically determines where data should be placed. Additionally system often keeps metadata about the stored object as well. Such metadata can include the version of the object.
Create a free account to access the full course.
By signing up, you agree to Educative's Terms of Service and Privacy Policy