System Design Deep Dive: Real-World Distributed Systems/

...

Adapting to Traffic Patterns in DynamoDB

Learn how to improve throughput allocation in a partitioned database.

We'll cover the following...

Key considerations
Bursting
Adaptive capacity
Global admission control
Splitting for consumption
On-demand provisioning
Quiz
What’s next?

We will allow users to set a provisioned (allocated) throughput for their tables. The initial partitioning of a table divides a table's provisioned throughput equally across all partitions, especially if the range of keys contained in each partition is the same. However, applications might access some keys more frequently. This results in underutilizing dedicated throughput for partitions accessed that are less frequently, and overloading and downtime of partitions that are accessed more frequently. Our partitioning must adapt to customer traffic patterns to solve the problem above.

Note: We can think of throughput as the ability of our service to complete fixed-size requests (read or write) per second.

Key considerations

Let's understand the problem in detail. We will start by modeling our problem.

We define read capacity unit (RCU) as the ability of the system to complete one read request of an item of arbitrary size, say x KB. We define write capacity unit (WCU) as the ability of the system to complete one write request of an item of the same arbitrary size. We will use RCU and WCU when talking about throughput. For example, if a table has provisioned throughput of 10,000 RCUs and 5,000 WCUs, then for items of size x KB, at maximum throughput, it cannot read more than 10,000 items and cannot write more than 5,000 items per second.

Continuing this example, let's consider dividing our table into ten partitions with equal key ranges (the disjoint ranges of keys that identify items hosted by a partition). Our design works such that it assumes that all keys have an equal chance of being accessed. As a result, it divides throughput equally among all partitions. So after dividing a table into ten partitions, a single partition will have a throughput of 1,000 RCUs and 500 WCUs. However, what happens when we wish to add or remove partitions from the table or if the table's provisioned throughput changes? One solution is to continue with our assumption that keys are accessed uniformly. Therefore, we will allocate all partitions' throughput based on the table's provisioned throughput.

When adding or removing partitions, we will distribute a table's provisioned throughput equally among the new number of partitions. So if we added ten more partitions in the table from our example, the total number of partitions would be 20. Each partition will have a throughput of 500 RCUs and 250 WCUs. Moreover, if we removed five partitions from the original number of partitions (ten), the resulting five would have a throughput of 2000 RCUs and 1000 WCUs.
If the table has provisioned throughput changes, the new throughput would be equally divided ...

Prologue

File Systems

Google File System (GFS)

Google Colossus File System

Facebook's Tectonic File System

Databases

Google Bigtable

Google Megastore

Google Spanner

Key-value Stores

Many-core Key-value Store

Scaling Memcache

SILT

Amazon DynamoDB

Concurrency Management

Two-phase Locking (2PL)

Google Chubby Locking Service

ZooKeeper

Big Data Processing: Batch to Stream Processing

MapReduce

Spark

Kafka

Consensus

Understanding Consensus: Two Generals, FLP, & Byzantine Generals

Two-phase Commit

State Machine Replication

Paxos

Raft

Epilogue

Adapting to Traffic Patterns in DynamoDB

Key considerations