Design Considerations of a Blob Store
Discover the essential System Design considerations for building a reliable blob store. Implement data chunking, metadata management, and user-account-based strategic partitioning. Learn to use multi-level replication and indexing to ensure high availability and fast querying performance.
Introduction
The previous lesson outlined the major components of the blob store. This lesson examines implementation challenges, including storing large blobs, managing replicas, and optimizing retrieval latency. The following table summarizes the lesson objectives.
Summary of the Lesson
Section | Purpose |
Blob metadata | This is the metadata that’s maintained to ensure efficient storage and retrieval of blobs. |
Partitioning | This determines how blobs are partitioned among different data nodes. |
Blob indexing | This shows us how to efficiently search for blobs. |
Pagination | This shows us how to retrieve a limited number of blobs at a time to improve readability and loading time. |
Replication | This teaches us how to replicate blobs and tells us how many copies we should maintain to improve availability. |
Garbage collection | This teaches us how to delete blobs without sacrificing performance. |
Streaming | This teaches us how to stream large files chunk-by-chunk to facilitate interactivity for users. |
Caching | This shows us how to improve response time and throughput. |
Abstraction layers hide internal complexity from users and guide design decisions for routing and sharding. There are three primary layers:
User account: Identifies users via an account_ID. It contains all user containers.
Container: Identifies a set of blobs via a container_ID.
Blob: Identifies specific files via a blob_ID. This layer maintains metadata vital for system availability and reliability.
The table below summarizes these layers.
Layered Information
Level | Uniquely identified by | Information | Sharded by | Mapping |
User’s blob store account | account_ID | List of container_IDs | | Account -> list of containers |
Container | container_ID | List of blob_IDs | | Container -> list of blobs |
Blob | blob_ID | {list of chunks, chunkInfo: data node IDs, ...} | | Blob -> list of chunks |
Note: We generate unique IDs for user accounts, containers, and blobs using a unique ID generator.
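The three mappings above (account to containers, container to blobs, blob to chunks) can be sketched as a small in-memory lookup. This is only an illustration under stated assumptions: the class names, the `MetadataStore` helper, and the use of random UUIDs as the unique ID generator are hypothetical and not taken from the lesson.

```python
import uuid
from dataclasses import dataclass, field


def new_id() -> str:
    """Stand-in for the unique ID generator (assumption: a random UUID)."""
    return uuid.uuid4().hex


@dataclass
class Blob:
    blob_id: str = field(default_factory=new_id)
    chunk_ids: list[str] = field(default_factory=list)  # Blob -> list of chunks


@dataclass
class Container:
    container_id: str = field(default_factory=new_id)
    blob_ids: list[str] = field(default_factory=list)  # Container -> list of blobs


@dataclass
class Account:
    account_id: str = field(default_factory=new_id)
    container_ids: list[str] = field(default_factory=list)  # Account -> list of containers


class MetadataStore:
    """Hypothetical in-memory store holding one dictionary per layer."""

    def __init__(self) -> None:
        self.accounts: dict[str, Account] = {}
        self.containers: dict[str, Container] = {}
        self.blobs: dict[str, Blob] = {}

    def create_account(self) -> Account:
        account = Account()
        self.accounts[account.account_id] = account
        return account

    def create_container(self, account_id: str) -> Container:
        account = self.accounts[account_id]  # KeyError if the account is unknown
        container = Container()
        self.containers[container.container_id] = container
        account.container_ids.append(container.container_id)
        return container

    def create_blob(self, container_id: str, chunk_ids: list[str]) -> Blob:
        container = self.containers[container_id]  # KeyError if unknown
        blob = Blob(chunk_ids=list(chunk_ids))
        self.blobs[blob.blob_id] = blob
        container.blob_ids.append(blob.blob_id)
        return blob

    def list_chunks(self, account_id: str, container_id: str, blob_id: str) -> list[str]:
        """Walk account -> container -> blob and return the blob's chunk IDs."""
        if container_id not in self.accounts[account_id].container_ids:
            raise KeyError("container does not belong to this account")
        if blob_id not in self.containers[container_id].blob_ids:
            raise KeyError("blob does not belong to this container")
        return list(self.blobs[blob_id].chunk_ids)
```

A caller would create an account, then a container inside it, then a blob inside that container, and finally resolve the blob's chunks with `list_chunks`. Each layer only stores the IDs of the layer below it, which mirrors the "Mapping" column of the table.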
The system maintains metadata to manage storage. Let’s examine this data.
Blob metadata
When a user uploads a blob, the system splits it into small chunks.
Metadata includes chunk IDs, assigned data nodes, and replica IDs. Replicating chunks ensures reliability in case of node failure.
For example, a 128 MB blob split into two 64 MB chunks would have the following metadata:
Blob Metadata
Chunk | Datanode ID | Replica 1 ID | Replica 2 ID | Replica 3 ID |
1 | d1b1 | r1b1 | r2b1 | r3b1 |
2 | d1b2 | r1b2 | r2b2 | r3b2 |
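The metadata table above can be reproduced with a short sketch. It assumes a fixed 64 MB chunk size and three replicas per chunk, as in the example. The function and field names are hypothetical, and the ID strings simply imitate the placeholder labels in the table (`d1b1`, `r1b1`, and so on); a real system would assign actual data node and replica identifiers.

```python
from dataclasses import dataclass

MB = 1024 * 1024
CHUNK_SIZE = 64 * MB  # chunk size used in the example above


@dataclass
class ChunkRecord:
    chunk: int              # 1-based chunk number
    offset: int             # byte offset of the chunk within the blob
    length: int             # chunk length in bytes (the last chunk may be shorter)
    datanode_id: str        # placeholder label for the assigned data node
    replica_ids: list[str]  # placeholder labels for the replicas


def chunk_ranges(blob_size: int, chunk_size: int = CHUNK_SIZE) -> list[tuple[int, int]]:
    """Return (offset, length) pairs that cover a blob of blob_size bytes."""
    return [
        (offset, min(chunk_size, blob_size - offset))
        for offset in range(0, blob_size, chunk_size)
    ]


def build_metadata(blob_size: int, replicas: int = 3,
                   chunk_size: int = CHUNK_SIZE) -> list[ChunkRecord]:
    """Build one metadata row per chunk, mimicking the table's label scheme."""
    records = []
    for n, (offset, length) in enumerate(chunk_ranges(blob_size, chunk_size), start=1):
        records.append(ChunkRecord(
            chunk=n,
            offset=offset,
            length=length,
            datanode_id=f"d1b{n}",
            replica_ids=[f"r{k}b{n}" for k in range(1, replicas + 1)],
        ))
    return records


if __name__ == "__main__":
    for row in build_metadata(128 * MB):
        print(row.chunk, row.datanode_id, *row.replica_ids)
```

For a 128 MB blob, `build_metadata` yields two rows whose labels match the table. A blob whose size is not a multiple of 64 MB gets a shorter final chunk, which is why each record stores its own length.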
Note: The system should ...