System Design: A Blob Store
Define a blob store, a core system design component for managing large volumes of unstructured data. Explain why applications such as YouTube rely on blob storage, and examine concepts such as access tiers and data life cycle management.
What is a blob store?
A blob store is a storage solution for unstructured data, such as photos, audio, video, and binaries.
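Data is stored as a blob (binary large object). Blob stores typically follow a write once, read many (WORM) model: once data is written, it is not modified in place, although it can be read many times. For example, Microsoft Azure can keep blobs immutable for a user-specified interval to protect critical data.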
Note: While not all applications require WORM, we assume blobs are immutable. Updates are handled by uploading a new version rather than modifying the existing object.
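To make the "new version instead of in-place update" idea concrete, here is a minimal in-memory sketch, assuming a store keyed by blob name. The class and method names (`VersionedBlobStore`, `put`, `get`) are hypothetical and do not correspond to any cloud provider's API. Every `put` appends a new immutable version, and reads return the latest version unless an older one is requested.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class BlobVersion:
    """One immutable version of a blob; frozen so it cannot be edited after creation."""
    version: int
    data: bytes


class VersionedBlobStore:
    """Hypothetical WORM-style store: writes append new versions and never overwrite."""

    def __init__(self) -> None:
        self._blobs: Dict[str, List[BlobVersion]] = {}

    def put(self, name: str, data: bytes) -> int:
        """Store data as a new version of `name` and return its version number."""
        versions = self._blobs.setdefault(name, [])
        new_version = BlobVersion(version=len(versions) + 1, data=bytes(data))
        versions.append(new_version)
        return new_version.version

    def get(self, name: str, version: Optional[int] = None) -> bytes:
        """Return the latest version by default, or a specific earlier version."""
        versions = self._blobs.get(name)
        if not versions:
            raise KeyError(name)
        if version is None:
            return versions[-1].data
        if not 1 <= version <= len(versions):
            raise KeyError(f"{name} has no version {version}")
        return versions[version - 1].data


store = VersionedBlobStore()
store.put("videos/cat.mp4", b"v1 bytes")
store.put("videos/cat.mp4", b"v2 bytes")   # an "update" is just a new version
print(store.get("videos/cat.mp4"))             # b'v2 bytes'
print(store.get("videos/cat.mp4", version=1))  # b'v1 bytes'
```

Keeping old versions costs extra storage, which is one reason the life cycle rules discussed later (expiration in particular) matter in practice.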
Why do we use a blob store?
Blob stores are essential for data-intensive platforms like YouTube, Netflix, and Facebook. These applications generate massive volumes of unstructured data daily and require scalable, reliable, and highly available storage.
As data grows, applications need storage capacity that can scale almost without limit. For instance, YouTube adds over a petabyte of storage daily. Each video is stored in multiple resolutions and replicated across data centers for redundancy, so the total storage footprint significantly exceeds the original upload size.
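A rough way to see why the footprint grows is to multiply the upload size by the combined size of all stored resolutions and then by the number of replicas. The sketch below assumes hypothetical size ratios for each resolution (relative to the original upload) and a hypothetical replication factor; real ratios depend on codecs and bitrates.

```python
from typing import Sequence


def storage_footprint_gb(upload_gb: float, rendition_ratios: Sequence[float], replicas: int) -> float:
    """Estimate the storage used by one upload.

    rendition_ratios: size of each stored resolution relative to the original upload
                      (hypothetical values, for illustration only).
    replicas: number of full copies kept across data centers.
    """
    per_copy_gb = upload_gb * sum(rendition_ratios)  # all resolutions of one copy
    return per_copy_gb * replicas                    # every copy stores all resolutions


# A 1 GB upload kept at four resolutions and replicated three times.
print(storage_footprint_gb(1.0, [1.0, 0.5, 0.25, 0.125], replicas=3))  # 5.625
```

Under these assumed numbers, each gigabyte uploaded consumes more than five gigabytes of storage.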
The following table lists the blob stores used by some of these applications.
System | Blob Store |
Netflix | S3 |
YouTube | Google Cloud Storage |
Facebook | Tectonic |
Cloud providers such as Azure let customers balance cost and performance through storage access tiers.
Azure storage access tiers
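Azure Blob Storage offers hot, cool, cold, and archive access tiers, each tailored to a different access frequency. Cooler tiers are cheaper for storing data but more expensive for reading it, so rarely accessed data belongs in a cooler tier.
Note: We can change a blob's access tier at any time. However, a blob in the archive tier is offline and must be rehydrated to an online tier before it can be read, which can take hours.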
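The trade-off between tiers can be sketched as a simple cost model: monthly cost equals storage cost plus retrieval cost. The prices below are hypothetical and chosen only to show the shape of the trade-off; they are not Azure's actual prices, and the model ignores transaction fees, early deletion charges, and archive rehydration latency.

```python
from typing import Dict, NamedTuple


class TierPrice(NamedTuple):
    storage_per_gb_month: float  # cost to keep 1 GB for one month
    read_per_gb: float           # cost to read 1 GB back


# Hypothetical prices for illustration only; NOT real Azure pricing.
TIERS: Dict[str, TierPrice] = {
    "hot": TierPrice(storage_per_gb_month=0.020, read_per_gb=0.000),
    "cool": TierPrice(storage_per_gb_month=0.010, read_per_gb=0.010),
    "archive": TierPrice(storage_per_gb_month=0.002, read_per_gb=0.050),
}


def monthly_cost(tier: TierPrice, size_gb: float, reads_per_month: float) -> float:
    """Storage cost plus the cost of reading the whole blob `reads_per_month` times."""
    storage = tier.storage_per_gb_month * size_gb
    retrieval = tier.read_per_gb * size_gb * reads_per_month
    return storage + retrieval


def cheapest_tier(size_gb: float, reads_per_month: float) -> str:
    """Pick the tier with the lowest estimated monthly cost for this access pattern."""
    return min(TIERS, key=lambda name: monthly_cost(TIERS[name], size_gb, reads_per_month))


print(cheapest_tier(100, reads_per_month=10))   # hot
print(cheapest_tier(100, reads_per_month=0.5))  # cool
print(cheapest_tier(100, reads_per_month=0))    # archive
```

Frequently read data favors the hot tier because retrieval dominates the bill, while data that is almost never read favors the archive tier because storage dominates.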
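Feature | Description | When to Use |
Azure Files | Fully managed file shares in the cloud, accessible over the SMB and NFS protocols. | When an application needs a shared file system accessed through native file system APIs, for example when moving an on-premises application to the cloud. |
Azure Blobs | Object storage for massive amounts of unstructured data, such as text or binary data. | When an application needs to store and stream large unstructured objects such as images, video, backups, or data for analytics. |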
Azure Blob Storage also supports life cycle management rules, which reduce storage costs and help meet compliance requirements.
Blob life cycle management rules
The main types of life cycle rules are:
Tiering: Move data to cooler storage tiers when access frequency drops to save costs.
Expiration: Automatically delete obsolete blobs based on age or inactivity.
Filtering: Limit a rule to a subset of blobs, for example those in a specific container or whose names share a prefix (a virtual folder).
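The three kinds of rules above can be combined into a single pass over the stored blobs. Azure expresses such rules as a JSON policy; the sketch below is instead a hypothetical in-memory evaluator (the names `Blob`, `LifecycleRule`, and `apply_rule` are illustrative, not Azure's API), assuming age is measured in days since the blob was last modified.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Blob:
    name: str                 # path-like name, e.g. "logs/2024/app.log"
    tier: str                 # current access tier: "hot", "cool", or "archive"
    days_since_modified: int


@dataclass
class LifecycleRule:
    prefix: str                           # filtering: rule applies only to matching names
    tier_to_cool_after: Optional[int]     # tiering thresholds in days (None disables)
    tier_to_archive_after: Optional[int]
    delete_after: Optional[int]           # expiration threshold in days (None disables)


def apply_rule(rule: LifecycleRule, blobs: List[Blob]) -> List[Blob]:
    """Return the blobs that survive one rule pass, updating tiers in place."""
    kept = []
    for blob in blobs:
        if not blob.name.startswith(rule.prefix):
            kept.append(blob)  # filtering: rule does not apply, keep unchanged
            continue
        age = blob.days_since_modified
        if rule.delete_after is not None and age > rule.delete_after:
            continue  # expiration: drop the blob
        if rule.tier_to_archive_after is not None and age > rule.tier_to_archive_after:
            blob.tier = "archive"
        elif (rule.tier_to_cool_after is not None and age > rule.tier_to_cool_after
              and blob.tier == "hot"):
            blob.tier = "cool"  # tiering: only move data to a cooler tier, never back up
        kept.append(blob)
    return kept


blobs = [
    Blob("logs/a.log", "hot", 10),
    Blob("logs/b.log", "hot", 45),
    Blob("logs/c.log", "hot", 200),
    Blob("logs/d.log", "hot", 400),
    Blob("videos/e.mp4", "hot", 400),
]
rule = LifecycleRule(prefix="logs/", tier_to_cool_after=30,
                     tier_to_archive_after=180, delete_after=365)
for blob in apply_rule(rule, blobs):
    print(blob.name, blob.tier)
# logs/a.log hot
# logs/b.log cool
# logs/c.log archive
# videos/e.mp4 hot
```

Checking expiration before tiering means a blob old enough to delete is never moved first, which avoids paying for a tier change on data that is about to be removed.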
Let’s look at real-world scenarios for blob storage.
Use cases
Blob storage supports applications requiring efficient storage and delivery of unstructured data. Common scenarios include:
Serving images and documents directly to browsers.
Distributed file storage for multiple users.
Streaming video and audio content.
Backups, disaster recovery, and archiving.
Data lakes for on-premises or cloud-based analysis.
These examples highlight the versatility of blob storage. Next, we will design a blob store system step by step.
How do we design a blob store system?
This section is divided into five lessons:
Requirements: Identify functional and non-functional requirements and estimate resource needs.
Design: Define the high-level architecture, API, and detailed component workflow.
Design considerations: Explore database schema, partitioning, indexing, pagination, and replication strategies.
Evaluation: Assess the design against the initial requirements.
Quiz: Test your understanding of the material.
Let’s begin with the requirements.