
System Design: A Blob Store

Define a blob store, a core system design component for managing large volumes of unstructured data. Explain why applications such as YouTube rely on blob storage, and examine concepts such as access tiers and data life cycle management.

What is a blob store?

A blob store is a storage solution for unstructured data, such as photos, audio, video, and binaries.

Data is stored as a blob (binary large object), which is a collection of binary data stored as a single unit. Unlike hierarchical file systems with directories, blob stores use a flat organization pattern. Blob stores are ideal for applications that require write-once, read-many (WORM) storage: data is written once, read frequently, and rarely modified.

For example, Microsoft Azure Blob Storage supports time-based retention policies that keep blobs immutable for a set interval to protect critical data.

A blob store storing and streaming large unstructured files like audio, video, images, and documents

Note: While not all applications require WORM, we assume blobs are immutable. Updates are handled by uploading a new version rather than modifying the existing object.
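The immutability assumption above can be illustrated with a minimal in-memory sketch. The class and method names (`InMemoryBlobStore`, `put`, `get`) are hypothetical and only show the idea: keys live in a flat namespace, and an update appends a new version instead of changing stored bytes.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass(frozen=True)
class BlobVersion:
    """One immutable version of a blob (frozen, so it cannot be modified)."""
    version: int
    data: bytes
    checksum: str


@dataclass
class InMemoryBlobStore:
    """Hypothetical flat-namespace store: each key maps to a list of versions."""
    _blobs: Dict[str, List[BlobVersion]] = field(default_factory=dict)

    def put(self, key: str, data: bytes) -> int:
        # Never overwrite existing bytes: append a new version instead.
        versions = self._blobs.setdefault(key, [])
        version = len(versions) + 1
        checksum = hashlib.sha256(data).hexdigest()
        versions.append(BlobVersion(version, bytes(data), checksum))
        return version

    def get(self, key: str, version: Optional[int] = None) -> bytes:
        # Return the latest version unless a specific one is requested.
        versions = self._blobs[key]
        chosen = versions[-1] if version is None else versions[version - 1]
        return chosen.data


store = InMemoryBlobStore()
# The "/" is just part of the key; there is no real directory behind it.
store.put("videos/cat.mp4", b"v1 bytes")
store.put("videos/cat.mp4", b"v2 bytes")
print(store.get("videos/cat.mp4"))      # latest version
print(store.get("videos/cat.mp4", 1))   # original upload is still intact
```

Because earlier versions are never touched, readers that fetched version 1 keep seeing consistent data while version 2 is being written.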

Why do we use a blob store?

Blob stores are essential for data-intensive platforms like YouTube, Netflix, and Facebook. These applications generate massive volumes of unstructured data daily and require scalable, reliable, and highly available storage.

As data grows, applications need storage capacity that is effectively unlimited. For instance, YouTube adds over a petabyte of storage daily. Each video is stored in multiple resolutions and replicated across data centers for redundancy.

Consequently, the total storage footprint significantly exceeds the original upload size.
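A short worked calculation makes this expansion concrete. The sizes and the replica count below are made-up assumptions for illustration, not figures from YouTube or any other provider.

```python
from typing import Sequence


def stored_footprint_mb(original_mb: int, rendition_mb: Sequence[int], replicas: int) -> int:
    """Total stored size: the original plus every rendition, times the replica count."""
    per_copy = original_mb + sum(rendition_mb)
    return per_copy * replicas


# Assumed example: a 1000 MB upload, three lower-resolution renditions,
# and three replicas of everything.
original = 1000
renditions = [500, 250, 100]
total = stored_footprint_mb(original, renditions, replicas=3)

print(total)             # 5550 (MB)
print(total / original)  # expansion factor relative to the upload
```

Under these assumptions, each uploaded megabyte turns into more than five stored megabytes.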

System | Blob Store
Netflix | S3
YouTube | Google Cloud Storage
Facebook | Tectonic

Cloud providers like Azure optimize cost and performance using storage access tiers.

Azure storage access tiers

Azure offers storage tiers tailored to specific usage patterns and access frequencies.

Azure storage access tiers 

Note: We can switch between access tiers at any time.

Feature | Description | When to Use
Azure Files | Provides a Server Message Block (SMB) interface; provides client libraries; includes a REST interface for remote file storage | Cloud application migration (lift and shift); data sharing across several virtual machines; keeping development and debugging tools available on many virtual machines
Azure Blobs | Provides client libraries; provides a REST interface for storing and retrieving unstructured data at massive scale in block blobs | Remote access to application data; support for streaming and random-access scenarios

Azure Blob Storage life cycle management rules improve cost efficiency and compliance.

Blob life cycle management rules

Key management rules include:

  • Tiering: Move data to cooler storage tiers when access frequency drops to save costs.

  • Expiration: Automatically delete obsolete blobs based on age or inactivity.

  • Filtering: Apply rules only to specific containers or blob name prefixes so that they match how the data is organized.
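The three rule types above can be sketched as a single evaluation function. This is an illustrative model, not Azure's policy format: the `Blob` and `LifecycleRule` types, the thresholds, and the action names are all assumptions, and only the `hot` and `cool` tiers are modeled.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Blob:
    name: str
    tier: str                # "hot" or "cool" in this sketch
    last_accessed: datetime


@dataclass
class LifecycleRule:
    prefix: str              # filtering: rule applies only to matching names
    cool_after_days: int     # tiering: idle days before moving to cool
    delete_after_days: int   # expiration: idle days before deletion


def evaluate(blob: Blob, rule: LifecycleRule, now: datetime) -> str:
    """Return the action this rule would take for one blob."""
    if not blob.name.startswith(rule.prefix):
        return "skip"
    idle_days = (now - blob.last_accessed).days
    # Check expiration first so very old blobs are deleted, not re-tiered.
    if idle_days >= rule.delete_after_days:
        return "delete"
    if idle_days >= rule.cool_after_days and blob.tier == "hot":
        return "move_to_cool"
    return "keep"


now = datetime(2024, 6, 1)
rule = LifecycleRule(prefix="logs/", cool_after_days=30, delete_after_days=365)
blobs = [
    Blob("logs/app.log", "hot", datetime(2024, 5, 25)),
    Blob("logs/old.log", "hot", datetime(2024, 4, 1)),
    Blob("logs/ancient.log", "cool", datetime(2023, 1, 1)),
    Blob("images/a.png", "hot", datetime(2023, 1, 1)),
]
actions = {b.name: evaluate(b, rule, now) for b in blobs}
print(actions)
```

With these sample dates, the recently read log is kept, the log idle for about two months moves to the cool tier, the log idle for over a year is deleted, and the image is skipped because it does not match the prefix.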

Let’s look at real-world scenarios for blob storage.

Use cases

Blob storage supports applications requiring efficient storage and delivery of unstructured data. Common scenarios include:

  • Serving images and documents directly to browsers.

  • Distributed file storage for multiple users.

  • Streaming video and audio content.

  • Backups, disaster recovery, and archiving.

  • Data lakes for on-premises or cloud-based analysis.
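To illustrate the streaming and random-access scenarios, here is a minimal sketch of serving a blob in chunks through byte-range reads, in the spirit of HTTP range requests with an inclusive end offset. The helper names `read_range` and `stream` are hypothetical, and the blob is held in memory only to keep the example small.

```python
from typing import Iterator


def read_range(blob: bytes, start: int, end: int) -> bytes:
    """Return bytes start..end (inclusive); an end past the blob is clamped."""
    if start < 0 or start >= len(blob) or end < start:
        raise ValueError("unsatisfiable range")
    last = min(end, len(blob) - 1)
    return blob[start:last + 1]


def stream(blob: bytes, chunk_size: int) -> Iterator[bytes]:
    """Yield the blob as successive fixed-size range reads."""
    for start in range(0, len(blob), chunk_size):
        yield read_range(blob, start, start + chunk_size - 1)


video = bytes(range(10))           # stand-in for a large media file
chunks = list(stream(video, 4))    # chunk lengths: 4, 4, 2
seek = read_range(video, 2, 5)     # random access into the middle

print([len(c) for c in chunks])
print(list(seek))
```

A player can start playback after the first chunk arrives, and seeking only requires a range read at the new offset rather than a download of the whole blob.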

These examples highlight the versatility of blob storage. Next, we will design a blob store system step by step.

How do we design a blob store system?

This section is divided into five lessons:

  1. Requirements: Identify functional and non-functional requirements and estimate resource needs.

  2. Design: Define the high-level architecture, API, and detailed component workflow.

  3. Design considerations: Explore database schema, partitioning, indexing, pagination, and replication strategies.

  4. Evaluation: Assess the design against the initial requirements.

  5. Quiz: Test your understanding of the material.

Let’s begin with the requirements.