
System Design: A Blob Store

Get an introduction to the blob store and get ready to design it.

What is a blob store?

A blob store is a storage solution for unstructured data.

We can store photos, audio, videos, binary executable code, or other multimedia items in a blob store. Every type of data is stored as a blob (binary large object), which is a collection of binary data stored as a single unit. A blob store follows a flat data organization pattern, with no hierarchies such as directories and subdirectories.
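To make the flat organization concrete, here is a minimal sketch, assuming a plain in-memory dictionary stands in for the blob store. The blob names and the `list_blobs` helper are hypothetical, chosen only for illustration. A name such as `photos/2024/cat.jpg` is a single opaque key; the slashes do not create directories, and "folder-like" listings are just prefix matches over the flat key space.

```python
# Hypothetical in-memory stand-in for a blob store: one flat mapping
# from blob name to binary content. There are no directory objects.
blobs = {
    "photos/2024/cat.jpg": b"\xff\xd8\xff",  # placeholder bytes
    "photos/2024/dog.jpg": b"\xff\xd8\xff",
    "videos/intro.mp4": b"\x00\x00\x00\x18",
}


def list_blobs(store, prefix=""):
    """Return the names of all blobs whose name starts with `prefix`."""
    return sorted(name for name in store if name.startswith(prefix))


photo_names = list_blobs(blobs, "photos/")
all_names = list_blobs(blobs)
```

Here `photo_names` contains the two `photos/...` keys, even though no `photos` directory exists anywhere in the store.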

Mostly, it’s used by applications with a specific business requirement called write once, read many (WORM), which states that data can only be written once and cannot be changed.

Microsoft Azure is one example: blobs are created once and read many times. Additionally, to protect critical data, these blobs can’t be modified, and they can’t be deleted until a specified retention interval has passed.

A blob store storing and streaming large unstructured files like audio, video, images, and documents

Note: It isn’t necessary for all applications to have this WORM requirement. However, we assume that the blobs written can’t be modified. Instead of modifying, we can upload a new version of a blob if needed.
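The assumption above (blobs are never modified, and a change is uploaded as a new version) can be sketched as follows. This is a minimal in-memory illustration, not the design developed in later lessons; the `WormBlobStore` class and its `put` and `get` methods are hypothetical names.

```python
from typing import Optional


class WormBlobStore:
    """Hypothetical write-once store: every upload creates a new version."""

    def __init__(self):
        # Maps a blob name to the list of its versions, oldest first.
        self._versions = {}

    def put(self, name: str, data: bytes) -> int:
        """Store `data` as a new version and return its 1-based version number."""
        versions = self._versions.setdefault(name, [])
        versions.append(bytes(data))  # bytes objects are immutable
        return len(versions)

    def get(self, name: str, version: Optional[int] = None) -> bytes:
        """Return the requested version, or the latest one if none is given."""
        versions = self._versions[name]
        if version is None:
            return versions[-1]
        if not 1 <= version <= len(versions):
            raise KeyError(f"{name} has no version {version}")
        return versions[version - 1]


store = WormBlobStore()
v1 = store.put("report.pdf", b"draft")
v2 = store.put("report.pdf", b"final")
latest = store.get("report.pdf")
first = store.get("report.pdf", version=1)
```

There is deliberately no update or overwrite method: the second `put` leaves the first version readable, which is the behavior the note describes.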

Why do we use a blob store?

A blob store is a crucial component of many data-intensive applications, including YouTube, Netflix, and Facebook.

The table below shows the blob storage used by some of the most well-known platforms. These applications generate large volumes of unstructured data daily and require a storage solution that is scalable, reliable, and highly available, particularly for storing large media files.

As their data continues to grow, these applications also need the ability to store an unlimited number of blobs.

According to some estimates, YouTube adds more than a petabyte of new storage each day. In a system like YouTube, each video is stored in multiple resolutions, and every resolution is replicated across multiple data centers and regions to ensure redundancy and availability.

This is why the total storage required for each video is significantly higher than the original upload size.
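A rough worked example of this amplification, using made-up numbers: the rendition sizes and the replica count below are illustrative assumptions, not figures reported for YouTube or any other system.

```python
def total_stored_mb(rendition_sizes_mb, replicas):
    """Total footprint: every rendition is stored once per replica."""
    return sum(rendition_sizes_mb) * replicas


# Hypothetical values for a single uploaded video.
original_mb = 1000
rendition_sizes_mb = [1000, 500, 250, 100]  # assumed sizes per resolution
replicas = 3                                # assumed number of copies

total_mb = total_stored_mb(rendition_sizes_mb, replicas)
amplification = total_mb / original_mb
```

Under these assumptions, the renditions add up to 1,850 MB, three replicas bring the footprint to 5,550 MB, and the stored size is about 5.55 times the original upload.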

| System | Blob Store |
|---|---|
| Netflix | S3 |
| YouTube | Google Cloud Storage |
| Facebook | Tectonic |

With the importance of blob storage established, the next step is to understand how cloud providers, such as Azure, optimize costs and performance through different storage access tiers.

Azure storage access tiers

Azure storage access tiers offer cost-effective storage options tailored to the various usage patterns and access frequencies of our data.

Azure Storage access tiers 

Note: We can switch between these access tiers at any time.

| Feature | Description | When to Use |
|---|---|---|
| Azure Files | Provides a Server Message Block (SMB) interface; provides client libraries; includes a REST interface for remote file storage | Cloud application migration (lift and shift); data sharing across several virtual machines; keeping development and debugging tools available on many virtual machines |
| Azure Blobs | Provides client libraries; provides a REST interface for storing and retrieving unstructured data at massive scale in block blobs | Remote access to application data; support for streaming and random-access scenarios |

Let’s examine the blob lifecycle management rules for effective data management solutions in Azure blob storage.

Blob life cycle management rules

Blob life cycle management rules simplify data management in Azure Blob Storage, improving cost efficiency, data retention compliance, and overall organization, while ensuring effective and seamless data life cycle management.

Here are the blob life cycle management rules:

  • Moving blobs to a cooler storage tier to save costs: Blob life cycle management rules enable us to transfer data from a hot access tier to a cool access tier when it is accessed less frequently, which lowers storage costs while the data remains accessible.

  • Deleting blobs at the end of their life cycle: We can use these rules to define criteria, such as expiry dates or periods of inactivity, to automatically delete outdated or obsolete blobs, which simplifies data management and supports compliance.

  • Applying rules to filtered paths in the Storage account: We can selectively apply life cycle rules to specific containers or blob name prefixes within our Storage account, tailoring the automated actions to our data organization and access patterns.
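The three rules above can be sketched as a small policy evaluator. This is a simplified illustration under stated assumptions, not Azure's actual policy format or API: the `Blob` and `LifecycleRule` types, the two tier names, and the day thresholds are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class Blob:
    name: str
    tier: str                # "hot" or "cool" in this sketch
    last_accessed: datetime


@dataclass
class LifecycleRule:
    prefix: str                              # rule applies only to matching names
    cool_after_days: Optional[int] = None    # move hot -> cool after this idle time
    delete_after_days: Optional[int] = None  # delete after this idle time


def apply_rule(blobs: List[Blob], rule: LifecycleRule, now: datetime) -> List[Blob]:
    """Return the blobs that survive the rule, with tiers updated in place."""
    kept = []
    for blob in blobs:
        if not blob.name.startswith(rule.prefix):
            kept.append(blob)  # outside the filtered path: untouched
            continue
        idle = now - blob.last_accessed
        if (rule.delete_after_days is not None
                and idle >= timedelta(days=rule.delete_after_days)):
            continue  # end of life cycle: drop the blob
        if (rule.cool_after_days is not None
                and idle >= timedelta(days=rule.cool_after_days)
                and blob.tier == "hot"):
            blob.tier = "cool"  # rarely accessed: move to the cheaper tier
        kept.append(blob)
    return kept


now = datetime(2024, 1, 31)
blobs = [
    Blob("logs/a.txt", "hot", datetime(2024, 1, 1)),    # idle 30 days
    Blob("logs/b.txt", "hot", datetime(2023, 1, 1)),    # idle over a year
    Blob("media/c.mp4", "hot", datetime(2023, 1, 1)),   # prefix does not match
    Blob("logs/d.txt", "hot", datetime(2024, 1, 30)),   # idle 1 day
]
rule = LifecycleRule(prefix="logs/", cool_after_days=30, delete_after_days=365)
remaining = apply_rule(blobs, rule, now)
summary = [(b.name, b.tier) for b in remaining]
```

With this rule, `logs/a.txt` moves to the cool tier, `logs/b.txt` is deleted, and `media/c.mp4` is left alone because it falls outside the `logs/` prefix.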

To see these concepts in action, let’s examine common real-world scenarios where blob storage plays a critical role.

Use cases

Blob storage is highly versatile and powers many real-world applications that require the efficient storage, access, or delivery of large amounts of unstructured data. Here are some common scenarios:

  • Serving pictures or documents to browsers immediately.

  • Storing documents for multiple users.

  • Broadcasting audio and video content.

  • Maintaining data for backup and restore, recovery from disasters, and archiving.

  • Data storage for analysis via an on-premises or Azure-hosted service.

These examples demonstrate how blob storage provides a reliable, scalable, and cost-effective solution for a wide range of applications, from media streaming to enterprise data management. With the principles, tiers, and life cycle management in mind, we are now ready to translate these concepts into a practical, step-by-step blob store system design.

How do we design a blob store system?

We have divided the design of the blob store into four lessons and a quiz.

  1. Requirements: In this lesson, we identify the functional and non-functional requirements of a blob store. We also estimate the resources required by our blob store system.

  2. Design: This lesson presents a high-level design, the API design, and a detailed design of the blob store, while explaining the details of all components and the workflow.

  3. Design Considerations: In this lesson, we discuss several key aspects of design. For example, we learn about the database schema, partitioning strategy, blob indexing, pagination, and replication.

  4. Evaluation: In this lesson, we evaluate our blob store based on our requirements.

  5. Quiz: In this lesson, we assess understanding of the blob store design.

Let’s start with the requirements of a blob store system.