Design a file storage service like Google Drive

Explore how Google Drive powers global file storage and real-time collaboration. This deep dive covers uploads, distributed storage, metadata, permissions, and how files stay synced across billions of devices.

Mar 10, 2026

Google Drive system design is the practice of architecting a globally distributed file storage and collaboration platform that handles chunked uploads, metadata management, real-time editing, cross-device sync, and granular access control at massive scale. A strong design balances durability, consistency, and low-latency access while supporting billions of users, trillions of files, and concurrent edits across regions.

Key takeaways

  • Separation of data and metadata paths: Decoupling blob storage from the metadata service allows each layer to scale, cache, and replicate independently based on its own access patterns.
  • Chunking with content-addressable deduplication: Splitting files into hashed chunks enables resumable uploads, bandwidth-efficient delta sync, and significant storage savings across users.
  • Conflict resolution defines collaboration quality: Choosing between Operational Transform, CRDTs, and last-write-wins determines how gracefully the system handles concurrent edits on documents vs. binary files.
  • Tiered storage controls cost at exabyte scale: Automatically migrating cold data to cheaper erasure-coded storage while keeping hot files on replicated, low-latency tiers is essential for sustainable growth.
  • Security and audit logging are non-negotiable: Encryption at rest and in transit, permission inheritance with fast revocation, and immutable audit trails are required to maintain user trust and regulatory compliance.


Most engineers think of Google Drive as a place to park PDFs. In reality, it is one of the most demanding distributed systems ever built: a platform that must store exabytes of data, serve metadata lookups in under 50 ms, support real-time collaborative editing, and enforce fine-grained permissions for billions of users simultaneously. Designing it well requires navigating trade-offs that surface in almost every system design interview, and in every real production storage system.

This guide walks through the complete architecture of a Google Drive-like service. It covers upload pipelines, storage tiers, metadata management, collaboration protocols, sync strategies, security, and the quantitative constraints that shape every decision. Where the original notes were directional, this expansion adds the missing depth on deduplication, presigned URLs, conflict resolution models, API design, erasure coding, audit logging, and capacity estimation.

Understanding the core problem#

At its heart, Google Drive is a globally distributed, multi-tenant file storage and real-time collaboration platform. Users upload files of all sizes, organize them into hierarchical folders, share them with fine-grained permissions, and access them from any device with the expectation of instant availability.

What elevates Drive beyond a simple object store is mutability. Files are edited, renamed, moved, and shared continuously. Multiple users may modify the same document at the same millisecond. A desktop sync client may push changes while a mobile client reads the same folder. The system must answer hard questions constantly: Where is this file physically stored? Which replica holds the latest version? Who is allowed to see it? What happens when two users edit the same paragraph offline and then reconnect?

Real-world context: Google reported in 2023 that over 3 billion users access Google Workspace products, with Drive storing well over an exabyte of data. At this scale, even a 0.01% inefficiency in storage or a 1 ms increase in metadata latency translates to enormous cost and user-visible degradation.

These questions are not academic. They define the subsystems, consistency models, and failure-handling strategies that compose the architecture. Before diving into components, we need to pin down exactly what the system must do and the constraints under which it must operate.

Functional and non-functional requirements#

Grounding the design in explicit requirements prevents scope creep and forces clarity on trade-offs. A Google Drive-like system has two categories of requirements: what users can do, and how well the system must do it.

Functional requirements#

From the user’s perspective, the system must support:

  • File and folder CRUD: Upload, download, create, rename, move, and delete files and folders.
  • Sharing and permissions: Grant view, comment, or edit access to individuals, groups, or anyone with a link.
  • Version history: Retain previous versions of every file, with the ability to view, compare, and restore.
  • Real-time collaboration: Allow multiple users to edit supported document types simultaneously.
  • Cross-device sync: Reflect changes across browser, desktop, and mobile clients with eventual convergence.
  • Search and discovery: Find files by name, content, owner, type, and modification date.

Non-functional requirements and scale targets#

Non-functional requirements shape every architectural decision. The table below summarizes the key targets.

Non-Functional Requirements Overview

| Requirement | Specification | Details |
| --- | --- | --- |
| Durability | 99.999999999% (eleven 9s) | Achieved via multi-device and multi-location data replication |
| Availability | ≥ 99.95% | Minimal annual downtime; service remains consistently accessible |
| Metadata read latency | p99 < 50 ms | 99% of metadata reads complete in under 50 milliseconds |
| File download latency | < 200 ms (first byte) | Applies to frequently accessed ("hot") files |
| Maximum file size | Up to 5 TB per file | Supports large-scale data transfers and ingestion |
| Consistency model | Hybrid | Strong consistency for metadata writes and permissions; eventual consistency for search and sync |
| Metadata read QPS | Millions/sec (global) | Designed for high scalability under peak demand |
| Upload QPS | Hundreds of thousands/sec (global) | Enables efficient, large-scale concurrent file uploads |

Durability is the hardest constraint. Users store irreplaceable photos, legal documents, and business-critical data. Losing even one file erodes trust irreversibly. Availability and latency follow closely because Drive is embedded in daily workflows.

Attention: Interviewers often test whether you distinguish between consistency requirements for different operations. File uploads and permission changes demand strong consistency, while search indexing and thumbnail generation can tolerate seconds or even minutes of staleness.

With requirements established, the next step is decomposing the system into its major subsystems and understanding how data flows between them.

High-level architecture overview#

A Google Drive-like system decomposes naturally into several independently scalable subsystems, each owning a specific concern. The separation between the file data path and the metadata path is the single most important architectural decision and a strong signal in interviews.

The major subsystems are:

  1. API gateway and client layer that handles authentication, rate limiting, and request routing.
  2. Upload and download service responsible for chunked ingestion and retrieval.
  3. Distributed blob storage that durably persists file data.
  4. Metadata service that manages file attributes, folder hierarchy, and ownership.
  5. Versioning service that tracks change history and supports rollback.
  6. Collaboration service that handles real-time co-editing for supported file types.
  7. Sync service that reconciles state across devices.
  8. Access control service that enforces permissions on every operation.
  9. Search and indexing service that supports full-text and metadata queries.
  10. Notification service that pushes updates to connected clients.

The following diagram illustrates how these components interact during a typical file upload and subsequent access.

Diagram: Google Drive-like system architecture with separated data and metadata paths

Pro tip: When presenting this architecture in an interview, emphasize that the data plane (blob upload/download) and the control plane (metadata, permissions, sync) are intentionally decoupled. This separation allows you to scale blob storage for throughput and metadata for low-latency reads independently.

The gateway and client layer manage authentication, TLS termination, and request routing, but the real complexity lives in the subsystems behind it. The upload pipeline is where it all begins.

File upload and ingestion pipeline#

Uploads are the entry point for all data in the system. They must handle files ranging from a 10 KB text file to a 5 TB video, over connections that may drop at any moment. Reliability, efficiency, and idempotency are the guiding principles.

Chunking strategy#

Large files are split into fixed-size chunks, typically 4 MB to 64 MB depending on the expected file-size distribution and network conditions. Each chunk is uploaded independently, allowing the client to resume after a network interruption without retransmitting the entire file.

Chunks are identified by their content hash (SHA-256), making them content-addressable: each chunk's identifier is derived from its content via a cryptographic hash, so identical data always maps to the same key. This has two critical benefits. First, the client can skip uploading any chunk whose hash already exists on the server, enabling instant deduplication. Second, the server can verify chunk integrity on receipt by recomputing the hash.

Real-world context: Dropbox’s public engineering blog describes how switching to content-addressable chunking reduced their storage footprint by over 20% across all users, because many people store identical files (OS installers, popular PDFs, shared images).
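
The chunking and hashing step can be sketched in a few lines. This is a minimal illustration, not production code; the 4 MB chunk size and the name `chunk_and_hash` are assumptions for the example.

```python
import hashlib

CHUNK_SIZE = 4 * 1024 * 1024  # 4 MB, the low end of the 4-64 MB range above

def chunk_and_hash(data: bytes, chunk_size: int = CHUNK_SIZE) -> list[tuple[str, bytes]]:
    """Split a byte stream into fixed-size chunks, each keyed by its SHA-256 hash."""
    chunks = []
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        # Content addressing: identical bytes always produce the same key,
        # which is what enables dedup and server-side integrity checks.
        chunks.append((hashlib.sha256(chunk).hexdigest(), chunk))
    return chunks
```

Because the key is derived from the bytes themselves, two users uploading the same installer produce identical chunk keys, and the server only needs to store the data once.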

Presigned URLs for direct-to-storage uploads#

Routing multi-gigabyte file data through the application server is wasteful and creates a bottleneck. Instead, the upload service generates presigned URLs: time-limited, cryptographically signed URLs that grant the client temporary permission to upload directly to the blob storage backend (such as Amazon S3 or Google Cloud Storage) without proxying through the application tier.

The upload flow works as follows:

  1. The client sends a metadata request to the API gateway describing the file (name, size, MIME type, parent folder).
  2. The upload service creates a pending file record in the metadata database and returns a set of presigned URLs, one per chunk.
  3. The client uploads each chunk directly to blob storage using the presigned URL.
  4. On completion, the client notifies the upload service, which verifies all chunk hashes and assembles the file record.
  5. The metadata service marks the file as available and triggers downstream processing (thumbnail generation, indexing).
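
The server-side verification in step 4 can be sketched as follows. `verify_manifest` is a hypothetical helper, and the chunk map stands in for whatever the blob store actually returns; the point is that hashes are recomputed before the file record is assembled.

```python
import hashlib

def verify_manifest(manifest: list[str], received: dict[str, bytes]) -> bool:
    """Before marking the file available, confirm every chunk in the ordered
    manifest arrived and matches its content hash."""
    for expected in manifest:
        chunk = received.get(expected)
        if chunk is None:
            return False  # missing chunk: ask the client to re-upload it
        if hashlib.sha256(chunk).hexdigest() != expected:
            return False  # corrupted in transit: reject and request a retry
    return True
```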

Diagram: Upload flow using presigned URLs with parallel chunk uploads

This pattern keeps the application tier lightweight and horizontally scalable, because it only handles small metadata requests while blob storage absorbs the heavy data transfer.

Resumable and idempotent uploads#

Every upload operation must be idempotent: performing the same operation multiple times produces the same result as performing it once, preventing duplicate data or side effects from retries. If a client retransmits a chunk after a timeout, the system must not store it twice. Content-addressable hashing guarantees this naturally because re-uploading the same bytes produces the same hash and maps to the same storage location.

For resumability, the server tracks which chunks of a multi-part upload have been received. The client queries this state before resuming and only uploads missing chunks. Google’s own resumable upload protocol follows this exact pattern.
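
The resume handshake reduces to a set difference: the client lists its chunk hashes, the server replies with what it already holds, and only the gap is re-sent. A sketch, with `missing_chunks` as an assumed helper name:

```python
def missing_chunks(all_hashes: list[str], server_received: set[str]) -> list[str]:
    """Return the chunks still to upload, preserving original order.
    Re-sending a chunk the server already has is harmless: content
    addressing makes the write idempotent."""
    return [h for h in all_hashes if h not in server_received]
```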

Pro tip: In an interview, mention that idempotent uploads also simplify retry logic in the sync client. If the client crashes mid-upload and restarts, it can safely re-attempt the entire upload without coordination, because the server deduplicates at the chunk level.

Once chunks land in blob storage, they need a durable, cost-efficient home. The next section explores how the storage layer is designed to achieve eleven-nines durability while managing exabyte-scale costs.

Distributed blob storage and durability#

The storage layer is where files live permanently. It must provide extreme durability, high throughput for reads and writes, and cost efficiency at exabyte scale. Two complementary strategies make this possible: replication for hot data and erasure coding for cold data.

Replication vs. erasure coding#

For frequently accessed files, the system stores multiple full replicas (typically three) across different failure domains: separate racks, availability zones, or even regions. This maximizes read throughput and minimizes latency because any replica can serve a request.

However, three-way replication carries a 3x storage overhead. For infrequently accessed or archived data, erasure coding provides comparable durability at roughly 1.2x to 1.5x overhead. Erasure coding splits data into fragments, expands them with redundant parity fragments, and stores them across distributed nodes so that the original data can be reconstructed from any sufficiently large subset of fragments.

Replication vs. Erasure Coding: Key Dimension Comparisons

| Dimension | Replication | Erasure Coding |
| --- | --- | --- |
| Storage overhead | High: 3x replication means 200% overhead (3 TB raw for 1 TB usable) | Efficient: ~1.2-1.5x overhead (e.g., 7 TB raw for 5 TB usable in a 5+2 scheme) |
| Read latency | Low: any replica can serve the request directly | Higher: reads may require reconstruction from multiple fragments |
| Write complexity | Simple: data is written to N replicas | Higher: parity fragments must be encoded before writing |
| Durability | High: 3x replication tolerates up to 2 simultaneous failures | Equally high or better: data and parity are spread across nodes, tolerating multiple fragment losses |
| Best use case | Hot, frequently accessed data (e.g., databases, VM storage) | Cold, archival, infrequently accessed data (e.g., backups, media repositories) |
A production system uses tiered storage. Newly uploaded files start on replicated hot storage. A background job monitors access patterns and migrates files that haven’t been read in 30 to 90 days to erasure-coded cold storage. This tiering reduces storage costs dramatically without sacrificing durability.
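
The overhead figures above fall out of simple arithmetic. Treating N-way replication as the degenerate case of one data fragment plus N-1 copies:

```python
def storage_overhead(data_fragments: int, parity_fragments: int) -> float:
    """Raw-to-usable storage ratio for a (k data + m parity) layout.
    3-way replication is the k=1, m=2 case."""
    return (data_fragments + parity_fragments) / data_fragments

# 3-way replication costs 3.0x raw storage per usable byte; a 5+2
# erasure-coded scheme costs 1.4x yet still survives any two fragment losses.
```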

Immutability and deduplication#

At the storage layer, chunks are immutable. An edit to a file does not overwrite existing chunks. Instead, the system writes new chunks for modified portions and creates a new version record pointing to the mix of old and new chunks. This immutability simplifies consistency (no read-write conflicts on the same block), enables safe retries, and supports versioning naturally.

Deduplication operates at the chunk level. When the upload service receives a chunk hash that already exists in storage, it increments a reference count rather than writing new data. Across billions of users, deduplication rates of 20 to 40 percent are common, saving petabytes of storage.

Attention: Deduplication introduces a subtle dependency: you cannot delete a chunk until its reference count drops to zero. Implementing reference counting correctly in a distributed system requires careful coordination to avoid premature garbage collection or storage leaks.
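
The reference-counting dependency can be modeled with a toy in-memory store. A real distributed implementation needs atomic counters and a deletion grace period, both of which this sketch omits:

```python
class ChunkStore:
    """Toy reference-counted chunk store illustrating dedup and safe deletion."""

    def __init__(self) -> None:
        self.refcounts: dict[str, int] = {}
        self.blobs: dict[str, bytes] = {}

    def put(self, chunk_hash: str, data: bytes) -> bool:
        """Return True if new bytes were written, False if deduplicated."""
        if chunk_hash in self.refcounts:
            self.refcounts[chunk_hash] += 1  # dedup hit: bump the refcount only
            return False
        self.refcounts[chunk_hash] = 1
        self.blobs[chunk_hash] = data
        return True

    def release(self, chunk_hash: str) -> None:
        """A chunk becomes garbage only when its last reference is dropped."""
        self.refcounts[chunk_hash] -= 1
        if self.refcounts[chunk_hash] == 0:
            del self.refcounts[chunk_hash]
            del self.blobs[chunk_hash]
```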

With data stored durably, the system needs a fast, consistent way to look up file attributes, folder structures, and ownership. That is the role of the metadata service.

Metadata and directory service#

If blob storage is the body of the system, metadata is the nervous system. Every file operation (listing a folder, opening a file, checking permissions) begins with a metadata lookup. The metadata service must support extremely low-latency reads (under 50 ms at p99) and strongly consistent writes.

What metadata includes#

Each file or folder record contains:

  • File ID: A globally unique identifier.
  • Name, MIME type, size: Basic attributes displayed in the UI.
  • Parent folder ID: Defines the logical hierarchy.
  • Owner and creator IDs: Linked to the identity service.
  • Permission ACL reference: Points to the access control list.
  • Current version ID: Points to the latest version in the versioning service.
  • Chunk manifest: An ordered list of chunk hashes composing the current version.
  • Timestamps: Created, modified, last accessed.

Folder hierarchies are logical constructs. A folder is simply a metadata record whose children reference it via parent folder ID. There is no physical directory on disk. This means renaming or moving a folder is a single metadata update (changing the parent pointer), not a physical reorganization of bytes.
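
The record layout above maps naturally onto a flat row. The field names here are illustrative, not Drive's actual schema; note how a move touches exactly one field:

```python
from dataclasses import dataclass, field

@dataclass
class FileRecord:
    file_id: str
    name: str
    mime_type: str
    size_bytes: int
    parent_folder_id: str          # the logical hierarchy lives here
    owner_id: str
    acl_id: str                    # reference into the access control service
    current_version_id: str
    chunk_manifest: list[str] = field(default_factory=list)  # ordered chunk hashes

def move_file(record: FileRecord, new_parent_id: str) -> None:
    """A move is a single metadata update; no file bytes are copied."""
    record.parent_folder_id = new_parent_id
```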

Storage choice for metadata#

Metadata requires strong consistency for writes (a permission change must be immediately visible) and low-latency reads. A distributed SQL database like Google’s Cloud Spanner or CockroachDB fits well. These databases provide serializable transactions, automatic sharding, and global replication.

The metadata database is sharded by file owner or organization ID to keep related files co-located and reduce cross-shard transactions. A caching layer (such as Memcached or Redis) sits in front of the database to absorb the read-heavy workload. Cache entries are invalidated on write using a lease-based or versioned-key strategy.

Historical note: Google’s original internal file metadata system was built on top of Bigtable, but the need for cross-row transactions and stronger consistency guarantees eventually led to the development of Spanner, which now underpins much of Google’s metadata infrastructure.

Metadata operations must feel atomic to users. When a user moves a file from folder A to folder B, the operation either completes fully or not at all. The metadata database’s transactional guarantees ensure this. But what happens when the file itself changes? That’s where versioning takes over.

Versioning and change history#

Versioning transforms a simple file store into a time-travel system. Every modification creates a new version, and previous versions are retained for a configurable period. Users can browse history, compare versions, and restore older states.

How versions are stored#

Because chunks are immutable and content-addressed, a new version only needs to store the chunks that changed. The version record contains a new chunk manifest pointing to a mix of existing (unchanged) chunks and newly written chunks. This approach is called delta-aware versioning: new versions reference unchanged data blocks from prior versions rather than storing complete copies, dramatically reducing the incremental storage cost of each version.

For example, editing one paragraph in a 10 MB document might change only a single 4 MB chunk. The new version manifest points to the existing chunks for the unmodified portions and a single new chunk for the edited section. The incremental storage cost is 4 MB, not 10 MB.
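
The incremental cost of a version is just the set of chunk hashes absent from the previous manifest. A minimal sketch:

```python
def incremental_chunks(old_manifest: list[str], new_manifest: list[str]) -> set[str]:
    """Chunks that must actually be written for the new version; everything
    else is shared by reference with prior versions."""
    return set(new_manifest) - set(old_manifest)
```

Editing one 4 MB chunk of a 10 MB file yields a single new hash, so only that chunk is stored.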

Retention policy and cost management#

Storing unlimited versions forever is prohibitively expensive. Production systems implement retention policies. For example, Google Drive keeps 100 versions or 30 days of history (whichever comes first) for most file types, with longer retention for Workspace documents.

A background garbage collection process identifies version records that have expired and decrements the reference counts on their chunks. Chunks whose reference count drops to zero are eligible for deletion. This process must be conservative because premature deletion is catastrophic, so it typically runs with a grace period and double-checks references before removing data.

Pro tip: In an interview, discussing version retention policies and their cost implications shows maturity. Mention that per-file versioning overhead can be estimated as $\text{Version Storage Cost} \approx N_{\text{versions}} \times \bar{c} \times S_{\text{chunk}}$, where $N_{\text{versions}}$ is the average number of retained versions per file, $\bar{c}$ is the average number of changed chunks per version, and $S_{\text{chunk}}$ is the chunk size.

Versioning gives individual users a safety net, but what happens when multiple users edit the same file at the same time? The collaboration service must reconcile those concurrent changes without data loss.

Real-time collaboration#

Real-time collaboration is the feature that separates a file locker from a productivity platform. When two engineers edit the same design document or three analysts update the same spreadsheet, the system must merge changes, resolve conflicts, and propagate updates to all participants within hundreds of milliseconds.

Collaboration protocols#

There are two dominant approaches to real-time conflict resolution in collaborative editing:

Operational Transform (OT) represents each edit as an operation (insert character at position 5, delete range 10 to 15) and transforms concurrent operations against each other so they can be applied in any order and converge to the same state. Google Docs uses OT. It works well for text-based documents but becomes complex for structured data like spreadsheets.

Conflict-free Replicated Data Types (CRDTs) are data structures that are mathematically guaranteed to converge when replicated across nodes, regardless of the order operations are received. CRDTs avoid the need for a central transformation server, making them attractive for peer-to-peer or offline-first architectures. Figma and some newer collaborative editors use CRDTs.

OT vs. CRDTs: Feature Comparison

| Dimension | Operational Transform (OT) | CRDTs |
| --- | --- | --- |
| Central server requirement | Requires a central server to coordinate and transform operations | None needed; designed for decentralized, independent clients |
| Implementation complexity | High, especially for structured documents with intricate transformation functions | Moderate; conflict resolution is simpler by design, but the data structures can be memory-intensive |
| Offline support | Limited; server mediation is typically required to resolve conflicts and sync changes | Strong; offline edits merge naturally once connectivity is restored |
| Best fit | Real-time text editing with persistent connections and low-latency demands | Offline-capable and peer-to-peer editing requiring decentralized collaboration |

For a Google Drive-like system, OT is the pragmatic default for document collaboration because Google’s own infrastructure has proven it at scale. CRDTs are a strong choice for newer features or offline-heavy use cases.
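
The convergence guarantee CRDTs provide is easiest to see in the smallest example, a grow-only counter: merging takes the per-replica maximum, so merges commute, associate, and repeat safely. This sketch is illustrative only; real collaborative-text CRDTs are far richer.

```python
def merge_gcounter(a: dict[str, int], b: dict[str, int]) -> dict[str, int]:
    """Merge two grow-only counter states (replica id -> local count).
    Taking the per-replica max makes merge commutative, associative, and
    idempotent, so replicas converge regardless of delivery order."""
    return {k: max(a.get(k, 0), b.get(k, 0)) for k in set(a) | set(b)}

def counter_value(state: dict[str, int]) -> int:
    """The counter's value is the sum of all replicas' local counts."""
    return sum(state.values())
```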

Handling non-document files#

Collaboration protocols like OT and CRDTs work for structured, text-based documents. But what about binary files like images, videos, or ZIP archives? These file types don’t have meaningful “merge” semantics.

For binary files, the system falls back to a simpler strategy: last-write-wins (LWW), a conflict resolution policy where the most recent write (by timestamp or version number) is accepted as the canonical version, and earlier concurrent writes are discarded or saved as separate conflicting copies. If two users upload different versions of a JPEG simultaneously, the system keeps the later write and optionally saves the other as a “conflicting copy” so no data is lost.

Real-world context: Dropbox uses the conflicting-copy approach for binary files. When a conflict is detected, both versions are preserved with one renamed to include “conflicting copy” in the filename, letting the user decide which to keep.
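
The LWW-plus-conflicting-copy policy reduces to a deterministic comparison. In this sketch (names are assumptions) each candidate is a (timestamp, device_id, payload) tuple; breaking timestamp ties on device id keeps every replica's decision identical.

```python
def resolve_binary_conflict(local: tuple, remote: tuple) -> tuple:
    """Last-write-wins for binary files. Returns (winner, loser); the loser
    is preserved as a conflicting copy rather than silently discarded."""
    if (local[0], local[1]) >= (remote[0], remote[1]):
        return local, remote
    return remote, local
```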

Collaboration happens in real time over persistent connections, but not all clients are always online. The sync service bridges the gap between connected and disconnected devices.

Sync across devices#

Sync is what makes Drive feel like a single, unified file system rather than a collection of independent storage buckets. A user edits a file on their laptop, closes it, and expects to see the changes on their phone within seconds. If they edited offline, the changes must merge correctly when connectivity returns.

Change tokens and incremental sync#

The sync client does not download the entire file tree on every check. Instead, it maintains a change token: an opaque cursor representing a point in the file system’s change history, which lets clients request only the changes that occurred since their last sync rather than re-fetching the full state. On each sync cycle, the client sends its last known change token to the server and receives a delta: the list of files and folders that changed since that token was issued.

This incremental approach dramatically reduces bandwidth and server load. A user with 50,000 files in Drive might see only 3 changes per sync cycle, requiring the server to return 3 metadata records rather than 50,000.
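
A change token can be modeled as a cursor into an append-only change feed. Production tokens are opaque and encode shard positions, but this toy captures the contract:

```python
class ChangeLog:
    """Toy change feed: tokens are positions in an append-only log."""

    def __init__(self) -> None:
        self.entries: list[str] = []  # file IDs that changed, in order

    def record(self, file_id: str) -> None:
        self.entries.append(file_id)

    def changes_since(self, token: int) -> tuple[list[str], int]:
        """Return the delta since `token` plus a new token for the next cycle."""
        return self.entries[token:], len(self.entries)
```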

Offline editing and conflict reconciliation#

When a client edits files offline, it queues changes locally. On reconnection, the sync service uploads pending changes and compares them against the server’s current state. If the server version has also changed (another device or user made edits), the system must reconcile.

For documents, the collaboration protocols (OT or CRDTs) handle merging. For binary files, the LWW or conflicting-copy strategy applies. The sync service uses version vectors or logical timestamps to detect conflicts deterministically.
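
Conflict detection with version vectors is a pairwise comparison: if neither vector dominates the other, the edits were concurrent and must go through merge or the conflicting-copy path. A minimal sketch:

```python
def compare_version_vectors(vv_a: dict[str, int], vv_b: dict[str, int]) -> str:
    """Classify two version vectors: one dominates, they are equal, or the
    edits were concurrent (a genuine conflict needing reconciliation)."""
    keys = set(vv_a) | set(vv_b)
    a_ge = all(vv_a.get(k, 0) >= vv_b.get(k, 0) for k in keys)
    b_ge = all(vv_b.get(k, 0) >= vv_a.get(k, 0) for k in keys)
    if a_ge and b_ge:
        return "equal"
    if a_ge:
        return "a_dominates"
    if b_ge:
        return "b_dominates"
    return "concurrent"  # conflict: neither device saw the other's edit
```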

The sync protocol must also handle edge cases:

  • Renames during sync: A file renamed on device A while being edited on device B.
  • Deletes during sync: A file deleted on one device while open on another.
  • Folder moves during sync: A file moved to a new folder while its old folder is being synced.

Each case requires explicit policy decisions that are encoded in the sync client’s state machine.

Attention: Sync bugs are among the most insidious in a cloud storage system. A subtle ordering error in conflict resolution can cause silent data loss that users don’t notice for days. Robust sync implementations include checksums, server-side validation, and client-side journaling to detect and recover from inconsistencies.

Diagram: Sync client state machine lifecycle

Sync ensures data reaches all devices, but it must be gated by who is allowed to access what. Sharing and access control enforce those boundaries.

Sharing and access control#

Sharing is one of the most complex subsystems because it intersects with nearly every other component. Every file read, every metadata lookup, every sync response must pass through a permission check. The access control layer must be fast, consistent, and correct.

Permission model#

Permissions follow a role-based model with inheritance:

  • Roles: Viewer, Commenter, Editor, Owner.
  • Principals: Individual users, groups, domains, or “anyone with the link.”
  • Inheritance: A file inherits permissions from its parent folder unless explicitly overridden.

Permission inheritance means that sharing a folder with a team automatically grants access to all files within it, recursively. This is powerful but introduces complexity. Moving a file from a shared folder to a private one must revoke inherited permissions immediately.

Enforcement and revocation#

Permission checks must be enforced at every entry point: the web UI, mobile app, desktop sync client, and public API. A centralized access control service evaluates permissions, backed by a cache for low-latency lookups.

Revocation must be near-instantaneous. When a user’s access is removed, subsequent requests must be denied within seconds. This means permission caches must use short TTLs or be actively invalidated on changes. Stale caches that allow unauthorized access even briefly are a security vulnerability.

Security, encryption, and audit logging#

Beyond permissions, the system must protect data in transit and at rest:

  • In transit: All connections use TLS 1.3. Internal service-to-service communication also uses mutual TLS (mTLS).
  • At rest: File chunks and metadata are encrypted with AES-256. Customer-managed encryption keys (CMEK) are offered for enterprise customers.
  • Audit logging: Every file access, permission change, and sharing action is recorded in an immutable audit log. These logs support compliance with regulations like GDPR and HIPAA, and enable forensic investigation after security incidents.

Pro tip: Audit logs should be append-only and stored separately from the primary metadata database to prevent tampering. In an interview, mentioning audit logging and compliance shows you think beyond functional correctness to operational and legal requirements.

Permission checks gate access, but users also need to find files efficiently. With thousands of files per user, browsing alone isn’t enough.

Search and indexing#

Search transforms Drive from a hierarchical folder browser into a powerful information retrieval system. Users search by filename, content, owner, file type, and modification date. At Google’s scale, the search index must handle billions of documents and return results in under 200 ms.

Index construction#

Search indexes are built asynchronously. When a file is uploaded or modified, a message is published to a processing queue. An indexing worker extracts text content (via OCR for images, text extraction for PDFs, raw content for documents), tokenizes it, and updates an inverted index.

The indexing pipeline uses batch and streaming processing. New uploads are indexed via the streaming path for freshness (typically within 30 to 60 seconds). Periodic batch jobs rebuild or optimize the index for query performance.

Query serving#

Query serving uses a distributed search engine (architecturally similar to Apache Lucene or Google’s internal search infrastructure). The index is sharded across machines, and queries fan out to multiple shards in parallel. Results are ranked by relevance, recency, and the user’s access permissions.

Historical note: Google’s internal search infrastructure for Drive and Gmail evolved from the same codebase that powers web search. The core techniques of inverted indexing, sharding, and ranking are shared, adapted for the smaller corpus size but higher freshness requirements of personal document search.

Permission filtering is applied during query serving, not after. The search engine only returns results the requesting user is authorized to see. This prevents information leakage and avoids wasting resources ranking inaccessible documents.
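
A toy inverted index makes the serving-time permission filter concrete. `allowed` stands in for the set of file IDs the requesting user may see; real systems integrate this with ACL shards rather than a plain set.

```python
def index_document(inverted: dict[str, set[str]], file_id: str, text: str) -> None:
    """Tokenize extracted text and add postings to the inverted index."""
    for token in set(text.lower().split()):
        inverted.setdefault(token, set()).add(file_id)

def search(inverted: dict[str, set[str]], query: str, allowed: set[str]) -> set[str]:
    """Intersect postings for all query terms, then filter by permissions
    during serving, never after ranking, to avoid information leakage."""
    terms = query.lower().split()
    if not terms:
        return set()
    results = set(inverted.get(terms[0], set()))
    for t in terms[1:]:
        results &= inverted.get(t, set())
    return results & allowed
```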

Search and sync generate heavy read traffic, and the system must serve billions of daily requests without melting down. Caching is the primary tool.

Caching and performance optimization#

Google Drive is dramatically read-heavy. For every file upload, there are hundreds of metadata reads, folder listings, permission checks, and thumbnail fetches. Caching at multiple layers is essential to meet latency targets and protect backend databases.

Cache layers#

The caching strategy operates at three tiers:

  • Client-side cache: The desktop sync client and mobile app cache file metadata and recently accessed file content locally. This eliminates network round trips for repeat access.
  • Edge/CDN cache: Thumbnails, previews, and popular public files are cached at edge locations close to users using a CDN like Google’s global edge network.
  • Server-side cache: An in-memory cache (Redis, Memcached) sits in front of the metadata database, absorbing the majority of read traffic. Cache entries are keyed by file ID and invalidated on writes.

[Diagram: Three-tier caching architecture with fallback layers]
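A tiered lookup with fallback can be sketched as follows. This is a minimal illustration under simplifying assumptions: plain Maps stand in for the real layers (local client cache, CDN edge, server-side Redis), and all names are invented for the example.

```typescript
// Sketch of tiered cache lookup with fallback: try each layer in order,
// fall back to the database on a full miss, and populate the faster
// layers on the way back so the next read is cheaper.
type Layer = Map<string, string>;

function tieredGet(
  layers: Layer[], // ordered fastest to slowest (client → edge → server)
  dbLoad: (key: string) => string, // last-resort database read
  key: string
): string {
  for (let i = 0; i < layers.length; i++) {
    const hit = layers[i].get(key);
    if (hit !== undefined) {
      // Backfill the faster layers that missed
      for (let j = 0; j < i; j++) layers[j].set(key, hit);
      return hit;
    }
  }
  const value = dbLoad(key); // every layer missed: hit the database once
  for (const l of layers) l.set(key, value);
  return value;
}
```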

Cache invalidation strategy

Cache invalidation is famously one of the two hard problems in computer science. The system uses a conservative approach:

  • Write-through for metadata: When a file’s metadata is updated, the cache entry is invalidated synchronously before the write is acknowledged. This prevents stale reads at the cost of slightly higher write latency.
  • TTL-based for thumbnails and previews: These are regenerated periodically. A short TTL (30 to 60 seconds) ensures freshness without requiring active invalidation.
  • Event-driven for sync caches: The notification service pushes invalidation events to connected clients, ensuring they refresh stale entries promptly.

Real-world context: At Google’s scale, even a 1% cache miss rate translates to millions of database queries per second. Cache hit rates above 99% are a design goal, achieved through careful key design, warm-up strategies, and monitoring.
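The write-through ordering can be sketched in a few lines. Maps stand in for the cache and the metadata database here; the names are illustrative, not the actual implementation. The key property is that the invalidation happens before the write is acknowledged, so a read after the ack can never serve the old value from cache.

```typescript
// Sketch of write-through invalidation: invalidate the cache entry
// synchronously, then persist, then acknowledge. Maps stand in for
// Redis and the metadata database.
const db = new Map<string, string>();
const cache = new Map<string, string>();

function writeMetadata(fileId: string, name: string): void {
  cache.delete(fileId); // 1. invalidate synchronously
  db.set(fileId, name); // 2. persist; only now is the write acknowledged
}

function readMetadata(fileId: string): string | undefined {
  const cached = cache.get(fileId);
  if (cached !== undefined) return cached;
  const value = db.get(fileId); // cache miss: read through to the database
  if (value !== undefined) cache.set(fileId, value);
  return value;
}
```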

Caching handles steady-state performance, but failures are inevitable at exabyte scale. The system must degrade gracefully when things go wrong.

Failure handling and recovery

A system serving billions of users will experience disk failures, network partitions, data center outages, and software bugs continuously. The architecture assumes failures are normal and builds resilience into every layer.

Storage-layer failures

When a storage node fails, the replication or erasure coding layer ensures data remains available from other replicas or fragments. A background repair process detects under-replicated chunks and creates new replicas on healthy nodes. The target repair time is minutes, not hours, to minimize the window of reduced redundancy.
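The core of the repair scan can be sketched as follows; the data shapes and names are hypothetical, but the logic mirrors the description above: count live replicas per chunk and schedule new copies wherever the count has dropped below the target.

```typescript
// Sketch of the background repair planner: detect under-replicated chunks
// (replicas on dead nodes no longer count) and compute how many new
// copies each one needs.
interface ChunkState {
  chunkId: string;
  replicas: string[]; // node IDs currently holding a copy
}

function planRepairs(
  chunks: ChunkState[],
  deadNodes: Set<string>,
  targetReplicas: number
): Map<string, number> {
  const plan = new Map<string, number>(); // chunkId → replicas to create
  for (const c of chunks) {
    const live = c.replicas.filter((n) => !deadNodes.has(n)).length;
    if (live < targetReplicas) plan.set(c.chunkId, targetReplicas - live);
  }
  return plan;
}
```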

Metadata-layer failures

The distributed metadata database (Spanner or equivalent) handles node failures through automatic leader election and replication. Read replicas absorb traffic during failovers. The cache layer provides an additional buffer, serving reads even if the database is briefly unavailable.

Client-side failures

Sync clients may crash, lose network, or encounter disk errors. The idempotent upload protocol ensures that interrupted operations can be safely retried. The sync client maintains a local journal of pending operations and replays them on restart.
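A minimal sketch of such a journal, assuming each operation carries a stable ID (the ID is what makes replay idempotent; in a real client the journal would be persisted to local disk before execution):

```typescript
// Sketch of a client-side operation journal with idempotent replay.
// Already-completed operations are skipped by opId on restart.
interface JournalOp {
  opId: string; // stable ID makes retries and replays idempotent
  kind: "upload" | "rename" | "delete";
  payload: unknown;
}

class SyncJournal {
  private pending: JournalOp[] = [];
  private completed = new Set<string>();

  record(op: JournalOp): void {
    this.pending.push(op); // in a real client, persisted to disk first
  }

  // Replay pending operations after a crash; returns how many ran
  replay(execute: (op: JournalOp) => void): number {
    let replayed = 0;
    for (const op of this.pending) {
      if (this.completed.has(op.opId)) continue; // already done: skip
      execute(op);
      this.completed.add(op.opId);
      replayed++;
    }
    this.pending = [];
    return replayed;
  }
}
```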

Attention: A particularly dangerous failure mode is "split-brain" in sync, where a client believes it is offline and accumulates local changes, while the server has accepted newer changes from another client. Robust conflict detection using version vectors is essential to prevent silent data loss in this scenario.
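Version-vector comparison is the standard technique here, and it fits in a short sketch. If neither replica's vector dominates the other, the edits were concurrent and a conflict must be surfaced instead of silently overwriting one side:

```typescript
// Sketch of version-vector comparison for sync conflict detection.
// Each key is a replica (client/server) ID; each value is that
// replica's update counter.
type VersionVector = Record<string, number>;

function compare(
  a: VersionVector,
  b: VersionVector
): "ahead" | "behind" | "equal" | "concurrent" {
  const keys = new Set([...Object.keys(a), ...Object.keys(b)]);
  let aAhead = false;
  let bAhead = false;
  for (const k of keys) {
    const av = a[k] ?? 0;
    const bv = b[k] ?? 0;
    if (av > bv) aAhead = true;
    if (bv > av) bAhead = true;
  }
  if (aAhead && bAhead) return "concurrent"; // split-brain: both sides changed
  if (aAhead) return "ahead";
  if (bAhead) return "behind";
  return "equal";
}
```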

Regional isolation

Failures are isolated by region. A data center outage in one region does not affect users served by other regions. Global metadata is replicated across regions with quorum writes: a write is considered successful only after a majority (quorum) of replicas acknowledge it, so any subsequent quorum read sees the latest write. This keeps metadata consistent even during regional failures.

Resilience within a single region is necessary, but Drive is a global product. Scaling across continents introduces its own challenges.

Scaling globally

Google Drive operates in dozens of regions worldwide. Users in Tokyo, São Paulo, and Berlin all expect the same low-latency experience. Global scaling requires a hybrid approach that balances consistency with proximity.

Data placement and replication

File data is stored in the region closest to the uploading user by default. For files shared across regions, the system creates read replicas in regions where collaborators are active. This replication is demand-driven, not blanket, to control costs.

Metadata is replicated globally using the distributed database’s built-in replication. Spanner, for example, uses synchronized clocks (TrueTime) to provide externally consistent reads across continents, ensuring that a permission change in New York is immediately visible to a reader in London.

Traffic routing and load balancing

A global load balancer routes user requests to the nearest healthy region. DNS-based routing provides the first layer of steering, while application-level routing handles failover when a region is degraded.

Within each region, requests are distributed across service instances using consistent hashing for cache locality and round-robin for stateless services.
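A minimal consistent-hash ring illustrates the cache-locality property: the same key always routes to the same instance, and removing a node only remaps that node's keys. The hash function and node names below are illustrative choices, not the production implementation.

```typescript
// Sketch of a consistent-hash ring with virtual nodes for smoother
// key distribution. FNV-1a is used as a cheap deterministic hash.
function fnv1a(s: string): number {
  let h = 0x811c9dc5;
  for (let i = 0; i < s.length; i++) {
    h ^= s.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h >>> 0;
}

class HashRing {
  private ring: { point: number; node: string }[] = [];

  constructor(nodes: string[], vnodes = 16) {
    for (const n of nodes) {
      // Each physical node gets several points on the ring
      for (let v = 0; v < vnodes; v++) {
        this.ring.push({ point: fnv1a(`${n}#${v}`), node: n });
      }
    }
    this.ring.sort((x, y) => x.point - y.point);
  }

  // Route a key to the first ring point clockwise from its hash
  route(key: string): string {
    const h = fnv1a(key);
    const entry = this.ring.find((e) => e.point >= h) ?? this.ring[0];
    return entry.node;
  }
}
```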

Capacity estimation

A rough capacity model helps validate architectural choices. Consider:

  • Users: 2 billion active users.
  • Files per user: 5,000 average.
  • Total files: 10 trillion.
  • Average file size: 1 MB.
  • Total storage: $10^{13} \times 10^6 = 10^{19}$ bytes = 10 exabytes (before replication).
  • With 2x average replication factor (mix of hot replicated and cold erasure-coded): 20 exabytes.
  • Metadata per file: ~1 KB. Total metadata: $10^{13} \times 10^3 = 10^{16}$ bytes = 10 petabytes.
  • Daily uploads: Assuming 1% of users upload 2 files per day: 40 million uploads/day, ~460 uploads/second average, with peaks 5 to 10x higher.
  • Metadata reads: Assuming each active user triggers 100 metadata reads per day: 200 billion reads/day, ~2.3 million reads/second average.

These numbers confirm that metadata reads dominate, blob storage must be cost-optimized, and the system must handle burst traffic well above average.
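The arithmetic above can be reproduced as a quick sanity check:

```typescript
// Back-of-the-envelope numbers from the capacity list, as code.
const users = 2e9;                                  // 2 billion active users
const totalFiles = users * 5_000;                   // 1e13 files (10 trillion)
const rawStorageBytes = totalFiles * 1e6;           // 1e19 B = 10 EB pre-replication
const metadataBytes = totalFiles * 1e3;             // 1e16 B = 10 PB
const uploadsPerDay = (users / 100) * 2;            // 1% of users, 2 files each: 40M
const uploadsPerSec = uploadsPerDay / 86_400;       // ≈ 463/s average
const metadataReadsPerSec = (users * 100) / 86_400; // ≈ 2.3M/s average
```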

Pro tip: Presenting back-of-the-envelope calculations like these in an interview demonstrates that you think quantitatively about system constraints. Even rough estimates (within an order of magnitude) are valuable for validating design choices.

With global scale comes global responsibility. Data integrity and user trust are the ultimate measures of the system’s success.

API design and client protocol

A clean API contract defines how clients interact with the system. The API must be intuitive, efficient, and secure. Below is a representative set of core endpoints.

```typescript
const BASE_URL = "https://api.example.com";

// Helper to build authorized fetch requests
async function apiFetch<T>(
  path: string,
  options: RequestInit = {},
  token: string
): Promise<T> {
  const res = await fetch(`${BASE_URL}${path}`, {
    ...options,
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${token}`,
      ...options.headers,
    },
  });
  if (!res.ok) throw new Error(`API error ${res.status}: ${await res.text()}`);
  // Some endpoints (e.g. DELETE) return an empty body; avoid parsing it as JSON
  const text = await res.text();
  return (text ? JSON.parse(text) : undefined) as T;
}

interface UploadInitRequest {
  fileName: string;
  fileSize: number;
  mimeType: string;
  parentFolderId?: string;
  chunkCount: number;
}

interface UploadInitResponse {
  sessionId: string;
  presignedUrls: string[]; // one per chunk
}

// Initiate a multipart upload session; returns presigned URLs per chunk
async function initUpload(
  payload: UploadInitRequest,
  token: string
): Promise<UploadInitResponse> {
  return apiFetch<UploadInitResponse>("/v1/files/upload/init", {
    method: "POST",
    body: JSON.stringify(payload),
  }, token);
}

// Upload a single chunk; idempotent via content hash header
async function uploadChunk(
  sessionId: string,
  chunkIndex: number,
  chunkData: Blob,
  contentHash: string,
  token: string
): Promise<void> {
  const res = await fetch(
    `${BASE_URL}/v1/files/upload/${sessionId}/chunks/${chunkIndex}`,
    {
      method: "PUT",
      headers: {
        Authorization: `Bearer ${token}`,
        "Content-Type": "application/octet-stream",
        "X-Content-Hash": contentHash, // server uses this for idempotency
      },
      body: chunkData,
    }
  );
  if (!res.ok) throw new Error(`Chunk upload failed: ${res.status}`);
}

interface CompleteUploadResponse {
  fileId: string;
  status: "assembling" | "indexed";
}

// Finalize upload; triggers server-side assembly and indexing
async function completeUpload(
  sessionId: string,
  token: string
): Promise<CompleteUploadResponse> {
  return apiFetch<CompleteUploadResponse>(
    `/v1/files/upload/${sessionId}/complete`,
    { method: "POST" },
    token
  );
}

interface FileMetadata {
  fileId: string;
  name: string;
  mimeType: string;
  size: number;
  parentFolderId: string;
  createdAt: string;
  modifiedAt: string;
}

// Retrieve file metadata by ID
async function getFileMetadata(
  fileId: string,
  token: string
): Promise<FileMetadata> {
  return apiFetch<FileMetadata>(`/v1/files/${fileId}`, {}, token);
}

interface DownloadUrlResponse {
  presignedUrl: string;
  expiresAt: string;
}

// Get a presigned download URL for file content
async function getFileContent(
  fileId: string,
  token: string
): Promise<DownloadUrlResponse> {
  return apiFetch<DownloadUrlResponse>(`/v1/files/${fileId}/content`, {}, token);
}

interface UpdateMetadataRequest {
  name?: string;
  parentFolderId?: string;
  [key: string]: unknown;
}

// Update mutable file metadata (name, parent folder, etc.)
async function updateFileMetadata(
  fileId: string,
  updates: UpdateMetadataRequest,
  token: string
): Promise<FileMetadata> {
  return apiFetch<FileMetadata>(`/v1/files/${fileId}/metadata`, {
    method: "PUT",
    body: JSON.stringify(updates),
  }, token);
}

interface SharePermission {
  principalId: string;
  principalType: "user" | "group";
  role: "viewer" | "editor" | "owner";
}

interface ShareRequest {
  permissions: SharePermission[];
}

interface ShareResponse {
  fileId: string;
  permissions: SharePermission[];
}

// Modify sharing permissions for a file
async function shareFile(
  fileId: string,
  shareRequest: ShareRequest,
  token: string
): Promise<ShareResponse> {
  return apiFetch<ShareResponse>(`/v1/files/${fileId}/share`, {
    method: "POST",
    body: JSON.stringify(shareRequest),
  }, token);
}

interface FileVersion {
  versionId: string;
  fileId: string;
  createdAt: string;
  size: number;
  createdBy: string;
}

// List all version history entries for a file
async function listFileVersions(
  fileId: string,
  token: string
): Promise<FileVersion[]> {
  return apiFetch<FileVersion[]>(`/v1/files/${fileId}/versions`, {}, token);
}

interface ChangeEntry {
  fileId: string;
  changeType: "created" | "modified" | "deleted";
  modifiedAt: string;
}

interface ChangesResponse {
  changes: ChangeEntry[];
  nextChangeToken: string; // use for subsequent polling
}

// Retrieve incremental changes since the last sync token
async function getChanges(
  changeToken: string,
  token: string
): Promise<ChangesResponse> {
  return apiFetch<ChangesResponse>(
    `/v1/changes?token=${encodeURIComponent(changeToken)}`,
    {},
    token
  );
}

// Soft-delete a file (moves to trash, recoverable)
async function deleteFile(fileId: string, token: string): Promise<void> {
  await apiFetch<void>(`/v1/files/${fileId}`, { method: "DELETE" }, token);
}

// Orchestrates a full chunked upload: init → upload chunks → complete
async function uploadFileInChunks(
  file: File,
  parentFolderId: string,
  chunkSizeBytes: number,
  authToken: string
): Promise<CompleteUploadResponse> {
  const chunks: Blob[] = [];
  for (let offset = 0; offset < file.size; offset += chunkSizeBytes) {
    chunks.push(file.slice(offset, offset + chunkSizeBytes));
  }

  // Step 1: Initialize upload session
  const { sessionId } = await initUpload(
    {
      fileName: file.name,
      fileSize: file.size,
      mimeType: file.type,
      parentFolderId,
      chunkCount: chunks.length,
    },
    authToken
  );

  // Step 2: Upload each chunk with its SHA-256 hash for idempotency
  for (let i = 0; i < chunks.length; i++) {
    const buffer = await chunks[i].arrayBuffer();
    const hashBuffer = await crypto.subtle.digest("SHA-256", buffer);
    const hashHex = Array.from(new Uint8Array(hashBuffer))
      .map((b) => b.toString(16).padStart(2, "0"))
      .join("");
    await uploadChunk(sessionId, i, chunks[i], hashHex, authToken);
  }

  // Step 3: Finalize and trigger assembly
  return completeUpload(sessionId, authToken);
}
```

The sync protocol uses the /changes endpoint with change tokens. Clients poll this endpoint periodically (every 5 to 30 seconds) or subscribe to a push channel via WebSocket or server-sent events for lower-latency updates.
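The polling side of that protocol can be sketched as a small loop. The `fetchPage` callback below stands in for a `getChanges`-style API call, and the types are simplified local stand-ins, so the example stays self-contained:

```typescript
// Sketch of the incremental sync loop built on the /changes endpoint:
// fetch a page of changes, apply them in order, and carry the token
// forward so a restarted client resumes where it left off.
interface SyncChange {
  fileId: string;
  changeType: "created" | "modified" | "deleted";
}

interface SyncPage {
  changes: SyncChange[];
  nextChangeToken: string;
}

async function syncOnce(
  fetchPage: (token: string) => Promise<SyncPage>,
  apply: (change: SyncChange) => void,
  startToken: string,
  rounds: number
): Promise<string> {
  let token = startToken;
  for (let i = 0; i < rounds; i++) {
    const page = await fetchPage(token);
    page.changes.forEach(apply); // apply changes in server order
    token = page.nextChangeToken; // persist before the next poll for crash safety
  }
  return token; // resume from here after a restart
}
```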

Real-world context: Google Drive’s public API uses exactly this pattern. The Changes resource returns a page token that clients use for incremental sync, avoiding the need to re-list all files on every call.

Authentication uses OAuth 2.0 tokens. Every request includes a bearer token that the API gateway validates before forwarding to backend services. Rate limiting is applied per user and per application to prevent abuse.

With the API contract defined, we can consider how all these components are evaluated in the context of a system design interview.

How interviewers evaluate Google Drive system design

Interviewers use Google Drive as a design prompt because it naturally tests multiple distributed systems concepts simultaneously. Knowing what they look for helps you structure your response.

Separation of concerns is the first thing evaluators notice. Clearly articulating why file data and metadata are stored, cached, and scaled independently is a strong positive signal. Candidates who treat "storage" as a monolithic blob often miss critical trade-offs.

Consistency reasoning is tested directly. Can you explain which operations require strong consistency (permission changes, file creation) and which tolerate eventual consistency (search results, thumbnail generation)? Blanket statements like "we use eventual consistency everywhere" or "everything is strongly consistent" are red flags.

Failure handling separates senior candidates from junior ones. Describing how the system recovers from storage node failures, network partitions, sync client crashes, and regional outages shows operational maturity.

Quantitative reasoning matters. Estimating storage requirements, QPS, and latency targets demonstrates that you can validate your design against real-world constraints rather than hand-waving about "just add more servers."

[Figure: Google Drive system design interview evaluation rubric]

Attention: A common interview mistake is spending too much time on the upload flow and not enough on collaboration, sync, and permissions. Interviewers want breadth across subsystems with depth in the areas you choose to emphasize.

Understanding these evaluation criteria helps you allocate your interview time wisely. Now let’s bring it all together.

Conclusion

Designing a Google Drive-like system is an exercise in managing complexity across multiple dimensions simultaneously. The three most critical takeaways are the strict separation of the data plane (chunked blob storage) from the control plane (metadata, permissions, sync), the use of content-addressable chunking for deduplication, resumability, and delta-efficient versioning, and the deliberate choice of consistency models tuned to each operation’s requirements rather than applying a single model globally.

The future of cloud file storage is moving toward deeper AI integration (automatic organization, intelligent search, content understanding), tighter real-time collaboration primitives built on CRDTs for offline-first experiences, and edge computing that brings storage and compute closer to users for even lower latency. WebAssembly and local-first software patterns may eventually shift more processing to the client, reducing server dependency while maintaining the collaborative experience.

If you can design a system that stores exabytes of data, syncs changes across billions of devices, merges concurrent edits without data loss, and enforces permissions on every access, you’ve demonstrated the kind of system-level judgment that builds foundational cloud platforms.


Written By:
Mishayl Hanan