Pastebin System Design

A Pastebin system design blog covering stateless servers, cache-first reads, paste-ID datastore, object storage for large pastes, TTL expiration, and abuse controls.

Mar 11, 2026

Pastebin system design is the practice of architecting a highly available, read-heavy service that allows users to create, store, and share text snippets via unique short URLs. A well-structured design addresses everything from scalable storage and caching to key generation, abuse prevention, and global content delivery, making it one of the most revealing problems in a system design interview.

Key takeaways

  • Read-to-write ratio drives architecture: A typical paste service sees 5:1 to 10:1 read-to-write ratios, so caching, CDN edge nodes, and database read replicas must be the primary focus of your design.
  • Key generation is a critical subsystem: Using a Base62 encoding scheme with an offline key generation service avoids runtime collisions and keeps paste URLs short, unpredictable, and globally unique.
  • Storage must be tiered by content size: Small paste metadata belongs in a fast key-value store, while large paste bodies (over a few hundred KB) should be offloaded to object storage like Amazon S3 for cost efficiency.
  • Concrete numbers earn interview credibility: Defining SLOs upfront (for example, p95 read latency under 150ms and 99.9% availability) anchors every subsequent design decision in measurable constraints.
  • Observability is not optional: Tracking cache hit ratios, error rates, and paste size distributions through a telemetry pipeline is what separates a whiteboard sketch from a production-grade design.


Most engineers can explain what Pastebin does in a single sentence, yet surprisingly few can design one that survives its first viral paste. That gap between “I understand the product” and “I can architect the system” is exactly what interviewers exploit. A paste-sharing service looks trivial on the surface: accept text, return a URL, serve it back. But underneath that simplicity hides a minefield of decisions about storage tiers, cache stampedes, key collision, abuse mitigation, and global latency. This guide walks through every layer of that minefield, from back-of-the-envelope math to CDN invalidation strategies, so you can present a design that sounds less like a textbook summary and more like a battle-tested architecture.

Clarifying requirements before designing anything#

Jumping straight into boxes and arrows is one of the fastest ways to lose credibility in a system design interview. The first five minutes should be a structured conversation about scope. What must the system do? What quality attributes matter most? Pinning these down early prevents you from over-engineering features nobody asked for while under-engineering the ones that matter.

Functional requirements#

The core feature set for a Pastebin-style service is deliberately small, which is precisely what lets an interviewer push you deeper on each one:

  • Create paste: A user submits a text snippet (with an optional language tag) and receives a unique, short URL containing the Paste ID.
  • Retrieve paste: Anyone with the URL can fetch the raw content and its metadata.
  • Paste expiration: Users can set a TTL (ten minutes, one hour, one day, or never). The system must honor this and remove content accordingly.
  • Visibility control: Pastes can be public (accessible to anyone with the link) or private (accessible only to the authenticated creator).
  • Syntax highlighting: The UI renders content with language-appropriate highlighting, driven by the language tag stored in metadata.
Attention: Many candidates treat “syntax highlighting” as a frontend-only concern and skip it entirely. Interviewers often use it to probe whether you store language metadata and how you handle rendering at scale (server-side vs. client-side).

Non-functional requirements and SLO targets#

This is where your design stops being generic and starts being yours. Strong answers consistently anchor their designs in concrete numbers, so define yours upfront.

Non-Functional Requirements: SLO Targets and Design Implications

| NFR | Definition | Typical SLO Targets | Key Design Implications |
| --- | --- | --- | --- |
| Scalability | Ability to handle increased load without compromising performance | 10,000 concurrent users; 5,000 TPS | Horizontal scaling, load balancers, stateless services, database sharding |
| Availability | Proportion of time the system is operational and accessible | 99.9% monthly uptime; 99.99% annual uptime | Redundancy, failover mechanisms, system monitoring, disaster recovery |
| Latency | Time taken for the system to respond to a request | p95 < 200ms; p99 < 500ms | Caching strategies, query optimization, reduced network hops, async processing |
| Durability | Ensuring committed data is never lost | Zero lost writes; data recoverable after failures | Data replication, write-ahead logging, regular backups, distributed storage |
| Cost Efficiency | Delivering performance and reliability within budget | Capped monthly operational costs; defined cost per transaction/user | Resource optimization, auto-scaling, cost-effective cloud services, expense reviews |

  • Scalability: Support roughly 1 million new pastes per day, with a 5:1 to 10:1 read-to-write ratio, meaning 5 to 10 million read requests daily.
  • Availability: Target 99.9% uptime (roughly 8.7 hours of allowed downtime per year).
  • Latency: p95 read latency under 150ms, p95 write latency under 300ms.
  • Durability: Zero data loss for non-expired pastes. Once stored, content must survive hardware failures.
  • Cost efficiency: Because many pastes are written once and never read, the storage layer must be tiered so cold data does not consume expensive high-IOPS storage.

These numbers are not arbitrary. They cascade through every architectural choice: the caching layer exists because of the latency target, the tiered storage exists because of the cost target, and the sharding strategy exists because of the scalability target.

With requirements locked down, we can move on to the back-of-the-envelope estimation that will size our infrastructure.

Capacity estimation and scale math#

Walking through the math in an interview is not about getting exact numbers. It is about demonstrating that you think about infrastructure sizing before picking technologies. Rough estimates prevent you from proposing an architecture that is either wildly over-provisioned or doomed to collapse under real load.

Traffic estimation#

Assume 1 million new pastes per day. With a read-to-write ratio of 10:1, that gives us approximately 10 million read requests per day.

  • Write QPS: $\frac{1{,}000{,}000}{86{,}400} \approx 12$ writes per second on average, with peaks of roughly 50 to 100 writes per second.
  • Read QPS: $\frac{10{,}000{,}000}{86{,}400} \approx 116$ reads per second on average, with peaks of roughly 500 to 1,000 reads per second during viral events.
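
The same arithmetic as a quick sanity check (the constants are the assumptions stated above):

```python
# Back-of-the-envelope traffic math: 1M new pastes/day, 10:1 read-to-write ratio.
SECONDS_PER_DAY = 86_400
writes_per_day = 1_000_000
reads_per_day = writes_per_day * 10

avg_write_qps = writes_per_day / SECONDS_PER_DAY   # ~12 writes/s
avg_read_qps = reads_per_day / SECONDS_PER_DAY     # ~116 reads/s
# Peak sizing rule of thumb: provision for roughly 5-10x the average.
```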

Storage estimation#

If the average paste size is 10 KB and we store 1 million new pastes per day:

$$\text{Daily storage} = 1{,}000{,}000 \times 10 \text{ KB} = 10 \text{ GB/day}$$

Over five years (assuming no expiration for “never expire” pastes):

$$\text{Total storage} \approx 10 \text{ GB} \times 365 \times 5 = 18.25 \text{ TB}$$

That figure is well within the range of a moderately sized distributed database or object storage bucket. However, the metadata (Paste ID, timestamps, user ID, flags) is much smaller, perhaps 500 bytes per paste, totaling under 1 TB over five years. This size difference is exactly why we separate metadata from content.
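
The storage math, scripted under the same assumptions (10 KB average paste, 500 bytes of metadata):

```python
# Storage estimation over a five-year horizon.
pastes_per_day = 1_000_000
avg_paste_kb = 10
metadata_bytes = 500
days = 365 * 5

daily_gb = pastes_per_day * avg_paste_kb / 1_000_000          # 10 GB/day of content
content_tb = daily_gb * days / 1_000                          # 18.25 TB of content
metadata_tb = pastes_per_day * metadata_bytes * days / 1e12   # ~0.91 TB of metadata
```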

Pro tip: Always state your assumptions out loud in an interview. Saying “I am assuming an average paste size of 10 KB” gives the interviewer a chance to adjust the constraint, and it shows you understand that the entire design shifts if that number changes to 10 MB.

The following diagram captures how these numbers map to the infrastructure tiers we will design next.

Diagram: System capacity flow with write and read paths

Now that we have sizing constraints, we can design the high-level architecture that supports them.

High-level system architecture#

A paste-sharing service is a textbook example of a read-heavy, low-latency system with simple access patterns. The architecture should reflect that simplicity: decoupled, stateless services connected by well-defined interfaces, with caching and storage tiers sized according to the math we just completed.

The end-to-end flow works as follows. A Client (browser or API consumer) sends a request to the API Gateway, which handles SSL termination, rate limiting, authentication, and basic input validation. The gateway forwards the request to one of many stateless Application Servers behind a load balancer. For a write request, the application server calls a Key Generation Service to obtain a unique Paste ID, stores the metadata in the Metadata Database, and uploads the raw content to Object Storage (or stores it inline in the database if the paste is small). For a read request, the server first checks the Distributed Cache. On a hit, the paste is returned immediately. On a miss, the server queries the metadata store and then fetches the content from object storage or the database, populating the cache before responding.

Key architectural components include:

  • API Gateway / Load Balancer: The single entry point. Distributes traffic using algorithms like Least Connections or Weighted Round Robin. Terminates TLS and enforces rate limits.
  • Application Servers: Stateless services running the create, retrieve, and delete logic. Being stateless means any server can handle any request, enabling frictionless horizontal scaling.
  • Key Generation Service: A dedicated microservice that produces unique, short Paste IDs. Decoupling this prevents ID-generation logic from becoming a bottleneck inside the application servers.
  • Metadata Database: Stores Paste ID, expiration time, creation time, user ID, visibility flag, content URL, and language tag. Optimized for fast key-value lookups.
  • Distributed Cache (Redis or Memcached): Holds recently or frequently accessed pastes to absorb the majority of read traffic.
  • Object Storage (e.g., Amazon S3): Stores the raw paste content, especially for large pastes. Highly durable and cost-effective.
Real-world context: Pastebin.com itself has reported serving billions of page views with a relatively small engineering team. The key enabler is aggressive caching and a CDN layer that absorbs the vast majority of read traffic before it ever hits the origin servers.

Diagram: Pastebin high-level system architecture with read/write paths

With the high-level picture in place, we need to zoom into the component that makes or breaks the user experience: the Paste ID generation scheme.

Key generation and URL scheme#

The Paste ID is the identity of every paste. It appears in the URL, serves as the primary key in the database, and acts as the cache key. A poorly designed key scheme leads to collisions, predictability (security risk), or unnecessarily long URLs. This subsystem deserves dedicated attention.

Encoding scheme and length#

Most production systems use Base62 encoding: representing a numeric value with 62 characters (a-z, A-Z, 0-9) yields a compact, URL-safe alphanumeric string. A 6-character Base62 string gives us $62^6 \approx 56.8$ billion unique keys, which is more than enough for our 5-year storage estimate of roughly 1.8 billion pastes.

An 8-character string pushes that to $62^8 \approx 218$ trillion, providing massive headroom. For most designs, 6 to 8 characters strike the right balance between URL brevity and key space.
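
These key-space numbers are quick to verify:

```python
# Base62 key space vs. the five-year paste count.
ALPHABET_SIZE = 62
pastes_five_years = 1_000_000 * 365 * 5        # ~1.8 billion pastes

for length in (6, 7, 8):
    keyspace = ALPHABET_SIZE ** length
    headroom = keyspace / pastes_five_years    # how many times over we could fill it
    print(f"{length} chars: {keyspace:.3e} keys ({headroom:,.0f}x headroom)")
```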

Online vs. offline key generation#

There are two dominant strategies:

  • Online generation: When a paste is created, the application server generates a key on the fly, typically by hashing the content or a combination of content, timestamp, and user ID with MD5 or SHA-256, then Base62-encoding a portion of the hash. The risk here is collision, where two different inputs produce the same truncated hash. You must check the database for an existing key and retry on collision.
  • Offline generation (Key Generation Service / KGS): A separate service pre-generates a large pool of unique keys and stores them in a key database. When an application server needs a key, it requests one from the KGS, which marks it as “used.” This eliminates runtime collision entirely and moves the uniqueness guarantee to a simpler, centralized subsystem.
Pro tip: The offline KGS approach is almost always the better answer in an interview. It decouples key generation from the write path, removes collision-handling complexity from the application layer, and is easy to scale by simply pre-generating keys in larger batches.
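
For contrast with the KGS, here is a minimal sketch (with hypothetical names) of the online approach: Base62-encode a truncated SHA-256 digest and retry with a salt on collision. The `used` set stands in for a database existence check.

```python
import hashlib
import string

ALPHABET = string.digits + string.ascii_uppercase + string.ascii_lowercase  # Base62

def base62_encode(n: int) -> str:
    """Encode a non-negative integer as a Base62 string."""
    if n == 0:
        return ALPHABET[0]
    out = []
    while n:
        n, r = divmod(n, 62)
        out.append(ALPHABET[r])
    return "".join(reversed(out))

def online_key(content: str, used: set, length: int = 6) -> str:
    """Online hash-based key generation with collision retry (sketch)."""
    salt = 0
    while True:
        digest = hashlib.sha256(f"{content}:{salt}".encode()).digest()
        n = int.from_bytes(digest[:8], "big")
        key = base62_encode(n)[:length].rjust(length, ALPHABET[0])
        if key not in used:   # in production: a database existence check
            used.add(key)
            return key
        salt += 1             # collision: perturb the input and retry
```

The retry loop is exactly the write-path complexity the offline KGS removes.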

Security considerations#

Sequential or timestamp-based IDs are predictable, allowing attackers to enumerate and scrape pastes. Base62 keys generated randomly (or drawn from a shuffled pool in the KGS) are effectively opaque, preventing enumeration. For private pastes, an additional random token appended to the URL provides a second layer of access control beyond authentication.

Online Hash-Based Key Generation vs. Offline KGS Comparison

| Criteria | Online Hash-Based Key Generation | Offline Key Generation Service (KGS) |
| --- | --- | --- |
| Collision Risk | Moderate to low depending on hash function (MD5: ~2⁻⁶⁴; SHA-256: ~2⁻¹²⁸) | Negligible: a centralized service ensures unique key generation |
| Latency Overhead | ~6.8 ms per hash; cumulative impact in high-throughput environments | LAN: ~67.71 ms; WAN: up to ~533.26 ms per request |
| Implementation Complexity | Low: standard hash functions are straightforward to apply; uniqueness management adds minor complexity | High: requires secure infrastructure, key distribution mechanisms, and access controls |
| Scalability | High: hash functions handle large data volumes efficiently; collision probability rises with key volume | Moderate: scalable with proper architecture, but resource-intensive as demand grows |

With a reliable key scheme in place, we need to decide where and how to store the data those keys point to.

Data model and storage choices#

The data model for a paste-sharing service is deceptively simple, but the storage decisions you make here directly determine your system’s cost profile, latency characteristics, and operational complexity.

Schema design#

The metadata schema should be lean. Every field must justify its existence by supporting a core query pattern or a non-functional requirement.

```sql
CREATE TABLE paste_metadata (
    paste_id     VARCHAR(8)    NOT NULL,               -- short unique identifier for each paste
    user_id      VARCHAR(255),                         -- nullable: NULL means anonymous paste
    content_url  VARCHAR(2048) NOT NULL,               -- object-storage key where paste content is stored
    language     VARCHAR(64)   NOT NULL,               -- syntax-highlighting tag (e.g. 'python', 'json')
    visibility   VARCHAR(7)    NOT NULL                -- restricted to 'public' or 'private' via CHECK
        CHECK (visibility IN ('public', 'private')),
    created_at   TIMESTAMP     NOT NULL DEFAULT NOW(), -- record creation time, defaults to current time
    expires_at   TIMESTAMP,                            -- nullable: NULL means the paste never expires
    size_bytes   INTEGER       NOT NULL,               -- byte length of the stored paste content
    CONSTRAINT pk_paste_metadata PRIMARY KEY (paste_id)
);

-- Index expires_at to efficiently query and purge expired pastes
CREATE INDEX idx_paste_metadata_expires_at
    ON paste_metadata (expires_at)
    WHERE expires_at IS NOT NULL;                      -- partial index skips non-expiring rows
```

The expires_at column is indexed to support efficient batch deletion by background cleanup jobs. The content_url field stores the object storage key for the raw paste content, decoupling metadata from the potentially large payload.

SQL vs. NoSQL for metadata#

This is a classic trade-off question, and the right answer depends on your access patterns.

SQL vs. NoSQL Database Comparison for Paste Metadata Storage

| Feature | PostgreSQL | MySQL | DynamoDB | Cassandra |
| --- | --- | --- | --- | --- |
| Consistency Model | Strong (ACID) | Strong (ACID) | Tunable (eventual → strong) | Tunable (eventual → strong) |
| Scaling Approach | Vertical (sharding possible) | Vertical + read replicas | Automatic horizontal | Native horizontal |
| Query Flexibility | Full SQL (joins, subqueries, aggregations) | Full SQL (complex queries) | Key-value & document; limited secondary indexes | CQL (no joins or subqueries) |
| Operational Complexity | High (schema migrations, tuning) | High (replication, schema management) | Low (fully managed service) | Medium–High (replication, consistency tuning) |
| Key-Value Lookup Fit | Moderate (index-dependent) | Moderate (index-dependent) | Excellent (purpose-built) | Good (high availability trade-offs) |

Given that the dominant access pattern is a single-key lookup by Paste ID, a key-value store like DynamoDB or Cassandra is the natural fit: a database optimized for storing and retrieving simple key-value pairs, where the key (the Paste ID) maps directly to the associated value (the metadata), enabling O(1) average lookup time. These systems are designed for horizontal scaling with consistent, low-latency reads at massive throughput. If you need relational queries later (such as “fetch all pastes by user”), you can introduce a secondary index or a separate SQL store for that specific access pattern.

Separating content from metadata#

This is one of the most important architectural decisions, and one worth stating explicitly in an interview. Paste content (the raw text) and paste metadata (ID, timestamps, flags) have fundamentally different access profiles:

  • Metadata is small (under 1 KB), queried on every request, and must be lightning-fast.
  • Content can be large (up to 512 KB or more), is fetched only after metadata confirms the paste exists and is accessible, and is read-heavy but tolerant of slightly higher latency.

Storing both in the same high-IOPS database wastes expensive resources on large blobs that would be cheaper in object storage. The design pattern is: store metadata in the fast key-value store, store content in Amazon S3 or equivalent object storage, and link them via the content_url field.

Attention: For very small pastes (under a few KB), the overhead of an extra network hop to object storage may exceed the cost savings. A common optimization is to store the content inline in the metadata record when it falls below a size threshold (say, 10 KB), and only offload to object storage for larger pastes.
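
The threshold logic can be sketched as follows; `build_metadata` and `store_object` are hypothetical names, and the 10 KB cutoff is the tunable threshold discussed above.

```python
INLINE_THRESHOLD_BYTES = 10 * 1024  # store inline below this size (tunable)

def build_metadata(paste_id: str, content: bytes, store_object) -> dict:
    """Decide between inline storage and object-storage offload (sketch).

    `store_object` is a hypothetical callable that uploads the body and
    returns its object-storage key (e.g. an S3 key).
    """
    record = {"paste_id": paste_id, "size_bytes": len(content)}
    if len(content) < INLINE_THRESHOLD_BYTES:
        record["content_inline"] = content     # small paste: skip the extra network hop
        record["content_url"] = None
    else:
        record["content_inline"] = None
        record["content_url"] = store_object(paste_id, content)  # offload large body
    return record
```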

The storage layer is now defined, but it will buckle without a robust caching strategy. Let us design the caching and content delivery layers next.

Caching strategy and CDN integration#

Caching is not an optimization for a paste-sharing service. It is a structural requirement. Without it, the metadata database and object storage would need to handle 10 million+ daily reads directly, driving up both latency and cost. The goal is to serve the vast majority of reads from memory or edge locations, touching the origin only on cache misses.

Multi-layer caching#

A production-grade design uses multiple cache tiers:

  1. Application-level in-memory cache: A small LRU cache on each application server holding the absolute hottest pastes. This avoids even the network hop to the distributed cache for the most popular content.
  2. Distributed cache cluster (Redis or Memcached): The primary caching layer. Stores paste metadata and content for recently and frequently accessed pastes. A Redis cluster with a few hundred GB of memory can absorb the majority of read traffic.
  3. CDN edge cache: For globally distributed users, a CDN (Content Delivery Network) like Cloudflare or Amazon CloudFront caches paste content at geographically distributed edge locations, serving each request from the node nearest the user instead of the origin server. This can reduce p95 read latency from 150ms to under 50ms for repeat access.
Real-world context: GitHub Gist, a service with a nearly identical access pattern to Pastebin, relies heavily on CDN caching and aggressive HTTP cache headers to serve millions of snippet views without overwhelming their origin servers.
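
A minimal cache-aside sketch of tiers 1 and 2 (the CDN tier sits in front of the service and is not shown); all names here are illustrative:

```python
from collections import OrderedDict

class LocalLRU:
    """Tiny per-server in-memory LRU for the hottest pastes (tier 1)."""
    def __init__(self, capacity: int = 1024):
        self.capacity = capacity
        self.data = OrderedDict()

    def get(self, key):
        if key in self.data:
            self.data.move_to_end(key)       # mark as recently used
            return self.data[key]
        return None

    def put(self, key, value):
        self.data[key] = value
        self.data.move_to_end(key)
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)    # evict least recently used

def read_paste(paste_id, local_cache, distributed_cache: dict, load_from_origin):
    """Cache-aside read path: local LRU -> distributed cache -> origin."""
    value = local_cache.get(paste_id)               # tier 1: in-process
    if value is None:
        value = distributed_cache.get(paste_id)     # tier 2: Redis/Memcached
        if value is None:
            value = load_from_origin(paste_id)      # origin: DB + object storage
            distributed_cache[paste_id] = value     # populate cache on miss
        local_cache.put(paste_id, value)
    return value
```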

Cache invalidation and TTL alignment#

The simplest and most effective invalidation strategy for paste content is to align the cache TTL with the paste’s expiration time. When a paste is created with a one-hour expiration, the cache entry is written with a one-hour TTL. When the TTL expires, the entry is automatically evicted. No explicit invalidation logic is needed for expiring pastes.

For pastes marked “never expire,” a longer but finite cache TTL (say, 24 hours) ensures that the cache is periodically refreshed from the source of truth without growing unbounded. An LRU (Least Recently Used) eviction policy, which removes the least recently accessed item when the cache reaches capacity, handles capacity pressure by evicting cold pastes naturally.
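
The TTL-alignment rule reduces to a small function (illustrative names; the redis-py call mentioned in the docstring is an assumption about how it would be wired up):

```python
import time
from typing import Optional

NEVER_EXPIRE_CACHE_TTL = 24 * 3600   # finite cache TTL even for "never expire" pastes

def cache_ttl_seconds(expires_at: Optional[float], now: Optional[float] = None) -> int:
    """Align the cache entry's TTL with the paste's own expiration (sketch).

    Returns how long the cache entry should live, in seconds; with redis-py
    this could feed e.g. `r.set(key, value, ex=...)`.
    """
    now = time.time() if now is None else now
    if expires_at is None:
        return NEVER_EXPIRE_CACHE_TTL        # periodic refresh, bounded growth
    return max(int(expires_at - now), 0)     # never cache longer than the paste lives
```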

The complication arises when a user manually deletes a paste before its expiration. In that case, the delete operation must synchronously invalidate the entry in the distributed cache and issue a CDN purge request. This is one of the few write-path operations that must touch the cache directly.

Diagram: Multi-layer caching architecture with performance metrics

Handling hot keys and cache stampedes#

A viral paste can turn a single cache key into a hot key: a key that receives a disproportionately high volume of requests, creating a bottleneck on the single node responsible for it. If that key expires from the cache and thousands of concurrent requests simultaneously trigger a cache miss, they all hit the database at once, a phenomenon called a cache stampede.

Mitigation strategies include:

  • Request coalescing: When a cache miss occurs, only one request is allowed through to the database. All other concurrent requests for the same key wait for the first one to populate the cache and then read from it.
  • Hot key replication: Replicate the hottest keys across multiple cache shards so that no single shard bears the full read load.
  • Staggered TTLs: Add a small random jitter to TTL values (e.g., TTL Β± 30 seconds) so that popular keys do not all expire at the same instant.
Pro tip: In an interview, mentioning “cache stampede” and “request coalescing” by name signals production-level experience. These are the kinds of details that differentiate a senior-level answer from a mid-level one.
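
Request coalescing can be sketched with a per-key in-flight table; this is an illustrative single-process version (a distributed deployment would coordinate through a cache-level lock instead):

```python
import threading

class Coalescer:
    """On a cache miss, only one caller (the leader) hits the origin while
    concurrent callers for the same key wait and reuse its result (sketch)."""
    def __init__(self):
        self._lock = threading.Lock()
        self._inflight = {}               # key -> (done_event, result_holder)

    def get(self, key, cache: dict, load_from_origin):
        value = cache.get(key)
        if value is not None:
            return value                  # cache hit: no coordination needed
        with self._lock:
            pending = self._inflight.get(key)
            leader = pending is None
            if leader:
                pending = (threading.Event(), {})
                self._inflight[key] = pending
        event, holder = pending
        if leader:
            try:
                holder["value"] = load_from_origin(key)  # only the leader hits the DB
                cache[key] = holder["value"]
            finally:
                with self._lock:
                    self._inflight.pop(key, None)
                event.set()                              # wake waiting followers
            return holder["value"]
        event.wait()                                     # follower: wait for the leader
        return holder["value"]
```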

With caching absorbing the read load, we need to ensure the write path and the underlying infrastructure can scale horizontally as traffic grows.

Scaling the system#

Scalability for a paste-sharing service is less about handling enormous write throughput (12 QPS average is modest) and more about absorbing unpredictable read spikes and growing storage gracefully over years. The architecture must scale horizontally at every layer without introducing single points of failure.

Database sharding#

As the metadata store grows past what a single node can handle, we shard the data across multiple database nodes. The Paste ID is the natural shard key because every query is a direct lookup by that key. A consistent hashing scheme distributes Paste IDs evenly across shards and minimizes data movement when shards are added or removed.
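
A minimal consistent-hash ring with the Paste ID as shard key (an illustrative sketch; production systems typically rely on the database's native partitioning or a battle-tested library):

```python
import bisect
import hashlib

class ConsistentHashRing:
    """Map Paste IDs to shards via consistent hashing with virtual nodes."""
    def __init__(self, shards, vnodes: int = 100):
        self._ring = []                          # sorted list of (hash, shard)
        for shard in shards:
            for v in range(vnodes):              # virtual nodes smooth the distribution
                self._ring.append((self._hash(f"{shard}#{v}"), shard))
        self._ring.sort()

    @staticmethod
    def _hash(key: str) -> int:
        return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")

    def shard_for(self, paste_id: str) -> str:
        h = self._hash(paste_id)
        idx = bisect.bisect(self._ring, (h, "")) % len(self._ring)  # clockwise successor
        return self._ring[idx][1]
```

Adding a shard moves only the keys between it and its ring neighbors, rather than rehashing everything.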

Historical note: Early versions of Pastebin and similar services ran on single MySQL instances that eventually hit vertical scaling limits. The migration to sharded NoSQL stores (or sharded MySQL with ProxySQL) was a common and painful evolution that modern designs avoid by choosing horizontally scalable databases from the start.

Stateless application servers and load balancing#

Every application server must be stateless, meaning it holds no session data or paste content locally. All state lives in the cache, database, or object storage. This allows the load balancer to route any request to any server, and enables autoscaling groups to add or remove instances based on CPU or request-rate metrics.

Common load balancing strategies:

  • Round Robin: Simple and effective when all servers have identical capacity.
  • Least Connections: Routes to the server with the fewest active connections, better for workloads with variable request duration.
  • Weighted routing: Assigns different weights to servers based on their hardware capabilities or geographic location.

Read/write path separation#

For systems with highly asymmetric read and write loads, separating the read and write paths offers independent scaling. Write requests flow to a primary database node (or shard leader), while read requests are served by multiple read replicas. This introduces replication lag, the delay between a write landing on the primary node and becoming visible on its read replicas, during which readers may see stale data. For a paste-sharing service, a few hundred milliseconds of staleness is acceptable. A user who just created a paste can be routed to the primary for their immediate read-after-write, while all subsequent readers hit replicas.

Diagram: Distributed system architecture with read/write path separation

Scaling the happy path is necessary but insufficient. A robust system must also handle paste expiration, large content, and the cleanup of expired data without impacting the main request path.

Handling paste expiration and large content#

Expiration and content size management are operational concerns that rarely get enough attention in interview answers. Yet they directly affect storage cost, data hygiene, and even legal compliance.

TTL-based deletion#

The primary expiration mechanism leverages built-in TTL features in both the cache and the database:

  • Redis: Natively supports key-level TTL. When a paste is cached, the EXPIREAT command sets the exact Unix timestamp for automatic eviction.
  • DynamoDB: Supports DynamoDB TTL, which automatically deletes items after their expires_at timestamp. Deletions happen in the background and do not consume write capacity.
  • Cassandra: Supports per-column TTL at write time, automatically tombstoning expired data during compaction.

Background cleanup jobs#

Database TTL handles metadata deletion, but it does not clean up the corresponding object storage files. A background worker service runs on a periodic schedule (every few minutes), queries the metadata store for recently expired Paste IDs, and issues delete requests to object storage for the associated content files.

These cleanup jobs must be:

  • Idempotent: Running the same job twice should not cause errors or double-deletions.
  • Low priority: They should run on separate compute resources or during off-peak hours to avoid competing with the main read/write path.
  • Batched: Deleting objects one at a time from S3 is slow. Use the S3 batch delete API to remove up to 1,000 objects per request.
Attention: DynamoDB TTL deletions are not instantaneous. Items may persist for up to 48 hours after their expiration timestamp. Your application must also check the expires_at field at read time and return a 404 for expired pastes, even if the database has not yet physically deleted the record.
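
The batching requirement can be sketched as follows; `purge_expired` and `delete_batch` are hypothetical names, with `delete_batch` standing in for a wrapper around the actual S3 `DeleteObjects` call:

```python
def batches(keys, size=1000):
    """S3 DeleteObjects accepts at most 1,000 keys per request."""
    for i in range(0, len(keys), size):
        yield keys[i:i + size]

def purge_expired(expired_ids, delete_batch) -> int:
    """Idempotent cleanup sketch: deleting an already-deleted object key
    is a no-op, so re-running the job after a crash is safe."""
    deleted = 0
    for chunk in batches(list(expired_ids)):
        delete_batch(chunk)          # e.g. one S3 DeleteObjects call per chunk
        deleted += len(chunk)
    return deleted
```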

Object storage for large pastes#

For pastes exceeding a size threshold (10 KB is a reasonable cutoff), the content is stored in object storage with the Paste ID as the object key. The metadata record stores only the content_url pointer. This keeps the metadata database lean and fast while offloading bulk storage to a system purpose-built for durability and cost efficiency. Object storage like S3 provides 99.999999999% (11 nines) durability, far exceeding what most databases offer natively.

The system must also handle the edge case where object storage is temporarily unavailable. In that scenario, the read path returns an appropriate error (503 Service Unavailable) and the client retries, rather than serving a broken or empty paste.

Expiration and storage are now covered. Next, we address the non-negotiable requirements for a public-facing service: reliability, security, and abuse prevention.

Reliability, security, and abuse prevention#

A public paste-sharing service is a magnet for abuse. Spam, malware distribution, phishing links, and denial-of-service attacks are daily realities. The design must include layered defenses that protect both the infrastructure and the end users.

Rate limiting#

Rate limiting at the API Gateway is the first line of defense. It constrains the number of requests a single IP address or authenticated user can make within a sliding time window.

  • Creation rate limit: 10 to 20 new pastes per minute per IP. This prevents automated spam bots from flooding the system.
  • Read rate limit: Higher threshold (hundreds per minute per IP) because reads are less resource-intensive, but still capped to prevent scraping.
  • Graduated penalties: Repeated violations trigger progressively longer cooldown periods, escalating from temporary throttling to a full IP ban.

A token bucket algorithm is commonly used here: requests draw tokens from a fixed-capacity bucket that refills at a steady rate, which smoothly absorbs short bursts while enforcing a sustained average rate limit.
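
A minimal token bucket sketch with an injectable clock (illustrative; a production limiter would keep these counters in Redis so all gateway nodes share state):

```python
import time

class TokenBucket:
    """Allows bursts up to `capacity` while enforcing a sustained
    `refill_rate` in tokens per second. `clock` is injectable for testing."""
    def __init__(self, capacity: float, refill_rate: float, clock=time.monotonic):
        self.capacity = capacity
        self.refill_rate = refill_rate
        self.tokens = capacity            # start full: allows an initial burst
        self._clock = clock
        self._last = clock()

    def allow(self) -> bool:
        now = self._clock()
        elapsed = now - self._last
        self._last = now
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False                      # out of tokens: reject or throttle
```

For the creation limit above, a per-IP bucket with `capacity=20` and `refill_rate=20/60` enforces roughly 20 pastes per minute while tolerating small bursts.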

Access control for private pastes#

For private pastes, the application server validates the user’s authentication token (typically a JWT) and compares the user_id in the token against the user_id stored in the paste metadata. If they do not match, the server returns a 403 Forbidden. This check must happen before fetching the content from the cache or storage to avoid leaking data.
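
The check reduces to a few lines (a sketch; `authorize_read` and its status-code return convention are illustrative):

```python
from typing import Optional

def authorize_read(paste_meta: dict, token_user_id: Optional[str]) -> int:
    """Visibility check run before any content fetch. `token_user_id` is the
    user_id extracted from an already-validated JWT (None if unauthenticated).
    Returns an HTTP-style status code."""
    if paste_meta["visibility"] == "public":
        return 200                       # anyone with the link may read
    if token_user_id is not None and token_user_id == paste_meta["user_id"]:
        return 200                       # private paste, creator match
    return 403                           # private paste, wrong or missing identity
```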

Content moderation and abuse detection#

While full-scale content moderation is a massive engineering problem on its own, basic safeguards are essential:

  • URL scanning: Check paste content against known malware and phishing URL databases (such as Google Safe Browsing) during the creation flow.
  • Blocklist matching: Reject pastes that match known spam patterns or contain content on a blocklist.
  • Heuristic flagging: Flag pastes that are created in rapid succession from the same IP, contain repetitive content, or exhibit other bot-like patterns. These flagged pastes can be queued for manual review or automated quarantine.
Real-world context: Pastebin.com was historically used to leak stolen credentials and share malicious scripts. In response, they implemented automated content scanning and a “Scraping API” to allow security researchers to monitor public pastes for threats, a model that balances openness with safety.

Data retention and compliance#

Define a clear retention policy:

  • Pastes with an expiration date are permanently deleted after that date (plus the TTL lag buffer).
  • Pastes marked “never expire” are stored indefinitely but may be subject to a maximum retention period (e.g., 10 years) for cost control.
  • Deleted paste content must be purged from all layers: cache, database, object storage, and CDN.

With the system hardened against abuse, we can now layer on the operational visibility needed to run it in production.

Observability and monitoring#

A system you cannot observe is a system you cannot operate. Observability is not a “nice to have” section to mention at the end of an interview. It is infrastructure that directly supports your availability and latency SLOs.

Key metrics to track#

A well-instrumented paste-sharing service monitors the following:

  • Request rate: Create and retrieve QPS, broken down by endpoint and status code.
  • Latency percentiles: p50, p95, and p99 for both read and write operations. An increase in p99 often signals a degrading cache hit ratio or a slow database shard.
  • Cache hit ratio: The percentage of read requests served from cache. A healthy system should see 80% or higher. A sudden drop indicates a potential cache node failure or a change in traffic pattern.
  • Error rate: 4xx and 5xx responses per second. A spike in 5xx errors triggers immediate alerting.
  • Storage utilization: Total objects in S3, metadata database size, cache memory usage. Tracked for capacity planning.
  • Paste size distribution: A histogram of paste sizes helps identify whether the inline-vs-object-storage threshold is well-tuned.
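
The hit-ratio and percentile metrics above can be sketched with a small in-process recorder. This is for illustration only; a production service would use a metrics library such as prometheus_client, and the nearest-rank percentile here is a simplification of real histogram-based estimation.

```python
import bisect

# Minimal in-process metrics sketch. Attribute names are illustrative.
class Metrics:
    def __init__(self):
        self.cache_hits = 0
        self.cache_misses = 0
        self.read_latencies_ms = []  # kept sorted for percentile lookups

    def record_read(self, latency_ms, cache_hit):
        if cache_hit:
            self.cache_hits += 1
        else:
            self.cache_misses += 1
        bisect.insort(self.read_latencies_ms, latency_ms)

    def cache_hit_ratio(self):
        total = self.cache_hits + self.cache_misses
        return self.cache_hits / total if total else 0.0

    def percentile(self, p):
        """Nearest-rank percentile, e.g. percentile(95) for p95."""
        xs = self.read_latencies_ms
        if not xs:
            return 0.0
        idx = min(len(xs) - 1, int(len(xs) * p / 100))
        return xs[idx]
```

Feeding every read through such a recorder is what makes the "cache hit ratio below 80%" and "p95 above SLO" checks possible at all.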

Logging and alerting#

Structured logs (JSON format) from every application server and background job should be aggregated into a centralized logging system (such as the ELK stack or Datadog). Alerts should be configured for:

  • Cache hit ratio dropping below 75%.
  • p95 latency exceeding the 150ms SLO for more than 5 minutes.
  • Error rate exceeding 1% of total requests.
  • Background cleanup jobs failing or falling behind schedule.

Pro tip: In an interview, mentioning specific metrics like “cache hit ratio” and “p99 latency” and explaining what each one tells you about system health demonstrates operational maturity. It shows you have thought beyond the whiteboard and into production operations.
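
The alert list above can be sketched as a set of threshold rules evaluated against a metrics snapshot. Note this evaluates a single point in time; real alerting also requires a duration condition (e.g., "for more than 5 minutes"). The snapshot keys and the 15-minute cleanup-lag threshold are illustrative assumptions.

```python
# A sketch of evaluating the alert rules as simple threshold checks.
# Thresholds mirror the list above; dict keys are illustrative.
ALERT_RULES = [
    ("CacheHitRatioLow", lambda m: m["cache_hit_ratio"] < 0.75),
    ("P95LatencyHigh",   lambda m: m["p95_latency_ms"] > 150),
    ("ErrorRateHigh",    lambda m: m["error_rate"] > 0.01),
    ("CleanupJobBehind", lambda m: m["cleanup_lag_minutes"] > 15),
]

def evaluate_alerts(snapshot):
    """Return the names of all rules that fire for this metrics snapshot."""
    return [name for name, check in ALERT_RULES if check(snapshot)]
```

In practice these rules would live in the alerting system's own configuration (e.g., Prometheus alerting rules or Datadog monitors) rather than application code.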

Observability dashboard for paste-sharing service

Monitoring tells you when something is wrong. The next section examines the architectural trade-offs that determine what can go wrong and how you have chosen to manage that risk.

Bottlenecks and trade-offs#

Every design decision is a trade-off. Acknowledging these trade-offs explicitly in an interview, rather than presenting your design as flawless, is one of the strongest signals of engineering maturity.

Hot keys and viral pastes#

A single paste going viral can generate thousands of reads per second to a single cache shard. Mitigation strategies (request coalescing, hot key replication, staggered TTLs) were discussed in the caching section, but the fundamental trade-off is complexity vs. resilience. Adding hot key detection and automatic replication adds operational complexity, but without it, a single viral paste can degrade the experience for all users on that shard.
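
Request coalescing, one of the mitigations named above, can be sketched as a "single-flight" wrapper: concurrent reads for the same hot key share one backend fetch instead of stampeding the cache or database. This assumes a threaded server; the class and method names are illustrative.

```python
import threading

# A minimal single-flight (request coalescing) sketch for a threaded server.
class SingleFlight:
    def __init__(self):
        self._lock = threading.Lock()
        self._inflight = {}  # key -> (done_event, result_holder)

    def do(self, key, fetch):
        """Run fetch() once per key at a time; concurrent callers share the result."""
        with self._lock:
            entry = self._inflight.get(key)
            if entry is None:
                entry = (threading.Event(), {})
                self._inflight[key] = entry
                is_leader = True
            else:
                is_leader = False
        done, holder = entry
        if is_leader:
            holder["value"] = fetch()  # only the leader hits the backend
            with self._lock:
                del self._inflight[key]
            done.set()
        else:
            done.wait()                # followers reuse the leader's result
        return holder["value"]
```

Combined with hot-key replication and staggered TTLs, this caps the backend load from a viral paste at one fetch per cache miss rather than one per request.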

Consistency vs. availability#

By choosing a NoSQL store with eventual consistency and relying on background TTL deletion, we accept that:

  • An expired paste may remain accessible for a short window (seconds to minutes) after its expires_at timestamp.
  • A paste created on the primary may not be immediately visible on read replicas.

For a paste-sharing service, these consistency relaxations are acceptable. Immediate deletion is not mission-critical, and a brief period of staleness does not violate any user trust or legal requirement. This is a deliberate lean toward availability in the CAP theorem spectrum.
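
One common way to shrink the "accessible after expiry" window without giving up background deletion is lazy expiration on the read path: the read handler checks expires_at itself and treats an overdue paste as gone, even if the cleanup job has not deleted it yet. A minimal sketch, assuming a dict-like metadata store and illustrative field names:

```python
import time

# Lazy-expiration sketch: an expired paste is never served, regardless of
# whether the background TTL deletion job has caught up with it.
def get_paste(store, paste_id, now=None):
    now = time.time() if now is None else now
    meta = store.get(paste_id)
    if meta is None:
        return None
    expires_at = meta.get("expires_at")
    if expires_at is not None and now >= expires_at:
        return None  # expired: hide it now; the cleanup job deletes it later
    return meta
```

With this check in place, the eventual-consistency window applies only to physical storage reclamation, not to user-visible reads.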

Storage cost vs. read latency#

Storing all paste content in the high-speed metadata database would minimize read latency by eliminating the object storage network hop. But it would dramatically increase database costs, especially for large pastes that are rarely accessed. The tiered approach (inline for small pastes, S3 for large pastes) trades a modest latency increase on large-paste reads for significant cost savings at scale.
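
The tiered write path comes down to one threshold decision at create time. A minimal sketch, using plain dicts as stand-ins for the metadata database and S3, with an illustrative 256 KB cutoff:

```python
# Tiered storage sketch: small bodies are stored inline in the metadata row,
# large bodies go to object storage with only a key kept inline.
# The 256 KB threshold and field names are illustrative.
INLINE_LIMIT = 256 * 1024

def store_paste(paste_id, body, db, storage):
    """Store a paste; returns which tier received the body."""
    data = body.encode("utf-8")
    if len(data) <= INLINE_LIMIT:
        db[paste_id] = {"inline_body": body, "s3_key": None}
        return "inline"
    key = f"pastes/{paste_id}"
    storage[key] = data  # stand-in for an S3 put_object call
    db[paste_id] = {"inline_body": None, "s3_key": key}
    return "object_storage"
```

The paste-size histogram from the observability section is what tells you whether this threshold is tuned correctly for real traffic.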

CDN cache staleness vs. freshness#

Aggressively caching at the CDN edge reduces latency globally but introduces the risk of serving stale content. If a user deletes a paste, the CDN may still serve it from its edge cache until the TTL expires or a purge request propagates. The trade-off is latency vs. freshness, and for a paste-sharing service, slightly stale reads on deletion are far more acceptable than the latency penalty of not using a CDN at all.

Key Trade-offs in Pastebin System Design

| Trade-off | Option A | Option B | Recommended Choice |
| --- | --- | --- | --- |
| Consistency vs. Availability | Consistency – ensures data integrity but risks reduced availability during partitions | Availability – stays operational during issues but may serve stale data | Availability – users prefer uninterrupted access, aligning with the PACELC theorem |
| Cost vs. Latency | Lower cost – reduced infrastructure expenses but slower response times | Lower latency – faster performance but higher operational costs | Balanced – use distributed caching (e.g., Redis) to improve read speed without major cost increases |
| Complexity vs. Resilience | Simplicity – easier to build and maintain but lacks redundancy | Resilience – fault-tolerant via redundancy but increases architectural complexity | Resilience – microservices architecture isolates failures, preventing full system outages |
| Freshness vs. Edge Performance | Data freshness – most up-to-date content but higher latency for distant users | Edge performance – faster load times via CDN but may serve slightly outdated content | Edge performance – CDN caching is acceptable since immediate freshness is less critical than speed |

Understanding these trade-offs is exactly what interviewers are testing. Let us wrap up with how this problem is evaluated and what separates a passing answer from an exceptional one.

How this problem is evaluated in interviews#

The paste-sharing problem is a favorite among interviewers not because it is complex, but because it is deep. The simple surface allows them to probe your thinking at every layer. Here is what they are assessing:

  • Structured approach: Did you clarify requirements and define SLOs before drawing a single box? Candidates who jump straight to architecture often miss critical constraints.
  • Scalability mindset: Did you identify the read-heavy pattern early and build your caching, CDN, and replication strategy around it?
  • Component justification: Did you explain why you chose NoSQL over SQL, object storage over database blobs, or an offline KGS over online hashing? Decisions without rationale are red flags.
  • Trade-off articulation: Did you explicitly name the trade-offs (consistency vs. availability, cost vs. latency) and defend your chosen position?
  • Operational awareness: Did you mention monitoring, alerting, and failure scenarios? This separates candidates who have operated real systems from those who have only read about them.

Attention: A common mistake is spending ten minutes on the key generation algorithm and then rushing through caching, scaling, and observability. Key generation is important, but it is one component. The interviewer wants to see that you can design the whole system and articulate how each piece supports the SLOs you defined at the start.

A strong answer progresses from requirements to capacity estimation to architecture to deep dives on specific components (caching, key generation, storage) to trade-offs and observability. It flows like a conversation, not a memorized script.

Conclusion#

Designing a paste-sharing service is an exercise in disciplined simplicity. The core feature set is small, which means every design decision is exposed and must be defended. The most critical takeaway is that a read-heavy system with simple access patterns should be built around aggressive, multi-layer caching (application, distributed cache, CDN) backed by tiered storage that separates small, hot metadata from large, cold content. The second key insight is that concrete numbers matter: defining SLOs, estimating QPS, and sizing storage before choosing technologies ensures that your architecture is grounded in reality rather than intuition. Third, operational concerns like observability, TTL-based cleanup, and abuse prevention are not afterthoughts but are core components that determine whether the system survives its first week in production.

Looking ahead, paste-sharing systems are evolving toward richer collaboration features (real-time editing, version history, inline comments) that push them closer to lightweight document editors. The storage and caching patterns remain foundational, but the consistency requirements tighten as multi-user collaboration demands stronger ordering guarantees and conflict resolution, likely driving adoption of CRDTs and operational transformation techniques.

If you can design a system this “simple” with this level of depth, you can design far more complex systems using the same principles. That is the real point of the exercise.


Written By:
Zarish Khalid