Deliveroo System Design Explained
Explore how Deliveroo matches orders, riders, and restaurants in real time. This deep dive breaks down dispatch logic, state management, scaling, and failure handling behind one of the toughest System Design problems.
Deliveroo System Design is the architectural challenge of coordinating three independent actors (customers, restaurants, and riders) through a real-time logistics platform that combines geospatial matching, event-driven state management, and fault-tolerant delivery orchestration under strict time pressure. It stands as one of the most comprehensive system design problems in consumer technology because it demands low-latency dispatch decisions, financial consistency, and graceful degradation across every layer of the stack.
Key takeaways
- Real-time state management is the backbone: A central order state machine must propagate transitions to all actors reliably, using event streaming with idempotent consumers to handle mobile network failures.
- Geospatial indexing drives discovery and dispatch: Structures like geohashes and QuadTrees enable sub-second restaurant lookups and rider matching across millions of concurrent location updates.
- Dispatch is heuristic, not optimal: Rider assignment relies on scoring functions that weigh proximity, workload, and prep time because globally optimal matching is too slow for real-time food delivery.
- Consistency varies by subsystem: Payment processing requires strong consistency while location tracking and notifications favor eventual consistency with clear user communication.
- Failure is a primary design concern: Rider cancellations, GPS blackouts, and restaurant delays are expected events with dedicated recovery flows, not edge cases bolted on after the fact.
Every day, millions of people tap a button and expect a hot meal to arrive at their door in under 30 minutes. What they never see is the distributed system behind that button making hundreds of decisions per second: which restaurant is open, which rider is closest, whether the payment cleared, and how to reroute when a rider’s phone loses signal in a tunnel. Designing a system like Deliveroo is not about building a food ordering app. It is about orchestrating a real-time, city-scale logistics network where every minute of delay degrades the product.
This is precisely why Deliveroo appears so frequently in system design interviews. It forces you to reason about moving actors, time-sensitive workflows, partial failures, and human unpredictability, all at once. In this guide, we will walk through the full architecture of a Deliveroo-like system, layer by layer, with the technical depth that both interviews and production systems demand.
Understanding the core problem#
At its foundation, Deliveroo is a real-time three-sided marketplace. It connects customers who want food, restaurants that prepare it, and riders who deliver it. The platform’s job is to continuously answer a set of overlapping questions:
- Which restaurants can fulfill this order right now, given location, capacity, and menu availability?
- Which rider should deliver it, given proximity, current workload, and predicted prep time?
- What is the fastest reliable path from kitchen to customer, accounting for traffic and GPS noise?
Unlike systems where requests can be queued or batched, Deliveroo operates under hard time constraints. An order that sits unassigned for five extra minutes does not just delay a response. It delivers cold food, erodes trust, and triggers a cascade of reassignments.
Real-world context: Deliveroo’s engineering blog notes that their architecture interview specifically tests whether candidates understand the difference between “theoretically correct” and “operationally viable” designs, reflecting the real tension between optimization and latency in production logistics systems.
This time-critical nature shapes every architectural decision we will explore. Before diving into subsystems, we need to define exactly what the system must do and, more importantly, what non-functional constraints make it hard.
Core functional requirements#
A clear requirements definition prevents scope creep during both interviews and real design sessions. The functional surface of a Deliveroo-like system spans three actor types, each with distinct workflows.
Customers must be able to browse nearby restaurants filtered by cuisine, rating, and delivery time. They place orders, pay securely, and track delivery progress in real time. Restaurants receive incoming orders, update preparation status through discrete stages, and signal when food is ready for pickup. Riders receive delivery task assignments, navigate to restaurants and then to customers, and confirm delivery completion.
More concretely, the system must support:
- Restaurant discovery based on user location, time of day, and restaurant capacity
- Order placement with atomic validation and payment processing
- Real-time order state transitions visible to all three actor types
- Rider assignment using geospatial and workload-aware matching
- Push notifications and live location tracking throughout the delivery life cycle
Attention: These workflows are tightly coupled in time. A 30-second delay in rider assignment compounds into minutes of delivery delay because the restaurant continues preparing food that nobody is coming to pick up.
What makes Deliveroo architecturally interesting is not the feature list itself but the non-functional constraints that govern how these features must behave under pressure.
Non-functional requirements that drive complexity#
The real engineering challenge in a Deliveroo-like system lives in the non-functional requirements. These constraints determine the architecture far more than the feature set does.
The following table summarizes the key non-functional requirements and their architectural impact:
Non-Functional Requirements and Their Architectural Implications
| Non-Functional Requirement | Definition | Architectural Strategies |
| --- | --- | --- |
| High Concurrency | Ability to handle large numbers of simultaneous operations | Horizontal Scaling, Event-Driven Pipelines, Partitioning (Sharding) |
| Low Latency | Capability to respond to requests in minimal time | Caching Mechanisms, Data Compression, Connection Pooling |
| Fault Tolerance | Continued operation despite component failures | Replication & Redundancy, Retry/Reassignment Logic, Circuit Breaker Pattern |
| Regional Scalability | Ability to scale across geographical regions | City-Level Partitioning, Content Delivery Networks (CDNs), Multi-Region Deployments |
| Data Consistency | Ensuring all users see uniform, rule-compliant data | Strong/Eventual/Causal Consistency Models, Mixed Consistency Models, Distributed Caching |
Concurrency spikes dramatically during peak meal times. A system serving a major city might handle 10,000+ orders per hour during a Friday dinner rush, with each order generating dozens of downstream events. The architecture must horizontally scale stateless services while carefully managing stateful components like the order state machine.
Latency in food delivery is measured in minutes, not milliseconds, but the compounding effect is severe. A dispatch decision that takes 5 seconds instead of 200 milliseconds means thousands of slightly delayed orders per hour. Over a dinner rush, this translates to measurably lower customer satisfaction.
Fault tolerance is not optional. Riders go offline mid-delivery. Restaurants reject orders after accepting them. Payment gateways time out. The system must treat these as expected events with well-defined recovery paths, not exceptional conditions that trigger manual intervention.
Pro tip: In interviews, explicitly stating that you treat failures as primary design concerns (not afterthoughts) signals production-level thinking. Describe the failure mode first, then the recovery mechanism.
Regional scalability adds another dimension. Each city has different peak hours, rider density, traffic patterns, and restaurant behavior. The architecture must isolate cities operationally through city-level partitioning of dispatch, state, and tracking data, so that a surge or outage in one market never degrades another.
With these constraints established, let us look at the high-level architecture that addresses them.
High-level architecture overview#
A Deliveroo-like system decomposes naturally into six major subsystems, each with distinct scaling, consistency, and latency profiles.
The following diagram captures how these subsystems interact:
The six subsystems are:
- Restaurant Discovery Service for browsing, search, and availability checks
- Order Service for placement, validation, and life cycle management
- Rider Dispatch Engine for real-time matching and assignment
- Payment Service for transaction processing and settlement
- Order State Service as the central event-sourced source of truth
- Notification Service for async communication to all actor types
Each subsystem communicates primarily through asynchronous events published to a distributed log. This decoupling lets each subsystem scale, deploy, and fail independently while still observing a consistent stream of order activity.
Historical note: Deliveroo’s early architecture was more monolithic. Their migration to event-driven microservices, documented on their engineering blog, reflects a common evolutionary pattern where monoliths are decomposed once operational pain exceeds organizational coordination costs.
Let us begin tracing the user journey through these subsystems, starting with how customers discover restaurants.
Restaurant discovery and availability#
The user journey begins with a deceptively simple screen: a list of nearby restaurants. Behind it is a read-heavy pipeline that must combine static catalog data with volatile real-time signals.
What feeds into restaurant availability#
To decide which restaurants to show a user, the system must evaluate multiple inputs simultaneously:
- Static data: Restaurant location (latitude/longitude), menu items, cuisine tags, operating hours
- Dynamic signals: Current order queue depth, average prep time over the last 30 minutes, whether the restaurant has paused accepting orders
- Geospatial constraints: Whether the restaurant falls within a deliverable radius of the user, accounting for real travel time rather than straight-line distance
The geospatial lookup is the most performance-critical piece. The system must find all restaurants within a delivery radius (typically 3 to 5 km of real travel distance) for potentially millions of concurrent users. This requires a spatial index.
The two dominant approaches are geohashes and QuadTrees. Geohashes encode latitude/longitude into strings where nearby points share a common prefix, turning range queries into cheap prefix scans over a key-value store. QuadTrees recursively subdivide space and adapt better to uneven restaurant density. Either structure reduces "all restaurants within 5 km" to a handful of index lookups.
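To make the prefix-scan idea concrete, here is a minimal geohash encoder (a sketch of the standard base32 bit-interleaving scheme, not any particular production implementation). The key property is that nearby coordinates produce strings with a shared prefix:

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"

def geohash_encode(lat: float, lon: float, precision: int = 6) -> str:
    """Encode a lat/lon pair as a geohash string.

    Nearby points share a common prefix, which is why range queries
    become cheap prefix scans over a key-value store.
    """
    lat_range = [-90.0, 90.0]
    lon_range = [-180.0, 180.0]
    result = []
    even = True          # even-indexed bits refine longitude
    ch, bit_count = 0, 0
    while len(result) < precision:
        rng, val = (lon_range, lon) if even else (lat_range, lat)
        mid = (rng[0] + rng[1]) / 2
        if val >= mid:
            ch = (ch << 1) | 1
            rng[0] = mid
        else:
            ch = ch << 1
            rng[1] = mid
        even = not even
        bit_count += 1
        if bit_count == 5:   # 5 bits per base32 character
            result.append(BASE32[ch])
            ch, bit_count = 0, 0
    return "".join(result)
```

Two central-London coordinates share a long prefix, while a Paris coordinate diverges in the very first character, which is exactly the behavior a prefix scan exploits.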
Pro tip: In an interview, mention that straight-line (Haversine) distance is a useful first filter but insufficient for dispatch. A restaurant 2 km away by straight line might be 6 km by road. Production systems use a routing engine like OSRM or GraphHopper for real travel-time estimates, with cached results for common origin-destination pairs.
Caching strategy for discovery#
Restaurant availability changes on the order of minutes, not seconds. But discovery requests arrive thousands of times per second. This mismatch makes aggressive caching effective.
A practical approach is to precompute availability snapshots per geohash cell every 30 to 60 seconds. When a user opens the app, the system resolves their location to a geohash prefix, fetches the precomputed result set, and applies lightweight client-side filters (cuisine, rating, dietary preferences). Cache invalidation triggers when a restaurant pauses orders or changes its prep time beyond a threshold.
The key design constraint is ensuring users never place orders that cannot be fulfilled. A slightly stale cache that shows a restaurant as available when it has just closed is acceptable if the order placement step performs a real-time revalidation. This is a classic eventual consistency trade-off: serve fast reads, enforce correctness at write time.
Once a user selects a restaurant and builds their cart, the system enters its most transactionally sensitive phase.
Order placement and validation#
Order placement is the moment where browsing becomes a financial commitment. This is one of the few subsystems that demands strong consistency because incorrect charges, duplicate orders, or phantom reservations directly erode trust.
The transactional pipeline#
When a user submits an order, the system must execute a tightly coordinated sequence:
- Re-validate availability: Confirm the restaurant is still accepting orders and all selected menu items are in stock. The cached data from discovery may be seconds old.
- Calculate pricing: Apply item prices, delivery fees, promotions, and taxes. This must be deterministic and auditable.
- Authorize payment: Place a hold on the customer’s payment method via an external gateway. This is an external call with unpredictable latency.
- Create the order record: Persist the order with an initial state of PLACED in the Order State Service.
- Notify the restaurant: Publish an event that triggers the restaurant’s order management interface.
Steps 1 through 4 must behave as a logical transaction. If payment authorization fails, no order record should exist. If the restaurant rejects mid-sequence, the payment hold must be released.
Attention: Distributed transactions across microservices are notoriously difficult. A common pattern here is the Saga pattern, where each step has a compensating action (e.g., release payment hold if restaurant rejects). This avoids two-phase commits while maintaining eventual correctness.
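A minimal sketch of the Saga pattern described above: each step pairs an action with a compensating action, and a failure partway through unwinds only the steps that completed (the specific step and function names here are illustrative, not Deliveroo's actual services):

```python
def run_saga(steps):
    """Execute saga steps in order; on any failure, run the
    compensations of all completed steps in reverse and report failure.

    Each step is an (action, compensation) pair, e.g. the compensation
    for "authorize payment" is "release the payment hold".
    """
    completed = []
    try:
        for action, compensation in steps:
            action()
            completed.append(compensation)
    except Exception:
        for compensation in reversed(completed):
            compensation()
        return False
    return True


# Illustrative order-placement saga where the payment gateway times out.
log = []

def validate():       log.append("validated")
def unvalidate():     log.append("validation released")
def authorize():      raise TimeoutError("payment gateway timeout")
def release_hold():   log.append("hold released")

ok = run_saga([(validate, unvalidate), (authorize, release_hold)])
```

Because the failing step's own compensation never runs (its action never completed), only the earlier steps are unwound, which is what keeps the saga's end state consistent.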
Idempotency in order creation#
Mobile networks are unreliable. Users tap “Place Order” and see a spinner. They tap again. The system must ensure that duplicate submissions do not create duplicate orders or double-charge the customer.
This requires idempotency keys. The client generates a unique key for each order attempt; the server records the key alongside the resulting order and, for any retry carrying the same key, returns the original order instead of executing the placement pipeline again.
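A sketch of idempotent order creation under that scheme. The class and store here are hypothetical; a production system would keep the key in a shared store such as Redis with a TTL rather than a process-local dict:

```python
class OrderIntake:
    """Deduplicates submissions with a client-supplied idempotency key.

    Illustrative sketch: production systems keep the key in a shared
    store (e.g. Redis with a TTL), not a process-local dict.
    """
    def __init__(self):
        self._by_key = {}     # idempotency key -> order id
        self._next_id = 1000

    def place_order(self, idempotency_key: str, cart: dict) -> int:
        # A retry carrying the same key returns the original order
        # instead of creating (and charging) a second one.
        if idempotency_key in self._by_key:
            return self._by_key[idempotency_key]
        order_id = self._next_id
        self._next_id += 1
        # ... revalidate availability, authorize payment, persist ...
        self._by_key[idempotency_key] = order_id
        return order_id
```

The double-tap scenario then resolves cleanly: the second submission with the same key is a no-op that returns the first order's ID.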
With the order created and payment authorized, the system must now manage its progression through a series of well-defined states.
Real-time order state management#
After placement, an order moves through a life cycle: PLACED → ACCEPTED → PREPARING → READY_FOR_PICKUP → PICKED_UP → DELIVERED. Each transition is triggered by a different actor, and all three actors must observe a consistent view of the current state.
The order state machine#
The Order State Service acts as the single source of truth. It enforces valid transitions (you cannot move from PLACED to PICKED_UP without passing through PREPARING), timestamps every change, and publishes events for each transition.
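The transition-enforcement logic can be sketched as a table of allowed moves, using the states from the life cycle above (the REJECTED branch matches the restaurant-rejection flow discussed later; the function shape is illustrative):

```python
VALID_TRANSITIONS = {
    "PLACED": {"ACCEPTED", "REJECTED"},
    "ACCEPTED": {"PREPARING"},
    "PREPARING": {"READY_FOR_PICKUP"},
    "READY_FOR_PICKUP": {"PICKED_UP"},
    "PICKED_UP": {"DELIVERED"},
}

def apply_transition(order: dict, new_state: str) -> dict:
    """Enforce the order life cycle; illegal jumps are rejected."""
    allowed = VALID_TRANSITIONS.get(order["state"], set())
    if new_state not in allowed:
        raise ValueError(f"illegal transition {order['state']} -> {new_state}")
    order["state"] = new_state
    order["history"].append(new_state)   # timestamped in a real system
    # ... publish the transition event to the bus here ...
    return order
```

Centralizing the table means every actor (customer app, restaurant tablet, rider app) goes through the same validation, so a buggy or delayed client cannot corrupt the order's history.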
This service must handle high write throughput during peak hours. Each active order generates multiple state transitions within a 20-to-40-minute window. With tens of thousands of concurrent orders per city, the state service may process hundreds of thousands of writes per hour per region.
Event fan-out and consistency#
Each state transition triggers downstream effects. The customer’s app updates. The rider receives new instructions. Analytics pipelines record timing data. The notification service sends push alerts.
This fan-out is handled through an event bus, typically Apache Kafka or a similar distributed log. Each state change is published as an immutable event. Downstream consumers (notification service, tracking service, analytics) subscribe to the relevant topics and process events independently.
Real-world context: Kafka’s partitioning model maps naturally to order state management. Partitioning by order ID ensures that all events for a single order are processed in sequence by a single consumer, preventing out-of-order state transitions while allowing parallel processing across orders.
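The partition-by-order-ID idea reduces to a stable hash. Kafka's default partitioner uses murmur2; the sketch below uses MD5 purely to illustrate the invariant that matters, namely that the same key always lands on the same partition:

```python
import hashlib

def partition_for(order_id: str, num_partitions: int) -> int:
    """Map an order ID to a partition deterministically.

    Every event for one order lands on the same partition, so a single
    consumer processes them in publish order, while different orders
    spread across partitions for parallelism.
    """
    digest = hashlib.md5(order_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions
```

One caveat worth raising in an interview: changing `num_partitions` remaps keys, so partition counts for ordered topics are usually chosen generously up front.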
The critical design choice here is that the state service itself is strongly consistent (reads after writes see the latest state), but downstream consumers are eventually consistent. A customer might see the “Preparing” notification a second or two after the restaurant actually tapped the button. This is acceptable because the alternative, synchronous fan-out, would make the state service a bottleneck.
The most architecturally complex consumer of these state events is the dispatch engine, which must assign a rider at precisely the right moment.
Rider matching and dispatch#
Rider dispatch is where Deliveroo’s system design becomes genuinely hard. The system must select the best available rider for an order that is approaching readiness, using noisy, incomplete, real-time data, and it must do so in milliseconds.
GPS ingestion pipeline#
The dispatch engine depends on a continuous stream of rider location updates. Every active rider’s phone sends GPS coordinates every 3 to 5 seconds. For a city with 5,000 active riders, that is roughly 1,000 to 1,700 location updates per second, per city.
These updates flow through a dedicated ingestion pipeline into a geospatial index. The index must support two operations efficiently:
- Range query: Find all available riders within X km of a given restaurant
- Update: Move a rider’s position in the index as new GPS data arrives
A geohash-based index stored in a fast key-value store (like Redis) is a common choice. Each rider’s current geohash cell is updated on every location ping. Range queries become prefix scans over adjacent geohash cells.
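An in-memory sketch of that index (standing in for the Redis-backed version; neighbor-cell computation is omitted and cell strings are passed in directly):

```python
from collections import defaultdict

class RiderIndex:
    """In-memory stand-in for a Redis-backed geohash rider index.

    Riders are bucketed by geohash cell; a range query scans the
    restaurant's cell plus its neighbours (neighbour derivation
    omitted here for brevity).
    """
    def __init__(self):
        self._cells = defaultdict(set)   # geohash cell -> rider ids
        self._position = {}              # rider id -> current cell

    def update(self, rider_id: str, cell: str) -> None:
        """Move a rider to a new cell as GPS pings arrive."""
        old = self._position.get(rider_id)
        if old == cell:
            return                       # same cell: nothing to move
        if old is not None:
            self._cells[old].discard(rider_id)
        self._cells[cell].add(rider_id)
        self._position[rider_id] = cell

    def riders_near(self, cells) -> set:
        """Union of riders across the target cell and its neighbours."""
        found = set()
        for cell in cells:
            found |= self._cells[cell]
        return found
```

The update path is deliberately cheap (two set operations) because it runs on every GPS ping, while the query path only runs once per dispatch decision.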
Attention: GPS data from mobile phones is inherently noisy. Urban canyons, tunnels, and poor satellite visibility can produce location errors of 50 to 200 meters. The ingestion pipeline should apply a Kalman filter or simple moving average to smooth coordinates before indexing. Dispatching a rider based on a single noisy GPS fix leads to poor assignment decisions.
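As a stand-in for the Kalman filter mentioned above, an exponentially weighted moving average is the simplest smoothing that damps single-fix jitter before indexing (the `alpha` value here is illustrative, not a tuned production constant):

```python
def smooth(points, alpha: float = 0.4):
    """Exponentially weighted moving average over raw GPS fixes.

    Cheap, constant-memory per rider, and enough to damp the
    50-200 m jitter described above before the fix reaches the
    geospatial index. Lower alpha = heavier smoothing.
    """
    if not points:
        return []
    lat, lon = points[0]
    out = [(lat, lon)]
    for raw_lat, raw_lon in points[1:]:
        lat = alpha * raw_lat + (1 - alpha) * lat
        lon = alpha * raw_lon + (1 - alpha) * lon
        out.append((lat, lon))
    return out
```

A single wild fix is pulled most of the way back toward the recent trajectory, which is usually the right bias for dispatch: a rider rarely teleports, but a GPS fix frequently does.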
The scoring function#
When an order approaches readiness (typically triggered by the PREPARING state and an estimated prep time), the dispatch engine queries for nearby available riders and scores each candidate. The scoring function balances multiple factors:
$$S_{rider} = w_1 \cdot \frac{1}{d_{travel}} + w_2 \cdot \frac{1}{load + 1} + w_3 \cdot reliability + w_4 \cdot \frac{1}{|ETA_{rider} - ETA_{food}|}$$
Where $d_{travel}$ is estimated travel time to the restaurant (not straight-line distance), $load$ is the rider’s current number of active deliveries, $reliability$ is a historical completion rate, and the final term penalizes mismatches between rider arrival time and food readiness time. The weights $w_1$ through $w_4$ are tuned per city based on historical data.
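A direct translation of that scoring function into code. The weights are illustrative placeholders (production values are tuned per city), and a small epsilon is added to the timing term to guard against division by zero when rider and food ETAs coincide exactly:

```python
def score_rider(travel_time_min: float, active_deliveries: int,
                reliability: float, rider_eta_min: float,
                food_ready_eta_min: float,
                w=(1.0, 0.5, 0.3, 0.8)) -> float:
    """Heuristic dispatch score mirroring the formula above.

    Higher is better: close riders, light workloads, reliable history,
    and arrival timed to food readiness all raise the score.
    """
    eps = 1e-6   # avoids division by zero on a perfect ETA match
    w1, w2, w3, w4 = w
    return (w1 / max(travel_time_min, eps)
            + w2 / (active_deliveries + 1)
            + w3 * reliability
            + w4 / (abs(rider_eta_min - food_ready_eta_min) + eps))

def best_rider(candidates):
    """Pick the highest-scoring candidate from a nearby-rider query."""
    return max(candidates, key=lambda c: score_rider(**c["features"]))
```

Scoring each candidate independently and taking the max is what keeps the decision in the millisecond range: it is linear in the number of nearby riders rather than combinatorial across all pending orders.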
This is a heuristic scoring approach, not a globally optimal assignment. Computing the optimal matching across all pending orders and all available riders would yield marginally better pairings on paper, but the computation is too slow for real-time dispatch, and conditions change faster than the optimizer could rerun.
Multi-order batching#
In dense urban areas during peak hours, a rider can often pick up two orders from the same restaurant or from restaurants within walking distance. This is called order stacking or multi-order batching.
The dispatch engine must evaluate whether assigning a second order to an already-en-route rider results in acceptable delivery times for both orders. The trade-off is clear: batching improves rider utilization and reduces cost per delivery, but it risks delaying the first customer’s order and delivering lukewarm food.
Single-Order Dispatch vs. Multi-Order Batching: A Comparative Analysis
| Dimension | Single-Order Dispatch | Multi-Order Batching |
| --- | --- | --- |
| Rider Utilization | Lower utilization; more idle time between deliveries | Higher utilization; batching can reduce driver count by 36–60% |
| Average Delivery Time | Longer due to limited route optimization | Shorter; optimized bundling can reduce delay from 35 to 10 minutes |
| Food Quality Risk | Lower risk; minimal time in transit per order | Moderate risk; mitigated by temperature control and efficient routing |
| System Complexity | Simple logistics; easier to manage | Higher complexity; requires advanced algorithms for optimal grouping |
| Customer Satisfaction Predictability | More predictable delivery times | Slight variability; well-managed batching results in only ~1% more late orders |
Production systems typically cap batching at two orders per rider and impose a maximum detour time (e.g., 5 minutes) to protect the first customer’s experience.
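Those guardrails reduce to a small predicate gating every stacking decision. The thresholds below mirror the caps mentioned above (two orders, five-minute detour) but are illustrative defaults, not actual Deliveroo configuration:

```python
def can_stack(active_orders, added_delay_min: float,
              max_stack: int = 2, max_detour_min: float = 5.0) -> bool:
    """Guardrails for multi-order batching.

    A second order is accepted only if the rider is under the stack cap
    AND the detour it adds to the first customer's delivery stays
    within the configured budget.
    """
    if len(active_orders) >= max_stack:
        return False          # rider already at capacity
    return added_delay_min <= max_detour_min
```

Keeping the check this simple is deliberate: it runs inside the dispatch hot path, and the expensive part (estimating `added_delay_min` via the routing engine) happens before the predicate is evaluated.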
Pro tip: When discussing dispatch in an interview, explicitly mention that you favor a heuristic approach over global optimization, and explain why. Interviewers want to see that you understand the latency vs. optimality trade-off and can make pragmatic engineering decisions under constraints.
Once a rider is assigned and accepts the delivery, the system shifts into a different operational mode: continuous tracking.
Delivery tracking and live updates#
After a rider accepts a delivery task, the customer expects to see a moving dot on a map. This feature seems simple but involves a pipeline that must tolerate noisy data, unreliable connections, and variable update rates.
Location streaming architecture#
The rider’s phone continues sending GPS updates, but now these updates serve a different purpose: they must be fanned out to the specific customer (and potentially the restaurant) tracking this delivery. This is a classic pub/sub pattern with a twist. Each delivery creates a temporary “channel” that exists only for the duration of that delivery.
The system uses WebSocket connections for real-time delivery to the customer’s app. However, mobile WebSocket connections drop frequently (subway, elevators, poor signal). The client must handle reconnection gracefully and request the latest known position on reconnect rather than assuming continuity.
Interpolation and trust#
Raw GPS updates arrive irregularly and with noise. If the system simply plots each raw coordinate, the rider’s dot on the customer’s map would jitter erratically and occasionally teleport. Instead, a smoothing layer interpolates between known positions, showing a steady progression along the expected route.
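The animation layer's core operation is a time-parameterized blend between the last two known fixes. A production system would snap fixes to the road network first; this sketch shows only the interpolation step, with clamping so the dot never overshoots the latest fix:

```python
def interpolate(p0, t0: float, p1, t1: float, t: float):
    """Linearly interpolate between two GPS fixes for map animation.

    p0/p1 are (lat, lon) tuples observed at times t0/t1; t is the
    render time. The fraction is clamped to [0, 1] so stale data
    never extrapolates the rider past the last known position.
    """
    if t1 == t0:
        return p1
    frac = min(max((t - t0) / (t1 - t0), 0.0), 1.0)
    return (p0[0] + frac * (p1[0] - p0[0]),
            p0[1] + frac * (p1[1] - p0[1]))
```

Called once per animation frame, this produces the steady progression customers see, even though the underlying fixes arrive only every few seconds.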
The key insight is that perceived reliability matters more than raw accuracy. A smoothly moving dot that is 50 meters off from the rider’s true position builds more trust than a precise but jerky dot that jumps around the map. This is a conscious design choice: optimize for user perception, not GPS precision.
Real-world context: Uber’s engineering team has published extensively on this interpolation problem. The approach involves snapping GPS coordinates to the road network (map matching) and then animating movement along the road segment, which produces a far more natural visual experience than plotting raw coordinates.
Tracking data also feeds back into the dispatch system for ETA recalculation. If a rider is stuck in traffic, the system can proactively notify the customer of a delay rather than letting the original ETA silently expire. This proactive communication is handled by the notification layer.
Notifications and user communication#
Notifications are the system’s voice. They tell customers their order was accepted, their food is being prepared, their rider is arriving. They tell riders about new assignments. They tell restaurants about incoming orders. Getting notifications wrong (too many, too few, duplicated, or missing) directly damages the user experience.
Asynchronous by design#
Notifications are handled as a fully asynchronous, event-driven subsystem. The notification service subscribes to order state transition events from Kafka and determines which notifications to send based on the transition type and recipient preferences.
This decoupling is critical. If the push notification provider (APNs for iOS, FCM for Android) experiences elevated latency, the notification service’s internal queue absorbs the backlog without affecting order processing.
The service must also handle deduplication. If a state transition event is delivered twice (which Kafka’s at-least-once delivery guarantees allow), the notification service must recognize the duplicate and suppress the second notification. This is typically implemented by tracking a hash of (order_id, transition_type, timestamp) in a short-lived deduplication cache.
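A sketch of that deduplication check, keyed on the (order_id, transition_type, timestamp) triple. In production the seen-set would be a short-TTL shared cache rather than the process-local set used here:

```python
import hashlib

class NotificationDeduper:
    """Suppress duplicate notifications caused by at-least-once delivery.

    Illustrative sketch: keys hash the (order_id, transition, timestamp)
    triple; a redelivered event carries the same triple and is dropped.
    """
    def __init__(self):
        self._seen = set()

    def should_send(self, order_id: str, transition: str, ts: int) -> bool:
        key = hashlib.sha256(
            f"{order_id}:{transition}:{ts}".encode("utf-8")
        ).hexdigest()
        if key in self._seen:
            return False      # redelivered event: suppress
        self._seen.add(key)
        return True
```

Note that genuinely new transitions for the same order (different transition type or timestamp) still pass through; only exact replays are suppressed.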
Attention: Notification fatigue is a real product concern. Sending a push notification for every micro-state change (“rider is 1.2 km away,” “rider is 0.8 km away”) annoys users. The system should batch location-based updates into meaningful milestones: “rider is on the way,” “rider is nearby,” “rider has arrived.”
The notification layer is the last component in the “happy path.” But real-world delivery systems spend as much engineering effort on the unhappy path.
Handling failures and exceptions#
In a system with moving human actors, unreliable mobile networks, and external dependencies (payment gateways, routing APIs, push notification providers), failures are not edge cases. They are the steady state.
Failure taxonomy and recovery flows#
The system must handle several categories of failure, each with a different recovery mechanism:
- Rider cancellation or no-show: The dispatch engine maintains a reassignment queue. If a rider does not confirm pickup within a timeout window, the order is automatically returned to the dispatch pool with elevated priority and a wider search radius.
- Restaurant rejection after acceptance: The order moves to a REJECTED state. The customer is notified immediately, the payment hold is released, and the system may suggest alternative restaurants with similar cuisine and delivery time.
- GPS blackout: If rider location updates stop for more than 60 seconds during an active delivery, the system falls back to the last known position and estimated route. It does not reassign the rider immediately (they might just be in a tunnel) but escalates to support if the blackout exceeds a configurable threshold.
- Payment gateway timeout: The order placement saga’s compensating action kicks in. The system retries with exponential backoff and, if retries are exhausted, surfaces a clear error to the user without creating a phantom order.
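The retry-with-exponential-backoff behavior from the payment-gateway case can be sketched as a small wrapper. The `sleep` parameter is injectable so the retry schedule can be tested without real delays; the attempt counts and base delay are illustrative defaults:

```python
import time

def with_retries(call, max_attempts: int = 3, base_delay: float = 0.5,
                 sleep=time.sleep):
    """Retry an unreliable external call with exponential backoff.

    Delays double each attempt (0.5s, 1.0s, ...). If every attempt
    fails, the last exception propagates so the placement saga's
    compensating action can run instead of creating a phantom order.
    """
    for attempt in range(max_attempts):
        try:
            return call()
        except Exception:
            if attempt == max_attempts - 1:
                raise
            sleep(base_delay * (2 ** attempt))
```

In practice the retried call must itself be idempotent (e.g. carry the same idempotency key) so a retry after an ambiguous timeout cannot double-charge the customer.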
Historical note: Early food delivery platforms often treated rider cancellations as exceptional, requiring manual support intervention. Modern systems like Deliveroo automate reassignment entirely, reducing the median time to reassign from minutes (with human support agents) to seconds (with automated dispatch).
The unifying principle is graceful degradation. The system may not deliver a perfect experience during failures, but it must always deliver a predictable, communicative one. A customer who is told “your rider had to cancel, we’re finding a new one” trusts the platform far more than a customer who watches a frozen ETA slowly become obviously wrong.
Failures become even harder to manage when the system operates across multiple cities with different characteristics.
Scaling across cities and regions#
Deliveroo operates in hundreds of cities across multiple countries. Each city is essentially a semi-independent logistics network with its own peak hours, rider supply, restaurant density, and traffic patterns.
Operational isolation through partitioning#
The architecture uses city-level partitioning as a primary concept. Each city’s orders, rider locations, and dispatch state are isolated into separate partitions (separate Kafka topics, separate database shards, separate cache namespaces). This isolation ensures that a surge in London does not affect dispatch latency in Paris.
Common infrastructure, including the user authentication service, payment gateway integrations, menu catalog, and analytics pipelines, is shared globally. But the real-time, latency-sensitive components (dispatch, state management, tracking) are partitioned by city.
This also enables independent deployments and configuration. Dispatch weights ($w_1$ through $w_4$ in the scoring function) can be tuned per city. Batching thresholds can vary based on local rider density. Peak hour scaling policies can differ because London’s dinner rush does not coincide with Dubai’s.
Pro tip: In a system design interview, framing regional scaling as “operational blast radius containment” demonstrates that you think about failure domains, not just throughput. Interviewers value this perspective because it reflects how production infrastructure teams actually reason about multi-region systems.
ML pipelines for ETA and demand prediction#
Production systems layer machine learning models on top of the core architecture for two key predictions:
- ETA estimation: Given a restaurant’s current queue depth, historical prep times, rider travel time, and current traffic conditions, predict when the food will arrive at the customer’s door.
- Demand forecasting: Predict order volume per city per time window to pre-position riders in high-demand areas and alert restaurants to prepare for surges.
These models consume features from the event stream (order volumes, rider GPS traces, restaurant acknowledgment times) and serve predictions through a low-latency inference endpoint.
The models are typically trained offline on historical data and deployed via a canary process (new model serves 5% of traffic, metrics are compared, then gradually rolled out). Fallback logic ensures that if the ML inference endpoint is slow or unavailable, the system reverts to a simpler heuristic (e.g., average historical delivery time for this restaurant).
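The fallback logic can be sketched as a thin wrapper around the inference call: any failure or implausible prediction reverts to the simpler heuristic. The function names and thresholds here are hypothetical, illustrating only the degrade-gracefully shape:

```python
def eta_with_fallback(model_predict, historical_avg_min: float) -> float:
    """Serve the model's ETA; revert to the restaurant's historical
    average when inference fails or returns an implausible value.

    model_predict: zero-arg callable wrapping the inference endpoint.
    historical_avg_min: the heuristic fallback (avg delivery time).
    """
    try:
        eta = model_predict()
    except Exception:
        return historical_avg_min   # endpoint slow, down, or erroring
    if eta is None or eta <= 0:
        return historical_avg_min   # sanity-check the prediction
    return eta
```

The same pattern (a circuit breaker in front of the endpoint plus a cheap deterministic fallback) keeps the ordering flow alive even when the ML serving tier is degraded.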
With the full architecture laid out, let us consider how interviewers evaluate your understanding of these systems.
How interviewers evaluate Deliveroo system design#
Interviewers use Deliveroo as a system design question not to test whether you can recite component names, but to assess how you reason about trade-offs under real-world constraints.
They evaluate several dimensions:
- Requirements scoping: Did you clarify what “real-time” means? Did you ask about scale (how many cities, how many concurrent orders)?
- Architectural decomposition: Did you separate concerns cleanly? Did you explain why certain subsystems need strong consistency while others tolerate eventual consistency?
- Dispatch depth: Did you address the rider matching problem with specificity, mentioning scoring functions, geospatial indexing, and the trade-off between heuristic speed and optimal assignment?
- Failure reasoning: Did you proactively raise failure scenarios and describe recovery mechanisms, or did you only discuss the happy path?
- Communication clarity: Did you articulate trade-offs in terms of their user impact, not just their technical properties?
Real-world context: Deliveroo’s own interview guide emphasizes that they look for candidates who can “navigate ambiguity” and “make reasonable assumptions.” The strongest candidates do not try to design a perfect system. They design a system that works, explain where it is weak, and describe how they would iterate.
Interview Response Evaluation Rubric
| Dimension | Weak | Acceptable | Strong |
| --- | --- | --- | --- |
| Requirements Definition | Rushes to solutions without clarifying requirements; fails to ask key questions | Asks some clarifying questions but misses key areas; reasonable yet incomplete understanding | Thoroughly clarifies requirements; asks insightful questions uncovering constraints, priorities, and context |
| Dispatch Design | Lacks structure; ignores scalability and flexibility in task allocation | Structured design addressing basics but misses edge cases and scalability concerns | Well-structured, efficient design optimizing task allocation; accounts for scalability, flexibility, and failure modes |
| Failure Handling | Ignores potential failures; proposes no detection or recovery mechanisms | Acknowledges failures and suggests basic detection/recovery; misses edge cases and proactive measures | Proactively identifies failures; designs comprehensive detection, recovery, fault tolerance, and monitoring strategies |
| Trade-off Articulation | Proposes solutions without considering trade-offs; ignores design implications | Mentions trade-offs but provides limited analysis; superficial evaluation of pros and cons | Clearly articulates multiple design options; thoroughly analyzes trade-offs across performance, cost, and complexity |
The difference between a passing and an outstanding answer often comes down to whether you discuss the system as a living, failing, evolving thing, or as a static diagram on a whiteboard.
Conclusion#
Deliveroo System Design is one of the richest problems in modern distributed systems because it sits at the intersection of real-time decision-making, geospatial computation, financial transactions, and human unpredictability. The most important architectural insight is not any single component but the recognition that different parts of the system demand fundamentally different consistency, latency, and reliability guarantees, and that the boundaries between these zones must be drawn deliberately. Strong consistency for payments, eventual consistency for tracking, heuristic speed for dispatch, and graceful degradation everywhere: these are the trade-offs that define a production-grade delivery platform.
Looking ahead, the next wave of innovation in systems like Deliveroo will likely involve tighter integration of ML-driven demand prediction with real-time dispatch, autonomous delivery vehicles introducing new actor types into the state machine, and increasingly sophisticated multi-order batching algorithms that approach near-optimal assignment without sacrificing latency. The foundational architecture described here, event-driven, partitioned by city, and designed for failure, provides the scaffold on which those innovations will be built.
If you can walk an interviewer from a customer tapping “Place Order” through geohash lookups, saga-based payment flows, heuristic rider scoring, Kafka-powered state fan-out, and automated failure recovery, you are not just answering a system design question. You are demonstrating the kind of end-to-end architectural thinking that builds systems people trust with their Tuesday night dinner.