Chat System Design

Build chat as store + sync: persist ordered messages, deliver at-least-once with dedupe, reconcile devices with per-device cursors, pick the right group fan-out strategy, treat push as a wake-up hint, and measure latency, lag, and sync convergence.

19 mins read
Jan 22, 2026

Real-time chat looks simple from the UI, but it’s one of the densest System Design interview problems you’ll face. A chat system forces you to deal with persistent connections, low-latency fan-out, ordered and durable delivery, multi-device synchronization, and “soft real-time” features like presence and typing that can’t justify strong consistency at global scale.

Interviewers like chat design because the hard parts are interwoven. Your connection layer influences delivery semantics. Your data model influences ordering and sync performance. Your fan-out strategy determines whether group chat is cheap or catastrophically expensive. A solid answer demonstrates that you can keep guarantees crisp (what you promise) while making the architecture pragmatic (what you actually build).

This blog walks through a Staff-level, interview-friendly design: you’ll see how to define guarantees, pick an architecture, add a message delivery state machine, handle multi-device sync, choose group fan-out strategies, treat push notifications correctly, and instrument the whole system with metrics that prove it works.

Interviewer tip: Chat is a “delivery + sync” system, not a “send over WebSocket” system. If your design can’t reconcile state after disconnects, it isn’t reliable.

Clarify requirements and set the right guarantees

Start by narrowing scope and stating guarantees. Chat interviews go sideways when candidates list features without defining what’s durable, what’s ordered, and what’s best-effort. A strong opening frames the system around messaging reliability: messages should be durable and eventually delivered, ordering should be per conversation, and transient UI signals like presence/typing should be eventually consistent.

For functional scope, you can assume direct messages and group chats, message history, delivery/read receipts, presence, typing indicators, and basic attachments (at least at the metadata level). For a first pass, keep the payload as “text + metadata,” while leaving room for media via an object store.

The most important part is to state your guarantees explicitly. The interview-winning stance is: at-least-once delivery (with deduplication), per-conversation ordering, and eventual consistency for presence/typing. Those are realistic and defendable at scale.

| Capability | What you guarantee | What you do not guarantee (and why) |
| --- | --- | --- |
| Message delivery | At-least-once to each device, with dedupe | Exactly-once (too costly; crashes cause ambiguity) |
| Ordering | Per-conversation ordering | Global ordering (unnecessary and impractical) |
| Presence/typing | Eventually consistent, best-effort | Strong consistency (too expensive; not worth the latency) |
| History | Durable storage + sync by cursor | “Never missing” without reconnect sync (networks disconnect) |

Common pitfall: Promising “real-time” without defining a sync mechanism. WebSockets help latency, but sync defines correctness.

Summary:

  • State guarantees early: at-least-once, per-conversation ordering, presence/typing eventual consistency.

  • Keep scope realistic: direct + group, history, receipts, presence, typing, notifications.

  • Treat reconnect sync as part of delivery, not an optional add-on.

High-level architecture: separate connections, routing, and durability

A scalable chat architecture separates the connection-heavy edge from the durable messaging core. The edge layer maintains millions of persistent client connections (WebSocket is typical; mobile may also use platform sockets). The core layer validates, sequences, persists, and routes messages. This separation keeps your messaging logic stateless and horizontally scalable while allowing the gateway tier to optimize for long-lived connections.

A common structure is: Clients → Gateway/Connection Manager → Messaging Service → Storage + Queue, with a Presence Service, a Notification Service, and caches for quick lookups. The messaging service is where you assign conversation sequence numbers, persist the message, and publish delivery tasks. The gateway is where you deliver to online devices by pushing over existing connections.

For reliability, add a durable queue (or log) between “message accepted” and “message delivered to devices.” This is what lets you survive gateway crashes, worker restarts, and recipient offline periods without losing messages. Think of it as the internal “delivery pipeline” that you can replay.
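
To make the pipeline concrete, here is a minimal sketch of a delivery worker draining that durable queue. Every name here (`queue.claim`, `gateway_client.push`, `queue.park`) is an assumed interface for illustration, not a specific library.

```python
import time

def run_delivery_worker(queue, gateway_client, max_attempts=5):
    """Drain durable delivery tasks and push to online devices."""
    while True:
        task = queue.claim(lease_seconds=30)  # leased, so a crashed worker's task is re-claimed
        if task is None:
            time.sleep(0.1)  # queue empty; back off briefly
            continue
        try:
            # Push over the recipient device's existing gateway connection.
            gateway_client.push(task.device_id, task.message)
            # Mark the task done so it is not re-leased. The device ack
            # (not this push) is what advances the delivered cursor.
            queue.ack(task)
        except ConnectionError:
            # Gateway or device unreachable: let the lease expire so another
            # worker retries. At-least-once means duplicates are possible;
            # devices dedupe by message_id/seq.
            if task.attempts >= max_attempts:
                queue.park(task)  # hand off to the offline/notification path
```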

| Component | Responsibility | Why it matters |
| --- | --- | --- |
| Client | Send, ack, maintain cursors, render state | Holds per-device truth and drives sync |
| Gateway / Connection Manager | Maintain WebSocket connections; push to devices | Optimized for long-lived connections |
| Messaging Service | AuthZ, sequencing, persistence, publish delivery tasks | Enforces guarantees and ordering |
| Delivery workers | Fan-out to online devices; track acks; retry | Converts durable tasks into pushes |
| Message store | Conversation-partitioned durable history | Efficient history reads and ordered queries |
| Presence service | Track online/last-seen/typing (soft state) | Eventually consistent UX |
| Notification service | Request APNS/FCM notifications for offline devices | Wake-up hint, not delivery |

Interviewer tip: Name your “source of truth” explicitly. In chat, message history + per-device cursors define correctness. Gateways are not your source of truth.

Summary:

  • Gateways handle connections; messaging core handles correctness and persistence.

  • Use a durable queue/log for delivery tasks and replay.

  • Treat presence/typing as soft state, separate from message durability.

Core data model: optimize for conversation history and delivery tracking

Chat workloads are dominated by two access patterns: “append messages to a conversation” and “read messages in a conversation by order.” That pushes you toward a partition key of conversation_id and a clustering key that preserves order (sequence_number or time-based ordering with a tie-breaker). This is why distributed NoSQL stores like Cassandra/DynamoDB/HBase are often chosen: they’re great at append-heavy, partitioned reads.

Beyond message storage, you need to model membership and per-user/per-device state. Delivery/read receipts at scale are tricky: storing per-message per-recipient status becomes expensive in large groups. Many systems use a hybrid: per-message delivery state for small conversations, and per-user “last_read_seq” pointers for larger ones (so the UI derives read states relative to that pointer).

You also need a consistent way to dedupe under at-least-once delivery. That typically means a message_id generated client-side (or by the server) that remains stable across retries, plus a dedupe table keyed by (conversation_id, message_id) or (sender_id, client_msg_id).

| Table | Key | Fields | Purpose |
| --- | --- | --- | --- |
| Messages | (conversation_id, seq) | message_id, sender_id, timestamp, payload_ref | Ordered history retrieval |
| Conversations | conversation_id | type, created_at, metadata | Conversation identity |
| Membership | (conversation_id, user_id) | role, joined_at, last_read_seq | AuthZ + read pointers |
| Device sessions | device_id | user_id, gateway_id, connected_at | Routing online devices |
| Dedupe | (conversation_id, message_id) | first_seen_at | Idempotency for retries |
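
A rough sketch of these row shapes as Python dataclasses; the field names follow the table above, but the exact schema is an assumption:

```python
from dataclasses import dataclass

@dataclass
class Message:
    conversation_id: str   # partition key
    seq: int               # clustering key: per-conversation order
    message_id: str        # stable across retries; the dedupe key
    sender_id: str
    timestamp_ms: int
    payload_ref: str       # inline text or a pointer into an object store

@dataclass
class Membership:
    conversation_id: str
    user_id: str
    role: str
    last_read_seq: int     # read receipts are derived from this pointer

@dataclass
class DeviceCursor:
    device_id: str
    conversation_id: str
    last_delivered_seq: int  # drives reconnect sync and dedupe
```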

Common pitfall: Modeling read receipts as “status per message per user” for all group sizes. That explodes in storage and write load for large groups.

Summary:

  • Partition messages by conversation_id and order by seq for efficient history reads.

  • Track membership and last_read_seq to scale receipts in groups.

  • Add a dedupe strategy keyed by stable message identifiers.

Message delivery flow: ordering, durability, and at-least-once semantics

A strong chat answer describes the send path as a series of durable steps. The client sends a message with a stable client-generated message_id (or idempotency key). The messaging service authenticates and authorizes the sender, assigns the next sequence number for the conversation (this is your ordering point), persists the message, and emits delivery tasks for recipients’ devices.

You should be explicit about where ordering is decided. In most designs, ordering is assigned server-side when the message is accepted. That means you either need a per-conversation sequencer (logical) or a storage operation that can produce monotonic ordering. In practice, you can implement sequencing as: a per-conversation counter in a strongly consistent store, a lightweight sequencer service partitioned by conversation_id, or optimistic assignment with conflict resolution (harder). In interviews, it’s usually enough to say “a sequencer partitioned by conversation_id assigns seq.”

At-least-once delivery means devices may see duplicates. You prevent duplicates by having devices dedupe by message_id (or seq) and by having the server dedupe repeated sends. Your delivery pipeline should be retryable: if pushing to a gateway fails, a worker can retry later without losing the message.
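
Here is a hedged sketch of that accept path. The `store`, `sequencer`, and `queue` interfaces are hypothetical; the point is the order of operations: dedupe, sequence, persist, then publish.

```python
def accept_message(store, sequencer, queue, conversation_id,
                   sender_id, client_msg_id, payload):
    # 1. Idempotent accept: a retried send with the same client_msg_id
    #    returns the original seq instead of writing a second copy.
    existing_seq = store.get_dedupe(conversation_id, client_msg_id)
    if existing_seq is not None:
        return existing_seq

    # 2. Ordering point: a sequencer partitioned by conversation_id
    #    hands out monotonic per-conversation sequence numbers.
    seq = sequencer.next(conversation_id)

    # 3. Persist before fan-out: durability comes first.
    store.put_message(conversation_id, seq, client_msg_id, sender_id, payload)
    store.put_dedupe(conversation_id, client_msg_id, seq)

    # 4. Publish retryable delivery tasks, one per recipient device.
    for device_id in store.recipient_devices(conversation_id, sender_id):
        queue.publish(conversation_id=conversation_id, seq=seq,
                      device_id=device_id)
    return seq
```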

| Step | What is persisted | Why it matters |
| --- | --- | --- |
| Accept message | Dedupe record + message envelope | Makes retries idempotent |
| Assign ordering | Conversation seq | Guarantees per-conversation order |
| Store message | Messages(conversation_id, seq) | Durable history and sync source |
| Publish delivery | Delivery tasks per recipient/device | Separates durability from online push |
| Ack processing | Per-device delivered cursor update | Lets receipts and sync converge |

What great answers sound like: “I assign ordering at the server, persist before fan-out, and treat delivery as a retryable pipeline. Dedupe is built in because at-least-once is inevitable.”

Summary:

  • Server assigns per-conversation ordering via sequence numbers.

  • Persist message before fan-out to ensure durability.

  • Use at-least-once delivery with dedupe by message_id/seq.

  • Treat delivery as retryable tasks, not a single WebSocket write.

Message delivery state machine: make delivery and receipts explicit

A message system becomes interview-grade when you can describe its state transitions. Chat delivery isn’t one state; it’s a lifecycle. You need states for “accepted,” “delivered to device,” and “read by user,” and you need to say what’s persisted at each step so the system can recover after crashes.

One practical approach is to separate message state (durable in message store) from per-recipient/per-device progress (durable as cursors). For example: store the message once, then store per device “last_delivered_seq” and per user “last_read_seq.” Delivery receipts become cursor updates, not per-message status updates. For small 1:1 chats you may also store per-message status for simplicity, but the cursor model is the scalable baseline.

Below is a state machine that describes delivery from the system’s perspective, with persisted evidence at each step.

| State | Meaning | Persisted evidence | Transition trigger |
| --- | --- | --- | --- |
| ACCEPTED | Message is validated and ordered | Message row + dedupe record | Messaging service commits |
| ENQUEUED | Delivery tasks exist | Delivery task records/log entries | Fan-out publish succeeds |
| DELIVERED_TO_DEVICE | Device received over connection or via sync | device_cursor.last_delivered_seq update | Device ack or sync confirmation |
| READ_BY_USER | User viewed the message | membership.last_read_seq update | Client read receipt |
| EXPIRED/FAILED | Delivery abandoned (rare; policy-driven) | Failure record + alert | TTL exceeded / blocked user |
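
If it helps to make the lifecycle explicit, the states and legal transitions can be encoded directly; this shape is illustrative, not a required implementation:

```python
from enum import Enum, auto

class DeliveryState(Enum):
    ACCEPTED = auto()
    ENQUEUED = auto()
    DELIVERED_TO_DEVICE = auto()
    READ_BY_USER = auto()
    EXPIRED_OR_FAILED = auto()

# Legal forward transitions; anything else is a stale or duplicate event.
TRANSITIONS = {
    DeliveryState.ACCEPTED: {DeliveryState.ENQUEUED},
    DeliveryState.ENQUEUED: {DeliveryState.DELIVERED_TO_DEVICE,
                             DeliveryState.EXPIRED_OR_FAILED},
    DeliveryState.DELIVERED_TO_DEVICE: {DeliveryState.READ_BY_USER},
    DeliveryState.READ_BY_USER: set(),
    DeliveryState.EXPIRED_OR_FAILED: set(),
}

def advance(current: DeliveryState, proposed: DeliveryState) -> DeliveryState:
    # Under at-least-once delivery, stale or duplicate events are common;
    # ignoring them (rather than erroring) keeps state monotonic.
    return proposed if proposed in TRANSITIONS[current] else current
```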

Interviewer tip: Receipts are state reconciliation. If you can’t explain how cursors converge after disconnects, your “read/delivered” indicators will be wrong under real network conditions.

Summary:

  • Use a state machine to explain delivery and receipts clearly.

  • Persist evidence (message row, tasks, cursor updates) so crashes don’t break correctness.

  • Prefer cursor-based receipts for scalability, especially in groups.

Walkthrough 1: online → online message with delivery and read receipts

Consider a 1:1 chat where both users are online on two devices each. The sender’s client submits message_id and payload to the gateway, which forwards it to the messaging service. The service dedupes, assigns the next sequence number for the conversation, and persists the message. It then emits delivery tasks for each recipient device currently connected.

Delivery workers push the message over the recipient devices’ gateway connections. Each device dedupes by message_id/seq, renders the message, and sends an acknowledgment indicating the highest sequence number delivered for that conversation. The server updates each device’s delivered cursor. When the recipient opens the conversation view, the client sends a read receipt (often as “last_read_seq”), and the server updates membership state.

The sender’s UI can now derive receipt states. If you’re using cursor-based receipts, the sender sees “delivered” when recipient’s delivered cursor ≥ message seq and “read” when last_read_seq ≥ message seq. This avoids per-message status writes and scales cleanly.
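
As a sketch, that derivation is just a pair of cursor comparisons (the names are illustrative):

```python
def receipt_state(msg_seq: int, delivered_cursors: dict[str, int],
                  last_read_seq: int) -> str:
    """delivered_cursors maps each recipient device to its last_delivered_seq."""
    if last_read_seq >= msg_seq:
        return "read"
    # "Delivered" may mean any device or all devices; choose per product.
    if any(cursor >= msg_seq for cursor in delivered_cursors.values()):
        return "delivered"
    return "sent"
```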

Common pitfall: Treating read receipts as strongly consistent. In reality, they’re best-effort and may arrive out of order; using cursors makes reconciliation simpler.

Summary:

  • Persist, then fan-out to online devices.

  • Devices ack delivered via per-device delivered cursors.

  • Read receipts update per-user last_read_seq.

  • Sender derives receipts by comparing seq to recipient cursors.

Multi-device sync and state tracking

Multi-device sync is where chat designs either become robust or collapse into hand-wavy promises. Users log in from multiple devices, disconnect frequently, and expect history, receipts, and unread counts to converge correctly. The key concept is per-device cursors: each device tracks what it has seen, and the server stores enough per-device state to reconcile after reconnects.

A practical sync model is cursor-based pull with server push as an optimization. When a device connects (or reconnects), it sends the last_delivered_seq per conversation (or a global cursor if you model differently). The server responds with missing messages in order, and the device updates its cursor. This ensures offline delivery even if push notifications fail, gateways crash, or connections flap.
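
A minimal sketch of that reconnect handler, assuming the message store supports ordered range reads per conversation (`read_after` and `push` are hypothetical interfaces):

```python
def sync_device(store, push, client_cursors, batch_size=100):
    """client_cursors: {conversation_id: last_delivered_seq from the device}."""
    for conversation_id, last_seen in client_cursors.items():
        while True:
            batch = store.read_after(conversation_id, after_seq=last_seen,
                                     limit=batch_size)
            if not batch:
                break
            push(batch)                # stream in order over the new connection
            last_seen = batch[-1].seq
    # Server-side delivered cursors advance only on device acks, so a crash
    # mid-sync simply replays; device-side dedupe absorbs the duplicates.
```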

You also need to reconcile delivered vs read across devices. A user-level last_read_seq is typically shared across devices (reading on phone marks read on desktop). Device-level delivered cursors remain per device (a laptop might not have synced yet). This separation lets your UI represent “delivered to one of user’s devices” versus “delivered to all devices,” depending on product requirements.

| Field | Where it lives | Purpose |
| --- | --- | --- |
| device_id | Device sessions store | Stable identity for routing and cursors |
| gateway_id / connection_id | Presence/session cache | Push routing for online devices |
| last_delivered_seq per conversation | Device cursor store | Determines what to sync on reconnect |
| last_ack_time | Cursor store | Detect stuck devices; drive cleanup |
| app_version/capabilities | Device profile | Backward compatibility for payload/features |
| notification_token | Notification store | APNS/FCM targeting for wake-ups |

What interviewers look for in sync answers: “I want to hear per-device cursors, a reconnect sync flow that doesn’t rely on notifications, and a clear reconciliation strategy for delivered vs read across devices.”

Summary:

  • Use per-device cursors to drive reconnect sync and dedupe.

  • Treat push as optimization; sync is the correctness mechanism.

  • Separate device-delivered from user-read state for clean reconciliation.

Walkthrough 2: online → offline message with notification and reconnect sync

Now consider the recipient is offline (no active gateway connection). The send path is the same through persistence: the message is deduped, sequenced, and stored. The difference is delivery: the server can’t push over WebSocket, so it records that the recipient’s devices have not advanced their delivered cursors and it triggers a push notification request.

The notification payload should be minimal: maybe conversation_id, sender display name, and a hint like “new message.” You avoid stuffing the full message into the push payload because of platform constraints and because it undermines end-to-end encryption designs. The push’s job is to wake the app so it can reconnect and sync.

When the recipient comes online (either by tapping the notification or by background refresh), the device establishes a connection, sends its last_delivered_seq per conversation, and the server streams missing messages in order. Only after the device confirms receipt does the server advance delivered cursors and generate delivery receipts back to the sender if the product supports it.

Interviewer tip: Your offline story should still work if notifications are delayed, dropped, or disabled. If it doesn’t, you’ve built a notification system, not a chat system.

Summary:

  • Persist first, then treat offline delivery as “sync later.”

  • Send a minimal push as a wake-up hint.

  • On reconnect, sync from per-device cursor to catch up reliably.

  • Advance delivered/read state based on sync acknowledgments, not notifications.

Group chat fan-out strategies

Group chat changes the economics of delivery. In 1:1, you deliver to a small set of devices. In a large group, a single message can require thousands of deliveries, and naïve per-recipient writes will crush your database or queue. This is where interviewers want you to talk about fan-out strategies: fan-out-on-write, fan-out-on-read, and hybrid.

Fan-out-on-write means you expand the message into per-recipient inbox entries at send time. Reads are fast because each user reads their inbox, but writes become expensive for large groups. Fan-out-on-read means you store the message once in the conversation log and users fetch it when they read; this keeps writes cheap but makes reads and “unread counts” more complex. Hybrid approaches store the message once but maintain per-user cursors and selectively precompute inbox entries for small groups or for online users.

There isn’t one correct choice. A strong answer ties the choice to product realities: group size distributions, online/offline rates, unread count UX, and infrastructure cost constraints.

| Strategy | Write/read cost | Delivery latency | Complexity | Best fit |
| --- | --- | --- | --- | --- |
| Fan-out-on-write | High write cost, low read cost | Low for reads | Medium | Small/medium groups, inbox-centric UX |
| Fan-out-on-read | Low write cost, higher read work | Depends on read path | High | Very large groups, log-centric storage |
| Hybrid | Balanced | Low for common cases | High | Mixed workloads, needs careful tuning |
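
As a sketch, the size-based decision can be encoded as a simple threshold rule. The cutoff below is invented for illustration; a real value would come from group-size distributions and cost data:

```python
SMALL_GROUP_MAX = 100  # hypothetical threshold

def fanout_plan(member_count: int, online_device_count: int) -> dict:
    if member_count <= SMALL_GROUP_MAX:
        # Cheap enough to precompute per-recipient inbox entries at send time.
        return {"strategy": "fan-out-on-write",
                "inbox_writes": member_count}
    # Large groups: store the message once in the conversation log, push to
    # online devices, and let offline members catch up via cursor-based sync.
    return {"strategy": "fan-out-on-read",
            "inbox_writes": 0,
            "online_pushes": online_device_count}
```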

Common pitfall: Using fan-out-on-write for all groups. It looks simple until a “hot group” creates a write amplification incident.

Walkthrough: a “hot group” scenario and mitigations

Imagine a group with 50,000 members and rapid message volume. If you fan-out-on-write, every message becomes 50,000 inbox writes plus deliveries, which can overwhelm storage and queueing systems. Even if your workers can push to online users, the durable per-recipient writes will dominate cost and latency.

With fan-out-on-read or hybrid, you store each message once in the conversation log and update a compact set of state: per-user last_read_seq and per-device delivered cursors. Online members can receive pushed messages without permanently writing per-recipient inbox rows. For offline members, you rely on sync: they fetch from the log when they reconnect. Mitigations also include splitting “delivery” from “inbox computation,” rate limiting message sends for extreme groups, using tiered infrastructure for large channels, and caching recent segments of the log.

Interviewer tip: For hot groups, say the quiet part out loud: “I’m optimizing for storage efficiency and avoiding per-recipient write amplification; sync and cursors carry the correctness.”

Summary:

  • Compare fan-out strategies using cost and group-size realities.

  • Hot groups require avoiding per-recipient writes on every message.

  • Hybrid models commonly push to online users while keeping storage log-centric.

Push notifications are a hint, not delivery

Push notifications (APNS/FCM) are important, but they are not a delivery mechanism you control. They can be delayed, collapsed, throttled, or disabled by user settings and OS policies. In interviews, treating push as “delivery” is a correctness flaw. The correct model is: durable storage + sync is delivery, push is a wake-up hint to reduce perceived latency.

You should mention platform constraints. Notifications have payload size limits, and you often want to use collapse keys so many messages in the same conversation don’t generate a storm of pushes. Instead of sending one push per message, you collapse them by conversation and send “new messages available” signals. When the client wakes, it syncs from its cursor and pulls the real data.

This framing also aligns with security and encryption. If you later discuss end-to-end encryption, you don’t want servers placing plaintext message bodies in push payloads. Even without E2EE, minimal pushes reduce sensitive data exposure and improve reliability by keeping the authoritative data path in your own infrastructure.
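
A sketch of a minimal wake-up payload, assuming a hypothetical internal `notification_service` that wraps APNS/FCM:

```python
def notify_offline_device(notification_service, token, conversation_id,
                          sender_name):
    notification_service.send(
        token=token,
        # Collapse key: later pushes for the same conversation replace
        # earlier ones instead of piling up into a storm.
        collapse_key=f"conv:{conversation_id}",
        payload={
            "type": "new_messages",
            "conversation_id": conversation_id,
            "preview": f"New message from {sender_name}",
            # Deliberately no message body: the device wakes, reconnects,
            # and syncs from its cursor to fetch the real data.
        },
    )
```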

| Concern | Practice | Why |
| --- | --- | --- |
| Collapse key / topic | Collapse by conversation_id | Avoid push storms; better UX |
| Minimal payload | Include conversation hint, not full history | Payload limits; privacy; supports E2EE |
| Delivery uncertainty | Assume pushes can be delayed or dropped | Forces sync to be correct |
| Token management | Rotate device tokens, handle invalid tokens | Keeps notifications working |

Reliability contract: “APNS/FCM tells the device to wake up. The real delivery guarantee comes from durable storage and cursor-based sync.”

Summary:

  • Treat push notifications as wake-ups, not delivery.

  • Use collapse keys to avoid notification storms.

  • Keep payload minimal; rely on sync to fetch messages reliably.

Presence and typing: eventually consistent by design

Presence and typing indicators feel real-time, but they’re fundamentally soft state. Networks flap, mobile devices sleep, and reconnects happen constantly. The right approach is to make presence and typing eventually consistent and resilient to missed updates. Presence should typically degrade gracefully: if you miss a heartbeat, you mark the user “offline” after a timeout.

Presence systems often rely on ephemeral storage (like Redis) with TTLs. Gateways update presence on connect/disconnect and with periodic heartbeats. Typing indicators are usually sent over the same real-time channel but are not persisted; they’re scoped to a short window and can be dropped without harming correctness.

The interview point is to say what you do when presence is wrong: you favor availability and low latency, and you accept brief inaccuracies because the feature is informational, not transactional.
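
As an illustration, presence can be a TTL'd key per device in Redis (using redis-py); the key naming and TTL are assumptions:

```python
import redis

r = redis.Redis()
PRESENCE_TTL_SECONDS = 60  # roughly 2-3 heartbeat intervals

def heartbeat(user_id: str, device_id: str) -> None:
    # Gateways call this on connect and on each heartbeat. A missed
    # heartbeat just lets the key expire, which reads as "offline".
    r.set(f"presence:{user_id}:{device_id}", "online", ex=PRESENCE_TTL_SECONDS)

def is_online(user_id: str, device_ids: list[str]) -> bool:
    return any(r.exists(f"presence:{user_id}:{d}") for d in device_ids)
```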

| Signal | Storage | Consistency | Failure behavior |
| --- | --- | --- | --- |
| Online presence | Cache with TTL | Eventual | Times out to offline |
| Last seen | Durable store | Eventually consistent | Updated asynchronously |
| Typing | In-memory / transient | Best-effort | Dropped if disconnected |
| Active device | Session store | Eventual | Reconciled on reconnect |

Common pitfall: Overbuilding presence with strong consistency. The latency and coordination cost isn’t worth it for a hint-like feature.

Summary:

  • Presence/typing are soft-state features; eventual consistency is appropriate.

  • Use TTL-based caches and heartbeats; degrade to offline on uncertainty.

  • Don’t persist typing; keep it ephemeral and best-effort.

Failure modes and crash safety: duplicates are inevitable, so design for dedupe

A Staff-level chat design is honest about ambiguity. If a gateway pushes a message to a device and crashes before recording the acknowledgment, your system can’t know whether the device received it. The only safe action under at-least-once semantics is to retry, which can create duplicates. Duplicates are not a bug; they’re a consequence of building reliable systems over unreliable networks.

The solution is idempotency and dedupe at multiple layers. The server dedupes repeated sends from the client using a stable message_id. Devices dedupe repeated deliveries using message_id/seq. Cursor updates are monotonic: last_delivered_seq and last_read_seq only move forward, which makes retries safe.

This is also where you can mention leases and retryable tasks in the delivery pipeline. If a worker crashes mid-delivery, another worker can resume because tasks are durable. Your design stays correct because dedupe and monotonic cursors prevent duplicates from creating incorrect user-visible state.
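
A device-side sketch of this dedupe-plus-monotonic-cursor rule (the `local_db` interface is hypothetical):

```python
def on_message(local_db, conversation_id, seq, message_id, payload):
    # Dedupe: a replayed delivery is recognized and not rendered twice.
    if not local_db.has_message(conversation_id, message_id):
        local_db.insert_message(conversation_id, seq, message_id, payload)
    # Monotonic cursor: max() means retries and reordering can never move
    # state backward.
    current = local_db.get_cursor(conversation_id)
    local_db.set_cursor(conversation_id, max(current, seq))
    return local_db.get_cursor(conversation_id)  # acked as last_delivered_seq
```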

| Failure | Impact | Mitigation |
| --- | --- | --- |
| Gateway crash | Connections drop; in-flight sends ambiguous | Client reconnect + sync from cursor |
| Worker crash after push | Possible duplicate delivery | Dedupe by message_id/seq; monotonic cursors |
| Messaging service restart | Temporary send failures | Durable queue/log; retry; idempotent accept |
| Cache loss | Presence/session mapping lost | Rebuild from reconnects; TTL caches |

What great answers sound like: “I assume duplicates happen. I use stable IDs, dedupe on server and client, and I make state updates monotonic so retries are safe.”

Walkthrough 3: crash after send causes duplicates, solved by idempotency/dedupe

A recipient device is online. A worker pushes message (conversation_id, seq=1042, message_id=X) through the gateway, and the device renders it. Before the device ack reaches the core, the gateway process crashes and the connection drops. The delivery pipeline times out waiting for ack and schedules a retry.

The recipient reconnects and runs sync with last_delivered_seq=1041 (it never got to ack 1042). The server streams message 1042 again. The device sees message_id=X (or seq=1042) already in local storage and ignores the duplicate while still updating its cursor to 1042 and sending the ack. Now the system converges: the delivery pipeline marks the device delivered, and the sender may see “delivered” correctly.

Summary:

  • Crash ambiguity leads to retries and duplicates.

  • Stable message identifiers enable dedupe on both server and device.

  • Cursor-based sync makes reconnection converge to correct state.

  • Monotonic cursor updates prevent state regressions.

Observability and SLOs: prove the system works

Great designs are measurable. In interviews, naming metrics signals operational maturity. You want latency metrics for the send path, backlog metrics for delivery pipelines, and correctness indicators for sync and presence. Metrics should be broken down by region, gateway cluster, conversation type (1:1 vs group), and online/offline segments.

You should include at least one “user-perceived” latency SLO, such as p95 send-to-delivered for online recipients, and a separate SLO for offline delivery measured as reconnect sync lag. Presence accuracy is inherently fuzzy, but you can still track disconnect/reconnect rates and “stale presence” percentages based on heartbeat expiration.
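
A sketch of how a few of these signals might be declared with `prometheus_client`; metric names, labels, and bucket choices are illustrative:

```python
from prometheus_client import Gauge, Histogram

SEND_TO_DELIVERED = Histogram(
    "chat_send_to_delivered_seconds",
    "Time from accept to device ack, online recipients",
    ["region", "conversation_type"],
)
OFFLINE_SYNC_LAG = Histogram(
    "chat_offline_sync_lag_seconds",
    "Time for a reconnecting device to catch up to conversation head",
    ["region"],
)
QUEUE_BACKLOG = Gauge(
    "chat_delivery_queue_backlog",
    "Delivery tasks waiting, by queue shard",
    ["shard"],
)

def record_delivery(region, conv_type, accepted_at, acked_at):
    SEND_TO_DELIVERED.labels(region, conv_type).observe(acked_at - accepted_at)
```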

| Metric | What it measures | Why it matters |
| --- | --- | --- |
| p95 send→delivered (online) | Real-time performance | Drives gateway and worker scaling |
| p95 send→persisted | Core durability latency | Indicates DB/log health |
| Queue backlog / lag | Delivery pipeline pressure | Shows capacity mismatch |
| Offline sync lag | Time to catch up after reconnect | Measures offline reliability |
| Reconnect rate | Connection stability | Highlights gateway/network issues |
| Presence staleness rate | Accuracy of online status | Tunes heartbeat and TTL policies |

Interviewer tip: If you only measure “messages sent per second,” you won’t catch the outages users feel. Latency, lag, and sync convergence are the real reliability signals.

Summary:

  • Track p95 latency for online delivery and persistence separately.

  • Monitor queue lag and offline sync lag to see reliability drift.

  • Treat presence metrics as heuristics, not hard correctness signals.

Security and privacy: authentication, authorization, and encryption posture

Security in chat starts at the connection. Every gateway connection should be authenticated (JWT or similar), and every message send should be authorized against conversation membership. You should also rate-limit sends to reduce spam and protect infrastructure. For group chats, membership changes must be enforced consistently so removed users can’t send or receive messages.

End-to-end encryption (E2EE) is often brought up. In an interview, you don’t need to implement it fully, but you should place it correctly: the server stores opaque ciphertext, metadata remains visible (conversation_id, sender_id, timestamps), and push payloads stay minimal. E2EE shifts complexity to clients (key management and device enrollment) but doesn’t fundamentally change the delivery pipeline design; it changes what the server can inspect.
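
A sketch of the membership and rate-limit checks on the send path; both backing stores are hypothetical interfaces:

```python
def authorize_send(membership_store, rate_limiter, sender_id,
                   conversation_id) -> None:
    member = membership_store.get(conversation_id, sender_id)
    # Membership is re-checked on every send so removed users are cut off.
    if member is None or member.removed:
        raise PermissionError("sender is not a member of this conversation")
    # Token-bucket style limit; surfaced to the client as a retryable error.
    if not rate_limiter.allow(sender_id):
        raise RuntimeError("rate limit exceeded")
```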

| Area | Mechanism | Note |
| --- | --- | --- |
| Connection auth | JWT/mTLS | Auth at gateway, verify on upgrade |
| Message authZ | Membership check | Every send validates sender is a member |
| Abuse prevention | Rate limits, spam detection hooks | Enforce at gateway and core |
| Encryption posture | TLS in transit; optional E2EE | E2EE changes payload handling, not delivery semantics |

Common pitfall: Treating “TLS is enough” as a complete security answer. You still need membership authorization and abuse controls.

Summary:

  • Authenticate connections and authorize every message by membership.

  • Rate-limit to prevent abuse and protect resources.

  • Discuss E2EE as a posture decision that preserves delivery architecture.

What a strong interview answer sounds like

A strong answer is structured and decisive. You start with guarantees, then present the architecture, then zoom into the hard parts: ordering, durability, multi-device sync, group fan-out, and failure recovery. You keep presence/typing in their lane as eventually consistent. You finish with metrics and trade-offs.

Here’s a 30–60 second outline that interviewers recognize as “this person has built messaging systems.”

Sample response outline: “I’ll build chat around durable message history plus cursor-based sync. Clients connect via WebSockets to stateless gateways, which route to a messaging service that authenticates, assigns per-conversation sequence numbers for ordering, persists messages, and publishes delivery tasks. Delivery is at-least-once, so I include stable message IDs for server/client dedupe. Online devices get pushed immediately; offline devices get a minimal push notification as a wake-up hint, then catch up via sync from per-device cursors. Read receipts are cursor updates (last_read_seq), presence/typing are eventually consistent via TTL caches. For groups, I pick fan-out strategy based on size—hybrid or fan-out-on-read for hot groups to avoid write amplification. I’ll instrument p95 send-to-delivered, queue lag, offline sync lag, reconnect rates, and presence staleness to prove reliability.”

Checklist (keep it short and concrete):

  • Define guarantees: at-least-once, per-conversation ordering, presence eventual consistency

  • Persist before fan-out; delivery is a retryable pipeline

  • Use stable IDs for dedupe and monotonic cursors for convergence

  • Explain multi-device sync with per-device cursors and reconciliation

  • Compare group fan-out strategies and hot group mitigations

  • Name key metrics: p95 latency, lag, sync convergence, reconnects

Interviewer tip: The best answers sound like a system you could operate: clear guarantees, explicit state, failure recovery, and metrics.

Happy learning!


Written By:
Khayyam Hashmi