Chat System Design
Build chat as store + sync: persist ordered messages, deliver at-least-once with dedupe, reconcile devices with per-device cursors, pick the right group fan-out strategy, treat push as a wake-up hint, and measure latency, lag, and sync convergence.
A chat system design is one of the most concentrated system design interview problems because it forces you to solve persistent connections, low-latency fan-out, ordered and durable delivery, multi-device synchronization, and soft real-time features like presence and typing all in a single coherent architecture. The core challenge is building a delivery and sync system, not merely sending messages over a WebSocket, which means your design must reconcile state after disconnects, handle duplicates gracefully, and prove reliability through measurable guarantees.
Key takeaways
- At-least-once delivery with dedup: Promising exactly-once is impractical over unreliable networks, so you design for at-least-once semantics and layer in stable message IDs for idempotent deduplication on both server and client.
- Per-conversation ordering via sequence numbers: A server-side sequencer partitioned by conversation_id assigns monotonic sequence numbers that define the authoritative message order.
- Cursor-based multi-device sync: Each device tracks a per-conversation delivered cursor so that reconnect sync pulls only missing messages, making offline delivery reliable without depending on push notifications.
- Group fan-out strategy depends on group size: Fan-out-on-write works for small groups, but hot groups with tens of thousands of members require fan-out-on-read or hybrid approaches to avoid catastrophic write amplification.
- Observability proves correctness: Metrics like p95 send-to-delivered latency, offline sync lag, and queue backlog are what separate a paper design from one you could actually operate in production.
Most engineers open a chat app dozens of times a day and never think about the machinery underneath. Messages appear instantly, read receipts tick into place, and typing bubbles animate on cue. But behind that effortless surface sits one of the hardest real-time distributed systems you can design. If your architecture cannot survive a gateway crash, reconcile state across four devices, and deliver to a 50,000-member group without melting your database, it is not production-grade. This guide walks through a Staff-level, interview-ready chat system design that treats delivery as a durable pipeline, sync as the correctness mechanism, and every other feature as a carefully scoped layer on top.
Clarify requirements and set the right guarantees#
The fastest way to derail a chat design interview is to start listing features without defining what is durable, what is ordered, and what is best-effort. Interviewers notice immediately when a candidate skips the guarantee conversation, because the rest of the design will lack a foundation. Your opening should frame the system around messaging reliability and explicitly draw lines between hard guarantees and soft-state features.
For functional scope, assume direct messages and group chats (up to tens of thousands of members), message history with paginated reads, delivery and read receipts, presence indicators, typing indicators, and basic attachment metadata. Keep the payload as “text + metadata” for the first pass, with media offloaded to an object store like Amazon S3 or a similar blob service.
The guarantees that win interviews are concise and defensible:
- Delivery: at-least-once with deduplication
- Ordering: per-conversation, assigned server-side
- Presence and typing: eventually consistent
These commitments are realistic at global scale and set up clean trade-off discussions later. Promising exactly-once delivery sounds appealing but is impractical over unreliable networks and forces complexity that rarely survives production. Promising global total ordering across conversations is unnecessary and expensive.
Attention: Promising “real-time” without defining a sync mechanism is a common pitfall. WebSockets improve latency, but cursor-based sync defines correctness. If your system cannot recover after a disconnect, it is not reliable.
Treating reconnect sync as part of delivery rather than an optional add-on is what separates a robust design from a fragile one. With guarantees locked in, the next step is laying out the high-level architecture that enforces them.
High-level architecture and separation of concerns#
A scalable chat architecture separates the connection-heavy edge from the durable messaging core. This separation lets you optimize the gateway tier for millions of long-lived connections while keeping your messaging logic stateless and horizontally scalable. Conflating these two concerns is a recipe for systems that cannot scale one dimension without breaking the other.
The following diagram illustrates how the major components connect.
The durable queue (or append-only log, such as Apache Kafka) between “message accepted” and “message delivered to devices” is the linchpin of reliability. It lets you survive gateway crashes, worker restarts, and recipient offline periods without losing messages. Think of it as the internal delivery pipeline you can replay.
Presence and typing live in a separate soft-state path backed by ephemeral storage with TTLs. Notification services interface with platform push systems (APNS, FCM) but are never the source of truth for delivery. The source of truth is always message history plus per-device cursors.
Pro tip: Name your “source of truth” explicitly in interviews. Gateways are not your source of truth. Message history and per-device cursors define correctness.
With the architecture laid out, the next question is how to model the data so that the two dominant access patterns, appending messages and reading conversation history, stay fast at scale.
Core data model for conversation history and delivery tracking#
Chat workloads are dominated by two access patterns: “append a message to a conversation” and “read messages in a conversation by order.” That pushes the schema toward a partition key of conversation_id and a sort key that preserves order. This is why distributed NoSQL stores like Apache Cassandra, Amazon DynamoDB, or HBase are common choices: they excel at append-heavy, partitioned reads with predictable latency.
A minimal message row, sketched here as a Python dataclass, might look like the following (a representative sketch: the real table would live in a store like Cassandra or DynamoDB, and the `attachment_url` field is an illustrative assumption):
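```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class MessageRow:
    conversation_id: str            # partition key: all messages of one conversation
    seq: int                        # sort key: server-assigned, monotonic per conversation
    message_id: str                 # stable client-generated ID used for deduplication
    sender_id: str
    sent_at_ms: int                 # server receive time; informational, not the ordering key
    body: str                       # text payload; media itself lives in object storage
    attachment_url: Optional[str] = None   # pointer into the blob store (metadata only)
```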
Beyond message storage, you need to model membership and per-user/per-device state. Delivery and read receipts at scale are deceptively expensive. Storing per-message, per-recipient status works for 1:1 chats, but in a 10,000-member group, every message would require 10,000 status writes. The scalable approach is a hybrid:
- Small conversations (1:1 and small groups): Optionally store per-message delivery state for richer UI.
- Large groups: Track per-user `last_read_seq` pointers so the UI derives read states relative to that pointer.
You also need a consistent way to deduplicate retries. A `message_id` generated client-side (or assigned by the server) stays stable across retries, and a dedup lookup keyed by (conversation_id, message_id) or (sender_id, client_msg_id) prevents the same message from being stored twice.
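For the membership and cursor state, a comparable sketch might be (row and field names such as `DeviceCursorRow` are illustrative, not a prescribed schema):

```python
from dataclasses import dataclass

@dataclass
class MembershipRow:
    conversation_id: str
    user_id: str
    last_read_seq: int = 0          # per-user read cursor, shared across devices

@dataclass
class DeviceCursorRow:
    conversation_id: str
    user_id: str
    device_id: str
    last_delivered_seq: int = 0     # per-device delivery cursor

# Dedup lookup: keyed by (conversation_id, message_id) or (sender_id, client_msg_id);
# the row's existence means "this send was already accepted and stored".
DedupKey = tuple[str, str]
```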
Attention: Modeling read receipts as “status per message per user” for all group sizes will explode in storage and write load. Use cursor-based receipts as the scalable baseline.
The data model is only useful if the delivery flow that writes to it is designed for durability and ordering. Let’s trace through that flow next.
Message delivery flow with ordering and durability#
A strong chat answer describes the send path as a series of durable steps, not a single WebSocket write. Each step has a clear purpose and a clear persistence boundary. The flow begins on the client and ends when every recipient device has acknowledged receipt.
The send path step by step#
- The client sends a message containing a stable, client-generated `message_id` (or idempotency key: a unique identifier attached to a request that allows the server to recognize retries and return the same result without re-executing the operation, ensuring safe retries under network failures) along with the `conversation_id` and payload.
- The gateway forwards the request to the Messaging Service.
- The Messaging Service authenticates the sender, verifies conversation membership, and deduplicates the message by `message_id`.
- A per-conversation sequencer (a component or storage operation partitioned by `conversation_id` that assigns monotonically increasing sequence numbers, defining the authoritative order within that conversation) assigns the next `seq` for the conversation. This is the ordering point.
- The message is persisted to the message store.
- Delivery tasks are published to the durable queue for each recipient's devices (the full path is sketched in code below).
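A compressed sketch of this send path, with in-memory dicts standing in for the real store, dedup index, sequencer, and durable queue (names and structure are illustrative; auth and membership checks are omitted):

```python
import time
from collections import defaultdict

# In-memory stand-ins for the durable components.
message_store = {}                      # (conversation_id, seq) -> message dict
dedup_index = {}                        # (conversation_id, message_id) -> seq
seq_counters = defaultdict(int)         # conversation_id -> last assigned seq
delivery_queue = []                     # stand-in for a durable queue such as Kafka

def handle_send(conversation_id, message_id, sender_id, body, recipient_devices):
    """Accept a message: dedupe, assign seq, persist, then enqueue delivery tasks."""
    key = (conversation_id, message_id)
    if key in dedup_index:                          # client retry: return the prior result
        return dedup_index[key]

    seq_counters[conversation_id] += 1              # ordering point (per conversation)
    seq = seq_counters[conversation_id]

    message_store[(conversation_id, seq)] = {       # persist before any fan-out
        "message_id": message_id,
        "sender_id": sender_id,
        "body": body,
        "sent_at_ms": int(time.time() * 1000),
    }
    dedup_index[key] = seq

    for device_id in recipient_devices:             # one retryable task per device
        delivery_queue.append(
            {"conversation_id": conversation_id, "seq": seq, "device_id": device_id}
        )
    return seq

# A retry with the same message_id returns the same seq (idempotent send).
first = handle_send("conv-1", "msg-X", "alice", "hi", ["bob-phone", "bob-laptop"])
retry = handle_send("conv-1", "msg-X", "alice", "hi", ["bob-phone", "bob-laptop"])
assert first == retry == 1
```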
Ordering deserves extra attention. In most designs, ordering is assigned server-side at acceptance time. You can implement sequencing as a per-conversation counter in a strongly consistent store, a lightweight sequencer service partitioned by conversation_id, or optimistic assignment with conflict resolution (though this is harder to get right). In interviews, saying “a sequencer partitioned by conversation_id assigns seq” is usually sufficient.
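As one possible sequencer sketch, a per-conversation counter whose atomicity would come from a strongly consistent store's conditional write (the lock below only simulates that atomicity locally):

```python
import threading

_lock = threading.Lock()            # stands in for the store's atomic conditional write
_counters: dict[str, int] = {}      # conversation_id -> last assigned seq

def next_seq(conversation_id: str) -> int:
    """Assign the next per-conversation sequence number.

    In production this would be an atomic increment or conditional write in a
    strongly consistent store, or a sequencer service partitioned by
    conversation_id; the lock here only mimics that behavior in one process.
    """
    with _lock:
        seq = _counters.get(conversation_id, 0) + 1
        _counters[conversation_id] = seq
        return seq
```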
Real-world context: Systems like Slack and Discord use server-assigned ordering rather than client timestamps because client clocks are unreliable and can drift by seconds or more. Server-side sequencing is the industry standard for chat ordering.
At-least-once delivery means devices may see duplicates. You prevent user-visible duplicates by having devices deduplicate by message_id or seq and by having the server deduplicate repeated sends. The delivery pipeline itself is retryable: if pushing to a gateway fails, a worker retries later. The message is never lost because it was persisted before fan-out began.
The delivery flow creates implicit state transitions for every message. Making those transitions explicit through a state machine is what elevates a design from “it probably works” to “I can reason about correctness.”
Message delivery state machine#
A message system becomes interview-grade when you can describe its state transitions. Chat delivery is not one state. It is a life cycle with persisted evidence at each step so the system can recover after crashes.
The following diagram shows the state machine from the system’s perspective.
The key insight is separating message state (durable in the message store) from per-recipient/per-device progress (durable as cursors). The message is stored exactly once. Delivery receipts become cursor updates, not per-message status writes. For small 1:1 chats you may also store per-message status for richer UI, but the cursor model is the scalable baseline.
Persisted evidence at each step means:
- Accepted → Persisted: Message row written to store.
- Persisted → Enqueued: Delivery task written to durable queue.
- Enqueued → Delivered: Per-device `last_delivered_seq` cursor advanced (see the sketch below).
- Delivered → Read: Per-user `last_read_seq` cursor advanced.
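The Enqueued → Delivered transition, for example, reduces to a monotonic cursor update on ack (a sketch; `device_cursors` stands in for the real cursor table):

```python
device_cursors = {}   # (user_id, device_id, conversation_id) -> last_delivered_seq

def ack_delivered(user_id, device_id, conversation_id, acked_seq):
    """Advance the per-device delivered cursor; never move it backwards."""
    key = (user_id, device_id, conversation_id)
    current = device_cursors.get(key, 0)
    device_cursors[key] = max(current, acked_seq)   # monotonic: retries are no-ops
```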
Pro tip: Receipts are state reconciliation. If you cannot explain how cursors converge after disconnects, your “read/delivered” indicators will be wrong under real network conditions. Practice walking through the crash scenarios.
With the state machine defined, let’s see it in action with a concrete walk-through of the simplest case: both sender and recipient are online.
Walk-through 1: online-to-online message with receipts#
Consider a 1:1 chat where both users are online, each on two devices. The sender’s client submits message_id and payload to the gateway, which forwards it to the Messaging Service. The service deduplicates, assigns the next sequence number (say seq=1042), persists the message, and emits delivery tasks for each of the recipient’s connected devices.
Delivery workers push the message over the recipient devices’ gateway connections. Each device deduplicates by message_id or seq, renders the message, and sends an acknowledgment back. The acknowledgment carries the highest sequence number delivered for that conversation. The server advances each device’s last_delivered_seq cursor to 1042.
When the recipient opens the conversation view and scrolls past the message, the client sends a read receipt as last_read_seq=1042. The server updates the membership state. The sender’s UI derives receipt states by comparing:
- Delivered: recipient’s `last_delivered_seq` ≥ message `seq`
- Read: recipient’s `last_read_seq` ≥ message `seq`
This approach avoids per-message status writes and scales cleanly across conversation sizes.
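In code, the sender-side derivation is just two comparisons against the recipient's cursors (a sketch under the cursor model described here):

```python
def receipt_state(message_seq: int, last_delivered_seq: int, last_read_seq: int) -> str:
    """Derive the receipt shown next to a sent message from the recipient's cursors."""
    if last_read_seq >= message_seq:
        return "read"
    if last_delivered_seq >= message_seq:
        return "delivered"
    return "sent"

assert receipt_state(1042, last_delivered_seq=1042, last_read_seq=1042) == "read"
assert receipt_state(1042, last_delivered_seq=1042, last_read_seq=1000) == "delivered"
```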
Attention: Read receipts are best-effort and may arrive out of order. Using monotonic cursors makes reconciliation simpler because a cursor that says “I’ve read up to 1042” implicitly covers all messages with seq ≤ 1042.
The online-to-online case is the happy path. The design gets interesting when one side is offline, which is the most common scenario for mobile users.
Walk-through 2: online-to-offline message with notification and reconnect sync#
Now consider the recipient has no active gateway connection. The send path through persistence is identical: the message is deduped, sequenced at seq=1042, and stored. The difference is delivery. The server detects that the recipient’s devices have no live connections, so it cannot push over WebSocket.
The server records that the recipient’s devices have not advanced their delivered cursors and triggers a push notification request. The notification payload is minimal: conversation_id, sender display name, and a hint like “new message.” Stuffing the full message body into the push is risky because of platform payload size limits (4 KB for APNS), because it undermines end-to-end encryption designs, and because push delivery itself is unreliable.
When the recipient comes online (by tapping the notification, by app background refresh, or simply by opening the app later), the device establishes a WebSocket connection and sends its last_delivered_seq per conversation. The server streams missing messages in order. Only after the device acknowledges receipt does the server advance the delivered cursor and propagate delivery receipts back to the sender.
Real-world context: Apple’s APNS and Google’s FCM can delay, collapse, or silently drop notifications based on battery optimization, Doze mode, or user settings. WhatsApp and Signal treat push as a wake-up signal and rely entirely on cursor-based sync for actual message delivery.
This offline flow is also why multi-device sync needs its own deep treatment, because the real complexity is in tracking state per device across arbitrary connection patterns.
Multi-device sync and state tracking#
Multi-device sync is where chat designs either become robust or collapse into hand-wavy promises. Users log in from a phone, a tablet, and a laptop. They disconnect frequently. They expect history, receipts, and unread counts to converge correctly on every device. The mechanism that makes this work is per-device, cursor-based sync.
Cursor-based pull with server push as optimization#
The sync model is straightforward. When a device connects or reconnects, it sends its last_delivered_seq per conversation. The server responds with all messages having seq > last_delivered_seq in order. The device processes them, updates its local cursor, and acknowledges. This ensures offline delivery even if push notifications fail, gateways crash, or connections flap.
Server push (delivering messages in real time over the WebSocket) is an optimization for latency, not the correctness mechanism. If the push fails or the device was offline, the sync path catches it.
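A minimal sketch of that sync path, with a small in-memory dict standing in for the message store:

```python
# message_store: (conversation_id, seq) -> message dict; stand-in for the real store.
message_store = {
    ("conv-1", 1040): {"body": "already seen"},
    ("conv-1", 1041): {"body": "missed while offline"},
    ("conv-1", 1042): {"body": "also missed"},
}

def sync_since(conversation_id: str, last_delivered_seq: int, page_size: int = 100):
    """Return missing messages in order: everything with seq > last_delivered_seq."""
    missing = [
        (seq, msg)
        for (conv, seq), msg in message_store.items()
        if conv == conversation_id and seq > last_delivered_seq
    ]
    missing.sort(key=lambda pair: pair[0])   # stream in seq order
    return missing[:page_size]               # paginate; client acks, then asks for more

# Device reconnects having delivered through 1040; it pulls 1041 and 1042.
print([seq for seq, _ in sync_since("conv-1", last_delivered_seq=1040)])
```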
Reconciling delivered vs. read across devices#
A user-level last_read_seq is typically shared across devices. Reading a message on the phone marks it as read on the desktop too. But device-level last_delivered_seq cursors remain per device, because the laptop might not have synced yet. This separation lets your UI represent “delivered to at least one of the user’s devices” vs. “delivered to all devices,” depending on product requirements.
The following table summarizes the cursor semantics.
Cursor Types in Chat System Design

| Cursor Type | Scope | Purpose |
| --- | --- | --- |
| `last_delivered_seq` | Per device | Tracks the highest sequence number each device has acknowledged receiving |
| `last_read_seq` | Per user | Tracks the highest sequence number the user has actively viewed, shared across all devices |
| Unread count derivation | Per user per conversation | Computed as `conversation_max_seq` minus the user's `last_read_seq` |
Pro tip: Interviewers want to hear three things in your sync answer: per-device cursors, a reconnect sync flow that does not rely on notifications, and a clear reconciliation strategy for delivered vs. read across devices.
With individual delivery and sync handled, the next challenge is group chat, where a single message can fan out to thousands of recipients and naive approaches quickly become catastrophically expensive.
Group chat fan-out strategies#
Group chat changes the economics of delivery. In 1:1 conversations, you deliver to a handful of devices. In a 50,000-member group, a single message requires tens of thousands of deliveries. Naive per-recipient writes will crush your database, your queue, or both. This is where interviewers want you to talk about fan-out strategy explicitly.
The three primary strategies are fan-out-on-write, fan-out-on-read, and hybrid.
Message Delivery Strategy Comparison

| Strategy | How It Works | Pros | Cons |
| --- | --- | --- | --- |
| Fan-out-on-write | Expands the message into per-recipient inbox entries at send time | Fast reads; simple unread counts | Massive write amplification for large groups; storage cost scales with members × messages |
| Fan-out-on-read | Stores the message once in a central log; users fetch and compile their feed on demand | Cheap writes; storage-efficient | Complex unread count derivation; read path requires joining membership with message log |
| Hybrid | Stores once, pushes to online users, maintains per-user cursors, and selectively precomputes inboxes for small groups | Balances write and read costs; adapts to group size | More complex implementation; requires group-size-based routing logic |
There is no single correct choice. A strong answer ties the decision to product realities: group size distributions, online/offline ratios, unread count UX requirements, and infrastructure cost constraints.
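If you want to show the routing explicitly, a hedged sketch might look like this (the 256-member threshold is purely illustrative and would be tuned from real group-size data):

```python
SMALL_GROUP_LIMIT = 256   # illustrative threshold; tune from real group-size distributions

def choose_fanout_strategy(member_count: int) -> str:
    """Pick a delivery strategy based on conversation size."""
    if member_count <= 2:
        return "fan-out-on-write"          # 1:1: per-recipient state is cheap
    if member_count <= SMALL_GROUP_LIMIT:
        return "hybrid"                    # precompute inboxes, push to online devices
    return "fan-out-on-read"               # hot groups: store once, pull via cursors

assert choose_fanout_strategy(2) == "fan-out-on-write"
assert choose_fanout_strategy(50_000) == "fan-out-on-read"
```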
Walk-through: a “hot group” scenario and mitigations#
Imagine a group with 50,000 members and rapid message volume. With fan-out-on-write, every message becomes 50,000 inbox writes plus delivery tasks. Even if workers can push to online users quickly, the durable per-recipient writes dominate cost and latency. A burst of activity in this group creates a flood of writes that backs up the delivery queue and hammers the storage partitions.
With fan-out-on-read or hybrid, you store each message once in the conversation log and update a compact set of state: per-user last_read_seq and per-device delivered cursors. Online members receive pushed messages without permanently writing per-recipient inbox rows. Offline members rely on sync when they reconnect. Additional mitigations include:
- Splitting delivery from inbox computation so they can scale independently.
- Rate limiting message sends for extreme groups to prevent queue overload.
- Using tiered infrastructure for large channels (e.g., topic-based partitioning within a group).
- Caching recent log segments so reads avoid hitting cold storage.
Real-world context: Discord handles servers with hundreds of thousands of members using a channel-based fan-out-on-read model. Messages are stored once per channel, and clients pull history on demand. This is why Discord’s storage costs scale with messages, not with members × messages.
Fan-out complexity is closely tied to how notifications behave in group settings, which brings us to the critical distinction between push notifications and actual delivery.
Push notifications are a hint, not delivery#
Push notifications through APNS and FCM are important for user experience, but they are not a delivery mechanism you control. They can be delayed by minutes, collapsed into a single badge update, throttled by the OS, or disabled entirely by user settings. In interviews, treating push as “delivery” is a correctness flaw.
The correct model is: durable storage plus cursor-based sync is delivery. Push is a wake-up hint that reduces perceived latency.
You should mention platform constraints explicitly. APNS payloads are capped at 4 KB. FCM has similar limits and its own throttling behavior. You often want to use minimal, collapsible pushes that act as a sync trigger rather than trying to carry message content.
This framing also aligns with security. If you later discuss end-to-end encryption, you do not want servers placing plaintext message bodies in push payloads routed through third-party infrastructure. Even without E2EE, minimal pushes reduce sensitive data exposure and keep the authoritative data path within your own infrastructure.
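For illustration, a minimal push payload might carry only routing hints (field names are assumptions, not a platform API):

```python
import json

def build_push_payload(conversation_id: str, sender_display_name: str) -> bytes:
    """Build a minimal wake-up push: routing hints only, no message body."""
    payload = {
        "type": "new_message",            # tells the app to run cursor-based sync
        "conversation_id": conversation_id,
        "sender": sender_display_name,    # enough for a generic "New message" banner
    }
    body = json.dumps(payload).encode("utf-8")
    assert len(body) < 4096               # stay well under the 4 KB APNS cap
    return body
```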
Attention: If your offline delivery story stops working when notifications are delayed, dropped, or disabled, you have built a notification system, not a chat system. Always verify your design works with push completely absent.
With delivery mechanics fully covered, it is time to address the transient features that make chat feel alive: presence and typing indicators.
Presence and typing indicators as eventually consistent soft state#
Presence (“Alice is online”) and typing indicators (“Bob is typing…”) feel real-time, but they are fundamentally soft state. Networks flap, mobile devices enter sleep mode, and reconnects happen constantly. Building these features with strong consistency would require coordination that adds latency and cost far out of proportion to the informational value of the feature.
Presence systems typically rely on ephemeral storage like Redis with TTLs. Gateways update presence on connect and disconnect events and with periodic heartbeats (e.g., every 30 seconds). If a heartbeat is missed, the server marks the user “offline” after a timeout (e.g., 60 seconds). This means presence can be stale by up to a minute, which is acceptable for an informational signal.
Typing indicators are even more ephemeral. They are sent over the same real-time channel but are never persisted. A typing event has a short validity window (typically 3–5 seconds), and if a follow-up event does not arrive, the indicator disappears. Dropping a typing event harms nothing.
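A sketch of TTL-based presence, using an in-memory dict to mirror what a Redis key with a 60-second TTL would do (constants match the illustrative heartbeat and timeout above):

```python
import time

PRESENCE_TTL_SECONDS = 60                 # miss two 30 s heartbeats -> treated as offline
_presence: dict[str, float] = {}          # user_id -> expiry timestamp

def heartbeat(user_id: str) -> None:
    """Refresh presence on connect and on every periodic heartbeat."""
    _presence[user_id] = time.time() + PRESENCE_TTL_SECONDS

def is_online(user_id: str) -> bool:
    """Presence is soft state: stale entries simply expire, no cleanup coordination."""
    return _presence.get(user_id, 0.0) > time.time()

heartbeat("alice")
assert is_online("alice")
```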
The interview point is to articulate what you do when presence is wrong: you favor availability and low latency, and you accept brief inaccuracies because the feature is informational, not transactional.
Historical note: Early instant messaging systems like AIM and ICQ attempted near-real-time presence with centralized servers. As user bases grew to hundreds of millions, the industry shifted to eventually consistent presence with heartbeat-based TTLs, a pattern that persists across WhatsApp, Telegram, and Slack today.
Presence and typing round out the feature set, but a production system must also handle the inevitable: things break. The next section addresses failure modes and why designing for duplicates is not a compromise but a requirement.
Failure modes and crash safety#
A Staff-level chat design is honest about ambiguity. If a gateway pushes a message to a device and crashes before recording the acknowledgment, your system cannot know whether the device received it. The only safe action under at-least-once semantics is to retry. Retries create duplicates. Duplicates are not a bug. They are a consequence of building reliable systems over unreliable networks.
The solution is idempotency and deduplication at multiple layers:
- Server-side dedup: The messaging service rejects repeated sends from the client using the stable `message_id`.
- Device-side dedup: Devices ignore messages whose `message_id` or `seq` already exists in local storage.
- Monotonic cursors: `last_delivered_seq` and `last_read_seq` only move forward. Retrying a cursor update with the same or lower value is a no-op, making retries inherently safe (see the device-side sketch below).
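On the device side, the dedup-plus-monotonic-cursor combination can be sketched in a few lines (local storage is simulated with a set and a dict):

```python
seen_ids: set[str] = set()               # stand-in for the device's local message index
local_cursor = {"last_delivered_seq": 0}

def on_incoming(message_id: str, seq: int) -> bool:
    """Return True if the message is new and should be rendered."""
    is_new = message_id not in seen_ids
    if is_new:
        seen_ids.add(message_id)
    # The cursor only moves forward, so a replayed delivery is a harmless no-op.
    local_cursor["last_delivered_seq"] = max(local_cursor["last_delivered_seq"], seq)
    return is_new

assert on_incoming("msg-X", 1042) is True
assert on_incoming("msg-X", 1042) is False   # duplicate after a retry: ignored
```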
This is also where the durable delivery queue earns its keep: a crashed worker or gateway just means the task is retried, and the dedup layers absorb the repeat.
Walk-through 3: crash after delivery causes duplicate, solved by dedup#
A recipient device is online. A delivery worker pushes message (conversation_id=C, seq=1042, message_id=X) through the gateway. The device renders it. Before the device’s ack reaches the server, the gateway process crashes and the connection drops. The delivery pipeline times out waiting for the ack and schedules a retry.
The recipient reconnects and runs sync with last_delivered_seq=1041 (the ack for 1042 never made it). The server streams message 1042 again. The device sees message_id=X already in local storage, ignores the duplicate, updates its cursor to 1042, and sends the ack. The system converges: the server marks the device as delivered, and the sender sees “delivered” correctly.
Pro tip: In interviews, say the quiet part out loud: “I assume duplicates happen. I use stable IDs, deduplicate on both server and client, and I make all state updates monotonic so retries are always safe.” This signals production experience.
Crash safety ensures correctness, but you also need to prove the system is working in production. That requires observability and well-chosen SLOs.
Observability and SLOs#
Great designs are measurable. In interviews, naming specific metrics signals operational maturity and shows you think beyond the happy path. You want latency metrics for the send path, backlog metrics for delivery pipelines, and correctness indicators for sync and presence.
The most important user-perceived metric is p95 send-to-delivered latency for online recipients. This measures the time from when the sender taps “send” to when the recipient’s device acknowledges delivery. A typical SLO target might be p95 < 300ms for same-region delivery. A separate SLO for offline delivery, measured as reconnect sync lag (time from device reconnection to full catch-up), captures the other critical path.
Key metrics to instrument, broken down by region, gateway cluster, and conversation type (1:1 vs. group):
- Send-to-persisted latency (p50, p95, p99): How fast the server accepts and stores messages.
- Persisted-to-delivered latency (p50, p95): How fast the delivery pipeline reaches online devices.
- Delivery queue depth and consumer lag: Early warning for backpressure or stuck consumers.
- Offline sync lag: Time and message count between reconnection and full cursor convergence.
- Reconnect rate and connection churn: Indicates network health and gateway stability.
- Presence staleness percentage: Fraction of presence records that exceed the expected TTL, measured via sampling.
[Image: Observability dashboard for a chat system showing delivery latency, queue health, sync lag, and reconnect metrics. Panels: send-to-delivered latency histogram (p50/p95/p99), delivery queue depth with an alert threshold, offline sync lag gauge, and a heatmap of reconnect rates by region, each labeled with its SLO threshold.]
Attention: If you only measure “messages sent per second,” you will not catch the outages users actually feel. Throughput is a capacity metric. Latency, lag, and sync convergence are the reliability signals that matter.
Metrics tell you the system is healthy, but security ensures it is trustworthy. The final design layer covers authentication, authorization, and encryption posture.
Security and privacy#
Security in chat starts at the connection. Every gateway connection must be authenticated (typically via JWT tokens validated at connection establishment). Every message send must be authorized against conversation membership. Rate limiting on sends protects both infrastructure from abuse and users from spam. For group chats, membership changes must be enforced consistently so that removed users cannot send or receive messages after removal.
End-to-end encryption (E2EE) is frequently raised in interviews. You do not need to implement the Signal Protocol in your answer, but you should place E2EE correctly in the architecture:
- The server stores opaque ciphertext. It cannot read message bodies.
- Metadata remains visible to the server: `conversation_id`, `sender_id`, timestamps, message sizes.
- Push notification payloads stay minimal (no plaintext bodies).
- Key management and device enrollment shift complexity to clients.
E2EE does not fundamentally change the delivery pipeline design. Messages still flow through the same sequencer, the same durable queue, and the same sync path. What changes is what the server can inspect, which affects content moderation, search, and abuse detection.
Historical note: WhatsApp rolled out E2EE to over a billion users in 2016 using the Signal Protocol. The delivery infrastructure (store-and-forward with cursor-based sync) remained largely unchanged. The encryption layer was added as a transformation step on the client side, validating the principle that delivery architecture and encryption posture are separable concerns.
TLS in Transit Only vs. End-to-End Encryption (E2EE)

| Criteria | TLS in Transit Only | End-to-End Encryption (E2EE) |
| --- | --- | --- |
| Who Can Read Message Content | Server can decrypt and read content | Only sender and recipient hold decryption keys |
| Server-Side Search & Moderation | Possible; server accesses decrypted data | Not possible without client cooperation |
| Key Management Complexity | Minimal; automatically handled by the TLS protocol | Significant; requires per-device key exchange and rotation |
| Push Notification Content | Can include message previews or snippets | Limited to metadata only (e.g., sender name, generic alert) |
| Impact on Delivery Pipeline | None; server manages encryption/decryption | None; encryption is a client-side transform |
With security addressed, you have a complete design. The final step is knowing how to present it concisely and confidently.
What a strong interview answer sounds like#
A strong answer is structured and decisive. You start with guarantees, present the architecture, and then zoom into the hard parts: ordering, durability, multi-device sync, group fan-out, and failure recovery. You keep presence and typing in their lane as eventually consistent. You finish with metrics and trade-offs.
Here is a 60-second outline that interviewers recognize as coming from someone who has built messaging systems:
“I’ll build chat around durable message history plus cursor-based sync. Clients connect via WebSockets to stateless gateways, which route to a messaging service that authenticates, assigns per-conversation sequence numbers for ordering, persists messages, and publishes delivery tasks. Delivery is at-least-once, so I include stable message IDs for server and client dedup. Online devices get pushed immediately. Offline devices get a minimal push notification as a wake-up hint, then catch up via sync from per-device cursors. Read receipts are cursor updates on last_read_seq. Presence and typing are eventually consistent via TTL caches. For groups, I pick fan-out strategy based on size: hybrid or fan-out-on-read for hot groups to avoid write amplification. I’ll instrument p95 send-to-delivered, queue lag, offline sync lag, reconnect rates, and presence staleness to prove reliability.”
The checklist to hit every major point:
- Define guarantees: at-least-once delivery, per-conversation ordering, eventual consistency for presence
- Persist before fan-out: delivery is a retryable pipeline, not a single socket write
- Stable IDs and monotonic cursors: dedup absorbs retries, cursors guarantee convergence
- Multi-device sync: per-device cursors with reconnect flow independent of push
- Group fan-out: compare strategies, call out hot group mitigations
- Metrics: p95 latency, queue lag, sync convergence, reconnect rates
Back-of-envelope estimation#
Interviewers sometimes ask for scale numbers to verify you understand the infrastructure implications. Here is a quick estimation for a large-scale chat system with 500 million DAUs.
Assume each DAU sends an average of 40 messages per day. That gives:
$$\text{Messages per second} = \frac{500 \times 10^6 \times 40}{86400} \approx 231{,}000 \text{ msg/s}$$
If each message averages 200 bytes of body plus 300 bytes of metadata (500 bytes total), daily storage is:
$$\text{Daily storage} = 500 \times 10^6 \times 40 \times 500 \text{ bytes} = 10 \text{ TB/day}$$
With a 5-year retention policy, you are looking at roughly 18 PB of message data, which is well within the range where distributed stores like Cassandra or DynamoDB with tiered storage are necessary. These numbers also justify why fan-out-on-write for large groups is dangerous: a single message in a 50,000-member group at 500 bytes per inbox entry costs 25 MB of writes, multiplied by message volume.
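The same estimates, checked in a few lines of Python with round numbers:

```python
dau = 500e6
messages_per_user_per_day = 40
bytes_per_message = 500                       # ~200 B body + ~300 B metadata

messages_per_second = dau * messages_per_user_per_day / 86_400
daily_storage_tb = dau * messages_per_user_per_day * bytes_per_message / 1e12
five_year_pb = daily_storage_tb * 365 * 5 / 1000
hot_group_write_mb = 50_000 * bytes_per_message / 1e6

print(f"{messages_per_second:,.0f} msg/s")    # ~231,000 msg/s
print(f"{daily_storage_tb:.0f} TB/day")       # ~10 TB/day
print(f"{five_year_pb:.1f} PB over 5 years")  # ~18 PB
print(f"{hot_group_write_mb:.0f} MB of inbox writes per hot-group message")  # ~25 MB
```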
Real-world context: WhatsApp reportedly handles over 100 billion messages per day across 2+ billion users. At that scale, every byte of per-message overhead and every unnecessary per-recipient write translates directly into infrastructure cost measured in millions of dollars annually.
These estimates ground your architectural decisions in real numbers rather than abstract arguments.
Conclusion#
Chat system design is fundamentally a delivery and sync problem wrapped in a real-time interface. The three pillars that hold the design together are durable, ordered message storage as the single source of truth, cursor-based multi-device sync that works independently of push notifications or gateway availability, and a clear fan-out strategy that adapts to group size rather than assuming one model fits all. Every other feature, from presence to typing indicators to read receipts, layers on top of these foundations with intentionally weaker consistency guarantees.
The future of chat architecture is evolving in several directions. Edge computing is pushing gateway logic closer to users for sub-100ms delivery. AI-powered moderation and summarization require new metadata pipelines alongside E2EE constraints. Interoperability mandates like the EU’s Digital Markets Act are forcing large platforms to define standardized message exchange protocols, which will reshape how federation and cross-platform delivery work.
Build your chat system like something you could operate at 3 AM during an incident: clear guarantees, explicit state, failure recovery by design, and dashboards that tell you exactly where the problem is.