Miro System Design Explained
Explore how Miro powers real-time collaboration on infinite whiteboards. This deep dive covers object-based syncing, conflict resolution, versioning, and how teams stay in sync during live collaboration.
Miro system design is the architecture behind a real-time collaborative whiteboard that must synchronize fine-grained object manipulations across hundreds of concurrent users on an effectively infinite two-dimensional canvas. It combines spatial data modeling, operation-based syncing with conflict resolution strategies like CRDTs or Operational Transformation, low-latency streaming over WebSockets, and viewport-aware rendering to deliver a fluid, shared creative experience at scale.
Key takeaways
- Real-time sync is operation-based, not state-based: The system streams small, granular operations rather than full board snapshots, enabling low-latency collaboration without locking.
- Conflict resolution demands named strategies: Approaches like CRDTs and Operational Transformation provide deterministic convergence when multiple users edit the same object simultaneously.
- Spatial partitioning drives scalability: Techniques such as quad-tree indexing and viewport virtualization ensure that only visible regions of the canvas consume memory, bandwidth, and rendering cycles.
- Presence and board state are deliberately separated: Ephemeral cursor and selection data flows through a fast, lossy channel while durable board mutations follow a reliable, ordered pipeline.
- Versioning combines snapshots with operation logs: Periodic snapshots plus replayable operation histories allow efficient undo, board recovery, and audit without corrupting the live state.
Most software asks you to work inside a fixed page, a bounded spreadsheet, or a linear document. Miro throws all of that away. It hands you an infinite plane and says, “go.” Drag a sticky note. Draw a connector. Watch twenty cursors swarm a wireframe during a live workshop. Nothing reloads. Nothing locks. The board just absorbs it all.
That seamlessness is a lie your browser tells you. Behind it sits a deeply complex distributed system that must ingest thousands of micro-edits per second, merge conflicting changes deterministically, render only what each user actually sees, and never lose a sticky note. Designing that system, or explaining how you would in an interview, is one of the hardest exercises in modern system design.
This blog walks through the full architecture of a Miro-like system. We will cover the data model, real-time sync engine, conflict resolution, spatial partitioning, rendering pipeline, versioning, and the infrastructure trade-offs that hold it all together.
Understanding the core problem
At its heart, Miro is a real-time collaborative canvas. Users create and manipulate visual objects (shapes, text blocks, connectors, images, sticky notes) on an effectively infinite two-dimensional plane. Unlike a Google Doc or a spreadsheet, a whiteboard has no rows, no pages, and no fixed structure. Objects can exist anywhere, overlap arbitrarily, group, ungroup, and link in any direction.
This spatial freedom is what makes whiteboards powerful, and what makes them brutal to engineer. The system must continuously answer hard questions:
- What exists? Which objects are on this board right now?
- Where is everything? What are the precise coordinates, z-order, and bounding boxes?
- Who changed what? How do we attribute and order concurrent mutations?
Interactions are continuous and fine-grained. A single drag gesture can generate dozens of positional updates per second. Multiply that by a hundred concurrent users and you begin to see why naive architectures collapse.
Real-world context: During enterprise planning sessions, Miro boards routinely host 100+ simultaneous editors with tens of thousands of objects. A system that works for five users on a small board can catastrophically degrade at this scale without deliberate architectural decisions.
These constraints define the boundaries of Miro system design and set the stage for every architectural choice that follows. Let’s start by pinning down exactly what the system must do.
Core functional requirements
Before diving into architecture, we need a clear contract. What must the system guarantee from a user’s perspective, and what must it guarantee from a platform perspective?
From the user side, the system must support:
- Board management: Create, open, rename, delete, and organize boards within workspaces.
- Object manipulation: Add, move, resize, rotate, style, group, and delete objects with sub-second feedback.
- Infinite canvas navigation: Pan and zoom smoothly across an unbounded coordinate space.
- Real-time collaboration: See other users’ changes appear live without refreshing the page.
- History and recovery: Browse version history, undo/redo personal actions, and restore earlier board states.
- Sharing and permissions: Invite collaborators with view, comment, or edit access, changeable at any time.
From the platform side, durability, access control enforcement, and cross-device consistency are non-negotiable. Every edit that the server acknowledges must survive restarts, failovers, and even regional outages.
Attention: It is tempting to treat object manipulation as a single monolithic operation. In practice, a “move” is a stream of positional updates, a “style change” is a property patch, and a “group” is a structural mutation. Each has different conflict semantics and sync requirements.
What makes this requirement set uniquely challenging is granularity. A document editor sends paragraph-level diffs. A spreadsheet sends cell-level diffs. Miro sends property-level diffs on spatially distributed objects at high frequency. That granularity shapes every layer of the stack, starting with the non-functional requirements that constrain our design choices.
Non-functional requirements that shape the design
Functional requirements tell you what to build. Non-functional requirements tell you how to build it, and they dominate Miro system design.
Latency is the most visible constraint. When a user drags an object, the local response must feel instant (under 16ms for 60fps rendering). Remote propagation to other collaborators should land within 50 to 150ms. Anything above 300ms breaks the illusion of a shared, live space.
Consistency is nuanced. Not all operations need strong consistency. Moving a sticky note can tolerate brief divergence across clients, but deleting an object or changing permissions cannot. The system needs a spectrum: eventual consistency for ephemeral data like cursor positions, causal consistency for ordinary object edits, and strong consistency for destructive or security-sensitive operations such as deletions and permission changes.
Scalability spans multiple axes. Boards range from a solo user sketching to 300 editors in a live workshop. The object count ranges from a handful to 50,000+. The system must handle both extremes without separate code paths.
Here is how these requirements compare in priority:
Non-Functional Requirements Comparison
| Requirement | Priority Level | Acceptable Threshold | Impact if Violated |
| --- | --- | --- | --- |
| Latency | Critical | < 150ms for remote propagation | Collaboration feels broken |
| Consistency | High | Eventual consistency within 1 second | Users encounter stale or conflicting data, causing confusion |
| Scalability | High | Handles 10x increase in concurrent users | System becomes unresponsive or slow during peak usage |
| Availability | Critical | 99.99% uptime (~52 minutes downtime/year) | Increased downtime leads to loss of user trust and revenue |
| Durability | High | Zero acknowledged writes lost | Loss of critical user data, trust destruction, legal risk |
Availability is especially critical during live sessions. A Miro outage during a 200-person workshop is not a minor inconvenience; it derails the entire event. The system targets 99.95%+ availability for active collaboration sessions.
Pro tip: In a system design interview, explicitly stating that you will use different consistency levels for different operation types (e.g., eventual for cursor movement, causal for object edits, strong for deletions) signals mature architectural thinking.
User perception is the ultimate arbiter. Even technically correct behavior feels wrong if cursors stutter, objects flicker, or undo reverses someone else’s action. Perceived correctness matters as much as actual correctness. With these constraints established, let’s look at the high-level architecture that satisfies them.
High-level architecture overview
The system decomposes into six major subsystems, each owning a specific concern but tightly coordinated through well-defined interfaces.
The following diagram captures how these subsystems relate to clients and to each other.
The six subsystems are:
- Board data and object storage service that persists the canonical state of every object.
- Real-time collaboration and sync engine that receives, orders, and broadcasts operations.
- Conflict resolution and operation merge layer that deterministically reconciles concurrent edits.
- Versioning and snapshot system that captures board history for undo, recovery, and audit.
- Sharing and access control service that gates every mutation behind permission checks.
- Presence and awareness service that distributes ephemeral cursor, selection, and viewport data.
Each subsystem can scale independently. The collaboration engine is the hottest path and receives the most resources. The versioning service is write-heavy but latency-tolerant. The presence service is high-throughput but lossy by design.
Historical note: Early collaborative whiteboard systems like Google Wave attempted to route all operations (including presence) through a single Operational Transformation pipeline. The resulting complexity and latency led to the modern pattern of separating ephemeral presence from durable board mutations entirely.
This separation of concerns is not just organizational. It is the foundation for independent scaling, fault isolation, and targeted optimization. The most fundamental of these subsystems is the data model, so let’s start there.
Board data model
The board data model is the foundation everything else builds on. Get it wrong and every downstream system (sync, versioning, conflict resolution) suffers compounding complexity.
A board is not a document. It is a collection of independently addressable objects distributed across a two-dimensional coordinate space. Each object carries:
- A globally unique ID
- A type discriminator (sticky note, shape, connector, frame, image)
- Spatial properties: `x`, `y`, `width`, `height`, `rotation`, `z-index`
- Style properties: fill color, border, font, opacity
- Content: text payload, image reference, or connector endpoints
- Metadata: `created_by`, `modified_by`, `modified_at`, `version`
Objects may also reference other objects. A connector has source_id and target_id. A frame contains a list of child IDs. These references create a sparse graph layered on top of the spatial plane.
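The per-object model above can be sketched as a simple record. This is an illustrative schema, not Miro's actual one; every field name here is an assumption based on the properties listed in this section.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class BoardObject:
    """One independently addressable object on the canvas (illustrative schema)."""
    object_id: str
    object_type: str            # "sticky_note", "shape", "connector", "frame", "image"
    x: float = 0.0
    y: float = 0.0
    width: float = 100.0
    height: float = 100.0
    rotation: float = 0.0
    z_index: int = 0
    style: dict = field(default_factory=dict)   # fill, border, font, opacity
    content: Optional[str] = None               # text payload or image reference
    created_by: str = ""
    modified_by: str = ""
    modified_at: float = 0.0
    version: int = 0            # monotonically increasing per-object counter

note = BoardObject("obj-1", "sticky_note", x=120, y=340, content="Ship it")
```

Because each object is its own record, a property change serializes to a delta touching only that record, never the whole board.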
The critical design decision is object-level granularity. The system stores and syncs objects independently rather than serializing the entire board as a monolithic blob. This means a single property change on one sticky note produces a small delta, not a full board rewrite.
Pro tip: In an interview, emphasize that object-level storage enables fine-grained conflict resolution, efficient delta syncing, and targeted cache invalidation. It is the single most important modeling decision in the entire system.
Each object also carries a monotonically increasing version counter (or a logical timestamp) that lets the conflict resolution layer detect and order concurrent updates to the same object.
This per-object model works well for flat boards, but real Miro boards are not flat. They are spatially vast. That raises the question of how we efficiently index and retrieve objects across a potentially enormous coordinate space.
Infinite canvas and spatial partitioning
The canvas feels infinite, but the system cannot treat it that way internally. Loading and rendering every object on a board with 40,000 elements would overwhelm both the network and the GPU. The solution is spatial partitioning combined with viewport-aware loading.
Spatial indexing with quad-trees
The system indexes all objects using a quad-tree, a hierarchical structure that recursively subdivides the 2D plane into four quadrants. Viewport queries ("which objects intersect this rectangle?") then visit only the quadrants the rectangle touches, making lookups roughly logarithmic rather than linear in the object count.
As objects are created, moved, or deleted, the quad-tree is updated incrementally. For very dense boards, an R-tree variant may be substituted because R-trees handle overlapping bounding boxes more efficiently. The choice depends on object density and query patterns.
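To make the quad-tree concrete, here is a minimal sketch. For simplicity it indexes each object by a representative point (a real system would index full bounding boxes); the class and constant names are illustrative.

```python
class QuadTree:
    """Minimal point quad-tree: recursively subdivides the plane into four quadrants."""
    MAX_ITEMS = 4  # split a node once it holds more than this many objects

    def __init__(self, x, y, w, h, depth=0, max_depth=8):
        self.bounds = (x, y, w, h)
        self.depth, self.max_depth = depth, max_depth
        self.items = []        # (object_id, ox, oy) while this node is a leaf
        self.children = None   # four sub-quadrants once split

    def _contains(self, ox, oy):
        x, y, w, h = self.bounds
        return x <= ox < x + w and y <= oy < y + h

    def insert(self, object_id, ox, oy):
        if not self._contains(ox, oy):
            return False
        if self.children is None:
            self.items.append((object_id, ox, oy))
            if len(self.items) > self.MAX_ITEMS and self.depth < self.max_depth:
                self._split()
            return True
        return any(c.insert(object_id, ox, oy) for c in self.children)

    def _split(self):
        x, y, w, h = self.bounds
        hw, hh = w / 2, h / 2
        d = self.depth + 1
        self.children = [QuadTree(x, y, hw, hh, d, self.max_depth),
                         QuadTree(x + hw, y, hw, hh, d, self.max_depth),
                         QuadTree(x, y + hh, hw, hh, d, self.max_depth),
                         QuadTree(x + hw, y + hh, hw, hh, d, self.max_depth)]
        for item in self.items:            # redistribute items into the quadrants
            for c in self.children:
                if c.insert(*item):
                    break
        self.items = []

    def query(self, qx, qy, qw, qh):
        """Return IDs of objects inside the viewport rectangle."""
        x, y, w, h = self.bounds
        if qx >= x + w or qx + qw <= x or qy >= y + h or qy + qh <= y:
            return []                      # viewport misses this quadrant entirely
        hits = [oid for oid, ox, oy in self.items
                if qx <= ox < qx + qw and qy <= oy < qy + qh]
        if self.children:
            for c in self.children:
                hits.extend(c.query(qx, qy, qw, qh))
        return hits
```

A viewport query such as `tree.query(0, 0, 1920, 1080)` prunes every quadrant the viewport does not touch, which is exactly the property viewport virtualization relies on.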
Viewport virtualization and lazy loading
On the client side, the application subscribes only to the regions of the board that intersect the current viewport, plus a buffer zone around it. As the user pans or zooms, the client lazily requests newly visible regions from the server and evicts regions that scroll far out of view.
This is analogous to how mapping applications like Google Maps load tile images. The key difference is that Miro’s tiles contain structured object data, not pre-rendered images, because objects must remain interactive.
Attention: Spatial partitioning introduces a subtle sync challenge. If User A is viewing Region 1 and User B moves an object from Region 2 into Region 1, User A must receive that object even though it was not previously in their subscription set. The sync engine must handle cross-region object migration gracefully.
Lazy loading and spatial indexing together cap the per-client resource cost regardless of total board size. A user zoomed into one corner of a massive board uses roughly the same bandwidth and memory as a user on a small board. This architectural property is what makes “infinite” canvas viable at scale.
With the data model and spatial layer in place, the next challenge is the hardest one: keeping every client’s view of the board consistent in real time.
Real-time collaboration engine
Real-time collaboration is the most architecturally demanding subsystem in Miro system design. It must ingest high-frequency edits from many clients, order them, resolve conflicts, and broadcast the results, all within a latency budget measured in tens of milliseconds.
Connection model
Clients establish persistent WebSocket connections to the collaboration service. When a user opens a board, the client:
- Authenticates and receives a session token.
- Opens a WebSocket to a collaboration server assigned to that board.
- Receives the initial board state (filtered by viewport).
- Begins streaming local operations upstream and receiving remote operations downstream.
The collaboration server maintains an in-memory representation of the active board. This is the “hot” copy. It absorbs incoming operations, applies conflict resolution, appends to the operation log, and fans out the resolved operations to all connected clients.
Operation-based sync model
Rather than transmitting full state snapshots, the system uses an operation-based model. Each user action is encoded as a small, typed operation:
- `move(object_id, {x: 120, y: 340})`
- `update_style(object_id, {fill: "#FF0"})`
- `delete(object_id)`
- `create(object_type, properties)`
Operations are small, typed, and individually identified. Each carries a unique operation ID, a logical timestamp, and the minimal payload needed to describe the change, which keeps bandwidth low and makes retried deliveries safe to deduplicate.
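A minimal sketch of this operation envelope and how a server might apply it to an in-memory board follows; the field names and operation types are illustrative assumptions, not Miro's wire format.

```python
import time
import uuid

def make_op(op_type, object_id=None, payload=None, client_id="c1"):
    """Wrap a user action in a small, typed, uniquely identified operation."""
    return {
        "op_id": str(uuid.uuid4()),   # unique ID enables server-side dedup on retry
        "type": op_type,              # "create" | "move" | "update_style" | "delete"
        "object_id": object_id,
        "payload": payload or {},
        "ts": time.time(),
        "client_id": client_id,
    }

def apply_op(board, op):
    """Apply one operation to an in-memory board: {object_id: properties}."""
    if op["type"] == "create":
        board[op["object_id"]] = dict(op["payload"])
    elif op["type"] == "move":
        board[op["object_id"]].update(op["payload"])              # {x, y}
    elif op["type"] == "update_style":
        board[op["object_id"]].setdefault("style", {}).update(op["payload"])
    elif op["type"] == "delete":
        board.pop(op["object_id"], None)
    return board

board = {}
apply_op(board, make_op("create", "n1", {"x": 0, "y": 0}))
apply_op(board, make_op("move", "n1", {"x": 120, "y": 340}))
```

Note how a drag gesture becomes a stream of tiny `move` payloads rather than a retransmission of the whole board.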
Real-world context: Miro’s engineering team has discussed using a hybrid approach where high-frequency positional updates (like continuous dragging) are batched and throttled at 30 to 60Hz, while discrete mutations (create, delete, style change) are sent immediately. This balances bandwidth with responsiveness.
The following table compares operation-based sync against state-based alternatives:
Synchronization Strategy Comparison
| Sync Strategy | Advantages | Disadvantages |
| --- | --- | --- |
| Operation-based Sync | Small payloads; supports fine-grained conflict resolution | Requires ordered delivery; needs idempotency guarantees |
| State Snapshot Sync | Simpler to implement; guaranteed convergence on full refresh | High bandwidth usage; poor latency for large datasets |
| Delta-state Sync | Compact diffs; effective for periodic reconciliation | Complex diff computation; requires sophisticated merge logic |
Operation-based sync dominates in collaborative canvas systems because the object graph changes incrementally and frequently. Full state transfers are reserved for initial board load and post-disconnection resync.
But what happens when two users edit the same object at the same instant? That is where the conflict resolution layer earns its complexity budget.
Conflict resolution and operation merging
Concurrency in Miro is object-centric. If two users edit different objects, their operations commute naturally and can be applied in any order. The hard case is when two users mutate the same object, or worse, the same property of the same object, simultaneously.
CRDTs vs. Operational Transformation
The two dominant strategies for collaborative conflict resolution are Operational Transformation (OT) and Conflict-free Replicated Data Types (CRDTs).
Operational Transformation was popularized by Google Docs. It works by transforming incoming operations against previously applied concurrent operations so that all clients converge to the same state. OT is well-suited to sequential text editing but becomes combinatorially complex for arbitrary object graphs. Each new operation type requires a custom transformation function against every other operation type.
CRDTs take a different approach. They define data structures whose merge operations are mathematically guaranteed to be commutative, associative, and idempotent. As long as every client eventually receives every operation, convergence is automatic regardless of ordering. CRDTs are increasingly favored in spatial collaborative systems because they handle concurrent edits to independent properties (position, style, content) without custom transform functions.
OT vs. CRDTs: Key Dimension Comparison
| Dimension | Operational Transformation (OT) | Conflict-free Replicated Data Types (CRDTs) |
| --- | --- | --- |
| Convergence Guarantee | Requires correct transformation functions and a central authority or deterministic ordering | Guaranteed automatically via algebraic properties; no central authority needed |
| Complexity Scaling | Grows quadratically (O(n²) operation type pairs) | Grows linearly with the introduction of new data types |
| Latency | Requires a central sequencer, adding a server round-trip delay | Enables local-first operation; changes merge deterministically on reconnection |
| Suitability for Spatial Objects | Moderate: demands extensive transformation logic for spatial handling | Strong: naturally decomposes by property, minimizing additional logic |
Field-level merge strategy
In practice, Miro-like systems use a field-level merge approach. An object is treated not as an atomic unit but as a map of independently mergeable fields. Two users can simultaneously change the fill color and the position of the same sticky note without conflict because the changes affect different fields.
When two users change the same field of the same object, the system falls back to a deterministic tiebreaker. The most common strategy is last-writer-wins (LWW): the update with the higher timestamp is kept, and because every client applies the same rule, all replicas converge on the same value.
Pro tip: In an interview, do not just say “last-writer-wins.” Explain that LWW is applied per field, not per object, and that the timestamp is a hybrid logical clock (combining physical time and a logical counter) to avoid issues with clock skew across clients.
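A field-level LWW merge with a hybrid-logical-clock tiebreaker can be sketched as follows. This is an illustration under the assumptions above, not Miro's implementation; the HLC is modeled as a `(physical_ms, logical_counter, client_id)` tuple.

```python
def hlc_newer(a, b):
    """Compare hybrid logical clocks: (physical_ms, logical_counter, client_id).
    Python's tuple ordering compares physical time first, then the logical
    counter, then the client_id as a final deterministic tiebreaker."""
    return a > b

def merge_field_updates(obj, updates):
    """Field-level last-writer-wins: each field independently keeps its
    winning (hlc, value) pair, so edits to different fields never conflict."""
    for field_name, hlc, value in updates:
        current = obj.get(field_name)
        if current is None or hlc_newer(hlc, current[0]):
            obj[field_name] = (hlc, value)
    return obj

obj = {}
merge_field_updates(obj, [
    ("fill", (1000, 0, "alice"), "#FF0"),
    ("x",    (1000, 0, "bob"),   120),      # different field: no conflict at all
    ("fill", (1000, 0, "bob"),   "#0AF"),   # same field, same instant: client_id breaks the tie
])
```

Here Alice's fill and Bob's position both survive, and the simultaneous fill edits resolve deterministically on every replica because `"bob" > "alice"` in the tiebreaker.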
For richer content like text inside a sticky note, the system may use a text-specific CRDT (such as an RGA or Yjs-style sequence CRDT) to allow character-level concurrent editing. This is a targeted escalation in complexity, applied only where LWW would produce unacceptable data loss.
Conflict resolution ensures convergence, but it does not help if operations arrive out of order or are duplicated during network retries. That brings us to the sync protocol that ensures reliable, ordered delivery.
Sync protocol and client communication
The sync protocol is the transport layer that binds the collaboration engine to every connected client. Its design directly determines perceived latency, bandwidth consumption, and resilience to network instability.
Upstream and downstream channels
The protocol operates as two logical channels over a single WebSocket connection:
- Upstream (client to server): The client sends locally generated operations. Each operation carries a client-generated sequence number and a logical timestamp.
- Downstream (server to client): The server broadcasts resolved operations to all clients subscribed to the affected board region. Each operation carries a server-assigned global sequence number.
When a client sends an operation, it applies the change locally immediately (optimistic local application) and marks it as “pending.” When the server acknowledges the operation, the client promotes it to “confirmed.” If the server rejects or transforms it, the client rolls back and reapplies.
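The pending/confirmed bookkeeping on the client can be sketched minimally as follows; names are illustrative, and the rollback/rebase machinery mentioned above is omitted for brevity.

```python
class PendingBuffer:
    """Client-side optimistic application: edits render immediately as
    'pending' and are promoted to 'confirmed' when the server acknowledges."""

    def __init__(self):
        self.pending = {}       # op_id -> op, awaiting server acknowledgment
        self.confirmed_seq = 0  # highest server sequence number seen so far

    def local_edit(self, op):
        """Apply optimistically and remember the op until the server confirms it."""
        self.pending[op["op_id"]] = op
        return op

    def on_ack(self, op_id, server_seq):
        """Server confirmed our op and assigned it a global sequence number."""
        self.pending.pop(op_id, None)
        self.confirmed_seq = max(self.confirmed_seq, server_seq)

    def unacked(self):
        """Ops to retransmit after a reconnect (the server dedups by op_id)."""
        return list(self.pending.values())
```

The `unacked()` list is also what the client retransmits on reconnection, which is why server-side idempotency matters.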
Reconnection and catch-up
Clients inevitably disconnect. The protocol must handle reconnection gracefully. When a client reconnects, it sends its last confirmed server sequence number. The server replays all operations after that sequence from the operation log, bringing the client back into sync without a full board reload.
If the gap is too large (e.g., the client was offline for hours and the operation log has been compacted), the server falls back to sending a full snapshot plus a shorter operation tail.
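The server-side catch-up decision, replay a short tail or fall back to snapshot-plus-tail, can be sketched like this; the threshold and data shapes are illustrative assumptions.

```python
def catch_up(op_log, snapshots, client_seq, max_replay=500):
    """Reconnect handler: replay the missing operation tail if the gap is
    small, otherwise send the latest snapshot plus a shorter tail.
    op_log: list of (seq, op); snapshots: list of (seq, state), both ascending."""
    missing = [(seq, op) for seq, op in op_log if seq > client_seq]
    if len(missing) <= max_replay:
        return {"mode": "replay", "ops": missing}
    snap_seq, state = snapshots[-1]        # newest available snapshot
    tail = [(seq, op) for seq, op in op_log if seq > snap_seq]
    return {"mode": "snapshot", "state": state, "ops": tail}
```

A client that was offline for seconds gets a handful of operations; one that was offline for hours gets a snapshot, which bounds catch-up cost regardless of how much history it missed.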
Attention: Optimistic local application creates a visual inconsistency window. The client shows its own edit immediately, but the server may reorder or transform it. The protocol must handle rollback smoothly to avoid jarring visual glitches. Techniques include buffering pending operations and rebasing them against incoming server state.
For very high-frequency updates like continuous dragging, the protocol applies backpressure. The client throttles upstream sends to a fixed rate (e.g., 30 operations per second) and interpolates intermediate positions locally. The server does not need every pixel of a drag path. It needs the intent and the final position.
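A fixed-rate drag throttle is simple to sketch. This illustrative version always flushes the final position on drag end, since the server needs the end state even if intermediates were dropped.

```python
class DragThrottle:
    """Cap upstream position updates at a fixed rate (e.g., 30 ops/sec).
    The latest position always wins, so dropped intermediates lose nothing."""

    def __init__(self, max_rate_hz=30):
        self.min_interval = 1.0 / max_rate_hz
        self.last_sent = float("-inf")
        self.latest = None    # most recent position, flushed on drag end

    def on_drag(self, now, x, y):
        """Return the update to send now, or None if inside the throttle window."""
        self.latest = (x, y)
        if now - self.last_sent >= self.min_interval:
            self.last_sent = now
            return (x, y)
        return None

    def on_drag_end(self):
        """Always flush the final position: the server needs the end state."""
        return self.latest
```

Between sends, the local client keeps rendering every pixel of the drag; only the network traffic is throttled.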
This protocol handles durable board state, but there is another category of data that flows even faster and tolerates much more loss. That is presence.
Presence, cursors, and awareness
Presence is what makes Miro feel alive. You see collaborators’ cursors gliding across the canvas, their selection highlights, their viewport indicators in the minimap. Remove presence and the board feels like working alone with a save-and-refresh cycle.
Presence data includes cursor position, selected objects, current viewport bounds, and user identity. It updates at high frequency (10 to 30Hz per user) and is entirely ephemeral. If a presence packet is lost, nothing breaks. The next packet overwrites it anyway.
This is why presence is architecturally separated from board state:
- Board operations flow through the conflict resolution pipeline, are durably logged, and must be delivered reliably and in order.
- Presence updates are fire-and-forget broadcasts with no ordering guarantees, no persistence, and no conflict resolution.
The presence service typically uses an in-memory pub/sub system (such as Redis Pub/Sub or a purpose-built broker) scoped to the board or viewport region. Updates are broadcast to all subscribers and immediately discarded by the broker.
Real-world context: At scale, presence traffic can exceed board operation traffic by 10x or more. Separating it prevents cursor updates from creating backpressure on the durable operation pipeline, which would directly degrade collaboration latency.
Clients render remote cursors using interpolation to smooth out the inherent jitter of network delivery. Even if presence updates arrive at 15Hz, the client interpolates cursor positions at 60fps to create the illusion of fluid motion.
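The interpolation itself is a plain linear blend between the last two network positions, sampled every frame; a minimal sketch:

```python
def lerp_cursor(prev, target, elapsed, update_interval):
    """Linearly interpolate a remote cursor between its last two network
    positions so 15Hz presence updates render as smooth 60fps motion."""
    t = min(elapsed / update_interval, 1.0)   # clamp so we never overshoot
    px, py = prev
    tx, ty = target
    return (px + (tx - px) * t, py + (ty - py) * t)
```

At 60fps the renderer calls this four times per 15Hz presence packet, each call with a larger `elapsed`, sliding the cursor toward the target instead of teleporting it.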
Presence makes collaboration feel real, but trust in that collaboration depends on something deeper: the ability to look back in time. That is the role of versioning.
Versioning and board history
Version history provides the safety net that makes fearless collaboration possible. Without it, a single accidental bulk-delete could destroy hours of workshop output with no recourse.
Snapshot and operation log architecture
The versioning system uses a dual strategy:
- Operation log: Every confirmed operation is appended to a durable, ordered log (similar to a write-ahead log in databases). This log is the source of truth for fine-grained history.
- Periodic snapshots: At regular intervals (e.g., every 1,000 operations or every 5 minutes of activity), the system captures a full snapshot of the board state. Snapshots are stored in object storage (e.g., Amazon S3).
To reconstruct the board at any point in time, the system loads the nearest preceding snapshot and replays operations from the log up to the target timestamp. This is analogous to how databases use checkpoints plus write-ahead logs for crash recovery.
The cost of a snapshot is proportional to the board size ($O(n)$ where $n$ is the object count), while the cost of replaying operations is proportional to the number of operations since the last snapshot ($O(k)$). The snapshot interval controls the trade-off between storage cost and replay latency:
$$T_{\text{recovery}} = T_{\text{snapshot\_load}} + k \cdot T_{\text{op\_replay}}$$
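The snapshot-plus-replay reconstruction can be sketched as follows. The data shapes are illustrative; `merge_apply` is a stand-in for whatever real operation-application logic the system uses.

```python
def board_at(target_ts, snapshots, op_log, apply_fn):
    """Reconstruct board state at target_ts: load the nearest preceding
    snapshot, then replay logged operations up to the target timestamp.
    snapshots: [(ts, state)], op_log: [(ts, op)], both ascending by ts."""
    base_ts, state = 0, {}
    for ts, snap in snapshots:
        if ts <= target_ts:
            base_ts, state = ts, dict(snap)   # copy so the snapshot stays pristine
    for ts, op in op_log:
        if base_ts < ts <= target_ts:         # replay only the tail after the snapshot
            state = apply_fn(state, op)
    return state

def merge_apply(state, op):
    """Demo apply function: each op is a dict of property overrides."""
    return {**state, **op}
```

The replay loop is exactly the `k` operations in the recovery-time formula above, which is why the snapshot interval bounds worst-case replay cost.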
Historical note: This snapshot-plus-log architecture is directly borrowed from event sourcing patterns in distributed systems. Apache Kafka’s log compaction and database point-in-time recovery use the same fundamental approach.
Operation log compaction
The operation log grows indefinitely without intervention. The system compacts it by discarding operations that precede the oldest retained snapshot. For boards with very long histories, older snapshots may be downsampled (e.g., keeping one per day instead of one per 5 minutes) to bound storage growth.
Versioning feeds directly into undo/redo, which has its own unique challenges in a collaborative context.
Undo and redo semantics
Undo in a single-user application is trivial: pop the last action off a stack and apply its inverse. Undo in a multi-user system is anything but trivial.
A user expects “undo” to reverse their last action, not the most recent action globally. If User A moves a sticky note and User B changes its color, User A’s undo should reverse the move, not the color change. This requires per-user operation stacks.
The system maintains a per-user undo stack on the client side. When the user triggers undo, the client computes the inverse operation and sends it to the server as a normal operation. The server processes it through the same conflict resolution pipeline.
This means undo is not a “rewind.” It is a new forward operation that happens to reverse a previous effect. This design has important properties:
- Undo operations are visible to all collaborators.
- Undo operations can themselves conflict with concurrent edits and must be resolved normally.
- Redo is simply undoing the undo (pushing the original operation again).
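The "undo is a new forward operation" idea can be sketched with a per-user stack that records each op alongside the property values it overwrote; the operation shapes are the illustrative ones used earlier in this post.

```python
INVERSES = {"create": "delete", "delete": "create"}

def invert(op, prior_state):
    """Compute the inverse of an operation as a new forward operation.
    prior_state holds the property values from before the op was applied."""
    kind = op["type"]
    if kind in ("move", "update_style"):
        # inverse restores the previous values of exactly the fields that changed
        restored = {k: prior_state[k] for k in op["payload"]}
        return {"type": kind, "object_id": op["object_id"], "payload": restored}
    if kind in INVERSES:
        return {"type": INVERSES[kind], "object_id": op["object_id"],
                "payload": prior_state if kind == "delete" else {}}
    raise ValueError(f"no inverse defined for {kind}")

class UndoStack:
    """Per-user undo stack: undo emits the inverse as an ordinary forward op."""
    def __init__(self):
        self.done = []   # (op, prior_state) pairs for this user's own actions

    def record(self, op, prior_state):
        self.done.append((op, prior_state))

    def undo(self):
        if not self.done:
            return None
        op, prior = self.done.pop()
        return invert(op, prior)   # sent through the normal sync pipeline
```

Because the inverse travels the same pipeline as any edit, it is broadcast, logged, and conflict-resolved like everything else, which is precisely the property the bullet list above describes.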
Attention: If User A moves an object and User B deletes it before User A can undo, the undo operation targets a non-existent object. The system must handle this gracefully, either by silently dropping the undo, by restoring the deleted object, or by notifying the user. Each choice has UX implications that should be discussed in an interview.
Clean undo semantics are a hallmark of polished collaborative systems. But even perfect undo cannot help if unauthorized users are making changes in the first place.
Sharing and access control
Boards can be shared with fine-grained permissions: view-only, comment-only, or full edit access. Permissions can be scoped to individual users, teams, or made public via link. Crucially, permissions can change while a session is active.
Access control must be enforced server-side on every incoming operation. The collaboration server checks the user’s permission level before applying any mutation. This check must be extremely fast because it sits in the critical path of every edit.
The access control service maintains a permissions table indexed by (board_id, user_id) with a role enum. For link-shared boards, the system checks the link token’s associated role. Permission checks are cached in memory on the collaboration server and invalidated via pub/sub when permissions change.
When a permission is downgraded (e.g., edit revoked to view-only) during an active session, the server must:
- Update the cached permission.
- Reject subsequent edit operations from that user.
- Optionally notify the client to switch to a read-only UI.
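The cached permission check with pub/sub invalidation can be sketched as follows; role names match this section, but the class and callback shapes are illustrative assumptions.

```python
ROLE_RANK = {"view": 0, "comment": 1, "edit": 2}

class PermissionCache:
    """In-memory role cache on the collaboration server, invalidated via
    pub/sub when the access control service changes a permission."""

    def __init__(self, fetch_role):
        self.fetch_role = fetch_role   # call out to the access control service
        self.cache = {}                # (board_id, user_id) -> role

    def can(self, board_id, user_id, required="edit"):
        """Fast critical-path check run on every incoming mutation."""
        key = (board_id, user_id)
        role = self.cache.get(key)
        if role is None:
            role = self.fetch_role(board_id, user_id)   # cache miss only
            self.cache[key] = role
        return ROLE_RANK.get(role, -1) >= ROLE_RANK[required]

    def invalidate(self, board_id, user_id):
        """Called from the pub/sub listener when a permission changes."""
        self.cache.pop((board_id, user_id), None)
```

The test below also shows the failure mode the invalidation exists to fix: without the pub/sub message, a downgraded user would keep editing from the stale cache.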
Pro tip: In an interview, highlight that access control is isolated from the collaboration engine. The collaboration engine calls the access control service as a dependency, not the other way around. This separation allows each to scale independently and prevents permission logic from complicating the sync pipeline.
Security and permissions protect the board’s integrity, but performance optimization protects the user’s experience. Let’s examine how the system stays fast under load.
Rendering pipeline and client-side performance
While this blog focuses on backend architecture, the rendering pipeline deserves attention because client-side performance directly affects perceived latency. A perfectly fast server is meaningless if the client cannot render updates at 60fps.
Modern collaborative canvas applications typically render using the Canvas 2D API or WebGL rather than DOM-based SVG. The reason is performance at scale. SVG elements are DOM nodes, and browsers struggle to manage tens of thousands of DOM nodes efficiently. Canvas 2D and WebGL bypass the DOM entirely, drawing directly to a GPU-backed bitmap.
The rendering loop operates on a tight cycle:
- Check for new remote operations and local input events.
- Update the in-memory object graph.
- Determine which objects intersect the current viewport (spatial query against the local quad-tree).
- Draw only those objects to the canvas.
- Overlay presence data (cursors, selections) on top.
Viewport virtualization ensures that step 3 is always proportional to the number of visible objects, not the total board size. This is what keeps frame times constant on a board with 50 objects or 50,000.
Real-world context: Miro’s engineering team has discussed targeting 60fps for all interactions and gracefully degrading to 30fps on boards with extreme object density. Hardware-accelerated WebGL rendering is used for complex visual effects like shadows and gradients, while simpler elements use Canvas 2D for lower overhead.
Client-side caching also plays a major role. The client retains the full object state for the current viewport plus a buffer zone around it. Small pan movements are served entirely from the local cache. Only when the user navigates beyond the buffer does the client request new data from the server.
Rendering keeps the user happy. But the system must also handle the unhappy paths (failures, crashes, and degraded networks) without losing data.
Failure handling and resilience
Failures in real-time systems are not exceptional events. They are constant background noise. Clients disconnect mid-operation. Collaboration servers crash under load spikes. Network latency spikes during cross-region traffic. The system must treat all of these as normal operating conditions.
The resilience strategy rests on three principles:
- Durable acknowledgment: The server does not acknowledge a client operation until it is persisted to the operation log. If the server crashes before acknowledging, the client retransmits.
- Idempotent operations: Because operations carry unique IDs, duplicate transmissions (from retries) are deduplicated server-side. Applying the same operation twice has no additional effect.
- Graceful degradation: If the real-time sync layer degrades, the system falls back to periodic polling or deferred sync. Users may temporarily lose live collaboration, but their individual edits are buffered locally and reconciled when connectivity restores.
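Durable acknowledgment and idempotency fit together in a few lines; this sketch uses an in-memory list as a stand-in for the durable log and keeps an unbounded dedup set, where a real system would persist the log and bound the dedup window.

```python
class OperationLog:
    """Durable-acknowledgment sketch: persist first, dedup retries by op_id,
    acknowledge only after the append succeeds."""

    def __init__(self):
        self.log = []       # the ordered operation log (stand-in for durable storage)
        self.seen = set()   # op_ids already applied (the dedup window)

    def submit(self, op):
        """Return the assigned global sequence number; retries are no-ops."""
        if op["op_id"] in self.seen:
            # duplicate retransmission: re-acknowledge without reapplying
            return next(seq for seq, o in self.log if o["op_id"] == op["op_id"])
        self.log.append((len(self.log) + 1, op))
        self.seen.add(op["op_id"])
        return len(self.log)   # the ack carries this sequence number
```

A client that retransmits after a lost ack simply receives the same sequence number again; the board state never double-applies the edit.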
Collaboration servers are stateful (they hold the hot board copy in memory), which makes failover more complex than for stateless services. The system mitigates this by:
- Replicating the operation log to a durable store (e.g., Apache Kafka or Amazon DynamoDB Streams) so that a replacement server can rebuild the hot state.
- Using consistent hashing to assign boards to servers, so a server failure only affects the boards it was hosting.
- Running standby servers that can warm up from the operation log within seconds.
Attention: Stateful collaboration servers are the biggest single point of failure risk in this architecture. The failover time (how long it takes a standby to become hot) directly determines the collaboration outage window. Targeting sub-10-second failover requires aggressive log replication and prewarming strategies.
Resilience handles individual failures. Scaling handles aggregate growth. Let’s examine how the system grows to support large boards and large teams.
Scaling to large boards and teams#
Scaling a Miro-style system involves three distinct axes, and each requires different strategies.
Board size scaling (object count). As boards grow past 10,000 objects, the quad-tree depth increases, snapshot sizes grow, and operation logs lengthen. The system addresses this with spatial sharding: a very large board can be split across multiple collaboration server partitions, each responsible for a region of the canvas. Cross-partition object moves are handled as delete-in-source, create-in-destination atomic pairs.
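A cross-partition move can be sketched as that delete/create pair. In this hypothetical model, each partition owns an x-range of the canvas, and the create in the destination happens before the delete in the source, so a crash between the two steps leaves a reconcilable duplicate rather than a lost object (all names are illustrative):

```python
class Partition:
    """One spatial shard of a large board, owning a horizontal region."""
    def __init__(self, name, x_range):
        self.name = name
        self.x_range = x_range       # (min_x, max_x) region this shard owns
        self.objects = {}            # obj_id -> (x, y)

    def owns(self, x):
        return self.x_range[0] <= x < self.x_range[1]

def move_object(partitions, obj_id, new_x, new_y):
    src = next(p for p in partitions if obj_id in p.objects)
    dst = next(p for p in partitions if p.owns(new_x))
    if src is dst:
        src.objects[obj_id] = (new_x, new_y)   # in-partition move: one op
        return
    # Cross-partition move: create-in-destination first, then
    # delete-in-source, applied as an atomic pair in the real system.
    dst.objects[obj_id] = (new_x, new_y)
    del src.objects[obj_id]

left = Partition("left", (0, 500))
right = Partition("right", (500, 1000))
left.objects["sticky-1"] = (100, 100)
move_object([left, right], "sticky-1", 700, 100)   # crosses the boundary
```

The real system would wrap the pair in a transaction or a reconciliation protocol; the sketch only shows the ordering that makes a mid-move crash recoverable.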
Concurrency scaling (user count per board). A single collaboration server can typically handle 50 to 200 concurrent WebSocket connections for one board. Beyond that, the system deploys multiple collaboration servers for the same board, with a coordination layer (backed by a shared operation log) ensuring they remain consistent.
Organizational scaling (total boards and users). This is more conventional horizontal scaling. Boards are distributed across collaboration server clusters using consistent hashing. Metadata services (boards list, workspace info, permissions) scale independently using standard database replication and caching patterns.
Pro tip: In an interview, distinguish between these scaling axes explicitly. Saying “we add more servers” is vague. Saying “we shard spatially for large boards, fan out to multiple sync servers for high concurrency, and distribute boards across clusters for organizational growth” demonstrates precise architectural thinking.
Hot boards (those with sudden spikes in activity, like a workshop starting) need elastic resource allocation. The system monitors WebSocket connection counts and operation throughput per board, triggering scale-up when thresholds are crossed. This is analogous to hot partition mitigation in distributed databases.
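A hot-board detector can be as simple as a sliding window over per-board operation timestamps. A sketch under assumed thresholds (the 200 ops/sec and 10-second window are illustrative, as is the class name):

```python
from collections import deque

class HotBoardMonitor:
    """Flag boards whose operation throughput crosses a scale-up threshold."""
    def __init__(self, ops_per_sec_threshold=200, window_sec=10):
        self.threshold = ops_per_sec_threshold
        self.window = window_sec
        self.events = {}   # board_id -> deque of operation timestamps

    def record_op(self, board_id, now):
        q = self.events.setdefault(board_id, deque())
        q.append(now)
        while q and q[0] < now - self.window:   # evict expired timestamps
            q.popleft()

    def is_hot(self, board_id, now):
        q = self.events.get(board_id, deque())
        while q and q[0] < now - self.window:
            q.popleft()
        return len(q) / self.window > self.threshold

monitor = HotBoardMonitor()
for _ in range(2500):                            # workshop kicks off
    monitor.record_op("workshop-board", now=100.0)
```

When `is_hot` fires, the orchestration layer would allocate additional sync capacity for that board, mirroring hot-partition mitigation in distributed databases.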
All of these scaling strategies ultimately serve one goal: maintaining user trust, even at scale.
Data integrity and user trust#
Trust is the product. Users trust that their sticky notes will not vanish, that concurrent edits will merge sensibly, and that the board represents a single shared reality. Every architectural decision in this system exists to maintain that trust.
The trust contract has three pillars:
- Convergence: All clients viewing the same board must eventually see the same state. Divergence is tolerable for milliseconds, not minutes.
- Durability: Once the server acknowledges an operation, that operation survives any single infrastructure failure.
- Predictability: Conflict resolution must be deterministic. Users should be able to reason about what will happen when two people edit simultaneously.
The system uses checksums and periodic reconciliation to detect divergence. If a client’s local state hash diverges from the server’s expected hash after applying the same operation sequence, the server triggers a forced resync (full snapshot delivery) to correct the client. This is a safety net, not a normal code path.
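The reconciliation check boils down to hashing a canonical serialization of the board state on both sides and comparing. A sketch, assuming a JSON canonical form (the real wire format and hash choice would differ):

```python
import hashlib
import json

def board_hash(objects: dict) -> str:
    """Deterministic hash of board state via a canonical serialization."""
    canonical = json.dumps(objects, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode()).hexdigest()

def reconcile(client_objects: dict, server_objects: dict):
    """Return 'ok' if states match, else signal a forced full resync."""
    if board_hash(client_objects) == board_hash(server_objects):
        return "ok", client_objects
    # Safety net: discard divergent local state, accept the server snapshot.
    return "resync", dict(server_objects)

server_state = {"sticky-1": {"x": 10, "y": 20, "text": "hello"}}
diverged = {"sticky-1": {"x": 99, "y": 20, "text": "hello"}}
status, fixed = reconcile(diverged, server_state)
```

The hash comparison is cheap enough to run periodically, which is what keeps the forced resync an exceptional path rather than a routine one.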
Real-world context: Miro boards often represent days of workshop output, product roadmaps, or architectural plans. A data loss event does not just lose bits. It loses trust, and trust is much harder to rebuild than data. This is why the system errs on the side of over-persisting and over-replicating.
With the full architecture covered, let’s examine how interviewers probe this design and what separates strong answers from weak ones.
How interviewers evaluate Miro system design#
Miro is an increasingly popular system design interview question because it tests multiple advanced topics simultaneously: real-time collaboration, spatial data structures, distributed conflict resolution, and performance-sensitive client-server communication.
Interviewers typically evaluate across these dimensions:
- Modeling clarity: Can you define the board data model with object-level granularity and explain why?
- Conflict resolution depth: Do you name specific strategies (CRDTs, OT, LWW) and articulate trade-offs, or do you hand-wave with “we handle conflicts”?
- Sync protocol understanding: Can you explain optimistic local application, server reordering, and reconnection catch-up?
- Spatial awareness: Do you address viewport-based loading, spatial indexing, and the infinite canvas illusion?
- Non-functional reasoning: Do you explicitly discuss latency budgets, consistency spectrums, and failure modes?
The strongest candidates draw a clear boundary between ephemeral data (presence) and durable data (board state), explain per-field conflict resolution rather than per-object, and address scaling across all three axes (board size, concurrency, organization).
A common pitfall is spending too long on the “happy path” and not enough on failure handling, reconnection, and consistency guarantees. Interviewers want to see that you design for the real world, where networks are unreliable and users are impatient.
Pro tip: Structure your interview answer around the six subsystems outlined in this blog. Start with the data model, then the sync engine, then conflict resolution, then versioning, and finally scaling and failure handling. This gives the interviewer a clear mental map and shows you can decompose complexity systematically.
Final thoughts#
Miro system design sits at the intersection of three hard problems: real-time distributed collaboration, spatial data management at scale, and performance-critical client rendering. A strong design recognizes that these problems require different consistency models, different scaling strategies, and different failure tolerances, unified under a single coherent architecture.
The most critical takeaways are object-level granularity as the foundation of the data model, operation-based sync with CRDT or OT-backed conflict resolution for deterministic convergence, and the strict separation of ephemeral presence from durable board state. Get these three right and the rest of the system has a solid foundation to build on.
Looking ahead, the frontier is moving toward local-first architectures where clients can operate fully offline and merge seamlessly on reconnection, powered by next-generation CRDTs like Automerge and Yjs. AI-assisted canvas features (auto-layout, smart grouping, content generation) will add new categories of operations that must flow through the same sync and conflict resolution pipelines. The systems that win will be those designed with enough architectural flexibility to absorb these new capabilities without a rewrite.
If you can walk an interviewer through how an infinite canvas stays responsive, consistent, and resilient under heavy concurrent collaboration, you are demonstrating exactly the kind of system-level judgment that modern engineering teams need.