Table of Contents
  • Why Atlassian system design interviews feel different
  • Permissions and access control as system architecture
      • The permission hierarchy
      • The hard problem: invalidation under organizational change
  • Authentication, authorization, and API gateway design
  • Real-time collaboration and consistency expectations
      • Why CRDTs fit Atlassian’s model
      • Consistency is not one-size-fits-all
      • The collaboration pipeline
  • Workflow engines and Jira as a distributed state machine
      • Modeling workflows as declarative state machines
      • Handling workflow evolution
  • Marketplace extensibility without chaos
      • Isolation boundaries as a design principle
      • What interviewers are really testing
  • Tenant isolation, noisy-neighbor control, and billing boundaries
      • Designing for fairness
  • SQL vs. NoSQL and storage trade-offs in multi-tenant SaaS
      • When SQL fits
      • When NoSQL fits
  • Search, indexing, and permission-aware information retrieval
      • Incremental indexing driven by event streams
      • Handling index lag gracefully
  • Logging, monitoring, and observability as a core concern
      • What to monitor and why
      • Alerting philosophy
  • Migration safety and zero-downtime evolution
      • The expand-and-contract pattern
      • Rollbacks are tested paths, not emergency scripts
  • Handling failures the Atlassian way
      • Failure as narrative, not checklist
  • Additional system design prompts and how to approach them
      • Design a real-time analytics platform for Atlassian tools
      • Design a logging and monitoring system for SLA observability
      • Design a user authentication and authorization system with SSO
  • Monolith vs. microservices at Atlassian scale
      • When modular monoliths make sense
      • When microservices become necessary
      • The key trade-off
  • Example interview prompt in depth: Design Confluence’s real-time editor
      • Start with constraints, not components
      • Walk through the data flow
      • Address the hard edges
  • Conclusion
Atlassian System Design Interview Questions

Atlassian System Design interviews test your ability to design permission-aware, collaborative, multi-tenant systems that stay correct, reliable, and extensible at scale.

Mar 10, 2026

Atlassian system design interviews test whether you can architect permission-aware, multi-tenant collaborative platforms that remain correct under concurrency and evolve safely over years. Unlike generic system design rounds, these interviews emphasize deeply nested access control, real-time collaboration with CRDTs, workflow engines modeled as distributed state machines, and extensibility that never compromises tenant isolation or enterprise reliability.

Key takeaways

  • Permissions are the substrate, not a feature: Every read, write, search result, and automation execution depends on fast, correct permission evaluation backed by versioned caches and prompt invalidation.
  • Real-time collaboration demands nuanced consistency: Editing requires conflict-free convergence through CRDTs, while search and notifications can tolerate eventual consistency as long as access control holds.
  • Workflow engines must separate validation from side effects: Jira-style transitions should be validated synchronously, but downstream actions like notifications and webhooks must be asynchronous and idempotent.
  • Multi-tenancy shapes every layer of the stack: Tenant-scoped rate limiting, quota enforcement, and observable metrics are non-negotiable for preventing noisy-neighbor degradation.
  • Safe evolution outweighs fast deployment: Expand-and-contract migrations, backward-compatible APIs, and rollback paths as a core capability protect long-lived enterprise data from surprises.


Most engineers walk into Atlassian interviews expecting a standard “design a CRUD app” prompt. They sketch boxes, draw arrows, mention a load balancer, and wait for the next question. Then they get hit with something like: “How do you ensure a deprovisioned user loses access across cached search results, queued notifications, and in-flight automation within seconds?” That question reveals the gap between designing systems that work and designing systems that behave correctly under the specific pressures Atlassian products face every day.

Why Atlassian system design interviews feel different#

Atlassian’s product suite, spanning Jira, Confluence, Bitbucket, Trello, and Compass, sits at the center of how modern teams plan, build, ship, and respond to incidents. These are not lightweight consumer apps. They are always-on, permission-heavy, multi-tenant SaaS platforms where a five-person startup and a 100,000-seat enterprise share the same underlying infrastructure.

That duality creates a design tension most candidates underestimate. The startup wants speed and simplicity. The enterprise demands governance, auditability, and blast-radius containment. Your architecture must stretch across both extremes without snapping.

Unlike social media feeds or e-commerce checkouts, Atlassian platforms assume:

  • Concurrent editing: Multiple users modify the same artifact simultaneously.
  • Rich permission hierarchies: Access rules change frequently and cascade unpredictably.
  • Automation chains: Workflow transitions trigger notifications, webhooks, indexing, and third-party app executions.
  • Long-lived data: Content must survive years of schema migrations and product evolution.

Real-world context: A single Jira project in a large enterprise can have thousands of custom fields, dozens of workflow states, and permission schemes that differ across issue types. This is not hypothetical complexity. It is daily operational reality.

This is precisely why Atlassian interview questions feel heavier than generic designs. Interviewers are not looking for flashy distributed system diagrams or academic CAP theorem recitations. They want to see whether you can reason about correctness under concurrency, gradual evolution under production load, and isolation guarantees that hold when things break.

The backbone of every Atlassian system is its permission model, and that is where we start.

Permissions and access control as system architecture#

Permissions at Atlassian are not a feature bolted onto the side. They are the substrate on which every operation executes. Every read, every write, every search result, every notification delivery, and every automation execution depends on correct, fast permission evaluation. Get this wrong and the system leaks data or blocks legitimate users.

The permission hierarchy#

Atlassian permission models are deeply hierarchical and polyadic: multiple independent dimensions of access control (organization, site, product, project, issue) intersect rather than simply nest. Access may be defined at the organization level, refined at the site or product level, overridden at the project or space level, and further constrained at the individual issue or page level. On top of this, group memberships, roles, identity provider synchronization, and conditional rules (such as “only the reporter can edit this field after transition”) all influence the final access decision.

In an interview, you should explain permission evaluation as a system, not as a checklist. A strong design separates two concerns:

  • Permission definition: The administrative act of setting rules. This changes relatively infrequently.
  • Permission evaluation: The runtime act of checking access. This happens on every single request.

This separation leads naturally to permission graphs: directed acyclic structures in which nodes represent entities (users, groups, roles, resources) and edges represent grant or deny relationships, enabling efficient traversal for access decisions. These graphs support precomputed effective permissions and carefully invalidated caches.
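As an illustration (not Atlassian's actual implementation), permission-graph evaluation can be sketched as a reachability check over grant edges. The `edges` mapping and node names below are hypothetical, and deny rules are omitted for brevity:

```python
def effective_access(user, resource, edges):
    """Walk the permission graph from the user through membership/grant
    edges (a dict of node -> set of nodes) and report whether any path
    reaches the resource. Deny edges are omitted in this sketch."""
    frontier, seen = [user], {user}
    while frontier:
        node = frontier.pop()
        if node == resource:
            return True
        for nxt in edges.get(node, set()):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return False
```

In practice the traversal result would be precomputed into an effective-permission set and cached, which is exactly what makes invalidation the hard part.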

The following diagram illustrates how permission evaluation flows from request to decision.

[Diagram: Permission evaluation runtime architecture]

The hard problem: invalidation under organizational change#

The hardest problem is not evaluating permissions. It is doing so quickly without leaking access when organizational structures change. When a group is removed from a project, when a user is deprovisioned via SSO, or when an identity provider sync revokes a role, cached permissions must be invalidated promptly and safely.

A strong interview answer discusses:

  • Permission versioning: Each permission change increments a version. Evaluation checks the version before trusting cached results.
  • Bulk invalidation: When a group changes, all members’ cached permissions are invalidated in a single pass rather than one by one.
  • Background recomputation: A worker process rebuilds effective permissions asynchronously, but the system falls back to authoritative evaluation during the recomputation window.

Attention: Aggressive caching improves latency but increases the risk of stale access grants. Atlassian favors correctness over micro-optimizations, especially on write paths. Never design a system where a deprovisioned user can perform writes because a cache has not yet expired.
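The versioning-plus-invalidation idea can be sketched in a few lines. This is a minimal in-process model with illustrative names; a production system would back the version counters and cache with a shared store such as Redis:

```python
class VersionedPermissionCache:
    """Permission cache guarded by a per-subject version counter: any
    permission change bumps the version, which invalidates every cached
    decision for that subject without deleting entries one by one."""

    def __init__(self, authoritative_check):
        self._authoritative_check = authoritative_check  # slow, always-correct source
        self._versions = {}   # subject (user or group) -> current version
        self._cache = {}      # (subject, resource) -> (version, decision)

    def bump_version(self, subject):
        # Called on any permission change: group removal, SSO deprovisioning, etc.
        self._versions[subject] = self._versions.get(subject, 0) + 1

    def can_access(self, subject, resource):
        current = self._versions.get(subject, 0)
        entry = self._cache.get((subject, resource))
        if entry and entry[0] == current:
            return entry[1]                      # cache hit at the current version
        decision = self._authoritative_check(subject, resource)
        self._cache[(subject, resource)] = (current, decision)
        return decision
```

A version bump is cheap and immediate, so a deprovisioned user falls back to authoritative evaluation on the very next request, matching the correctness-first stance above.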

This brings us to a broader question candidates often overlook: how authentication and authorization layers work together in enterprise SaaS.

Authentication, authorization, and API gateway design#

Enterprise SaaS interviews frequently test candidates on SSO (single sign-on) flows, OAuth2 token scoping, and JWT validation at the API gateway level, and Atlassian interviews are no exception. Single sign-on is an authentication scheme that allows users to log in once with a single set of credentials and gain access to multiple related but independent software systems without re-authenticating.

A well-designed system places authentication at the edge. The API gateway validates JWTs, extracts tenant and user identity, and attaches them to every downstream request. Authorization, the permission evaluation described above, happens closer to the data layer where context (project, issue, space) is available.

Authentication vs. Authorization: Key Responsibilities Compared

| Aspect | Authentication | Authorization |
| --- | --- | --- |
| Enforcement location | Edge of system (server/gateway level) | Service layer or within the application |
| Validation focus | Verifies user identity (who you are) | Validates access rights and permissions (what you can do) |
| Caching mechanisms | Short-lived tokens (e.g., JWTs) | Permission graphs or access control lists (ACLs) |
| Failure mode | `401 Unauthorized` | `403 Forbidden` |
Explicitly articulating this separation in an interview signals that you understand enterprise-grade identity management, not just application-level role checks.
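A toy request handler makes this division of labor concrete. The `verify_jwt` and `check_permission` callables below are stand-ins for a real gateway and permission service, not an actual framework API:

```python
def handle_request(request, verify_jwt, check_permission):
    """Illustrative separation of concerns: authentication at the edge
    (fail with 401), authorization near the data (fail with 403)."""
    identity = verify_jwt(request["token"])      # edge: who are you?
    if identity is None:
        return {"status": 401}                   # authentication failure
    if not check_permission(identity, request["resource"], request["action"]):
        return {"status": 403}                   # authorization failure
    return {"status": 200, "user": identity}
```

Note that the 401/403 split mirrors the failure-mode row in the table: the gateway can reject an invalid token without any product context, while the 403 decision needs project- or issue-level context.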

With permissions as the foundation, the next challenge is building real-time collaboration on top of them.

Real-time collaboration and consistency expectations#

Atlassian collaboration goes far beyond plain text editing. Confluence pages contain tables, macros, diagrams, and embedded content. Trello boards involve card movement, ordering, and checklist updates. Jira issues change state while users simultaneously comment, transition workflows, and trigger automation. These systems must feel instantaneous to users while remaining consistent across devices, regions, and reconnects.

Why CRDTs fit Atlassian’s model#

Atlassian typically favors CRDT (conflict-free replicated data type) approaches for rich collaborative editing. A CRDT is a data structure that can be replicated across multiple nodes, updated independently and concurrently without coordination, and merged deterministically into a consistent state. CRDTs support local-first editing, conflict-free merges, and eventual convergence without centralized locks. This matters because Atlassian cannot afford to serialize every keystroke through a single coordination point when millions of users are editing concurrently.

In an interview, explain both the benefits and the costs:

  • Benefits: Offline editing works naturally. Merges are deterministic. No central bottleneck.
  • Costs: Metadata growth (each character carries vector clock or logical timestamp information). Snapshot compaction is required to prevent unbounded memory growth. Rich content types like tables and macros require custom CRDT types, not just text sequences.

Historical note: Confluence’s collaboration architecture evolved from operational transformation (OT) toward CRDT-based models as the product scaled. OT requires a central server to transform operations in order, which becomes a bottleneck at Atlassian’s scale. CRDTs remove that constraint by making merge order-independent.
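To make the order-independent merge property concrete, here is a grow-only counter, one of the simplest CRDTs. Confluence needs far richer types for text, tables, and macros; this sketch only demonstrates why merges commute:

```python
class GCounter:
    """Grow-only counter CRDT: each replica increments its own slot, and
    merge takes the per-replica maximum. Because max is commutative,
    associative, and idempotent, replicas converge regardless of merge order."""

    def __init__(self, replica_id):
        self.replica_id = replica_id
        self.counts = {}          # replica_id -> count observed from that replica

    def increment(self, n=1):
        self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + n

    def value(self):
        return sum(self.counts.values())

    def merge(self, other):
        for rid, c in other.counts.items():
            self.counts[rid] = max(self.counts.get(rid, 0), c)
```

The per-replica bookkeeping is also a miniature of the metadata-growth cost noted above: every replica's state carries an entry per participant, which is why snapshot compaction matters for long-lived documents.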

Consistency is not one-size-fits-all#

A critical maturity signal in interviews is recognizing that different subsystems require different consistency guarantees. Over-engineering consistency where it is unnecessary wastes resources and adds latency.

Subsystem Consistency Requirements

| Subsystem | Consistency Model | Lag Tolerance | Key Requirement |
| --- | --- | --- | --- |
| Editing | Strong convergence | Real-time (none) | Immediate consistency across instances |
| Search indexing | Eventual | Seconds-level | Index reflects recent changes promptly |
| Notifications | Eventual | Minutes-level | Timely but flexible delivery |
| Permissions | Strict correctness | None (zero tolerance) | No stale grants after write operations |
| Analytics | Eventual | Batch-level | Data processed in batches; delays acceptable |
A strong answer sounds like: “Editing needs real-time convergence, but search and notifications can tolerate lag as long as permissions are enforced at query time.”

The collaboration pipeline#

Deltas from collaborative editing propagate via WebSockets to a collaboration gateway. The gateway routes changes to a merge engine that applies CRDT operations. The versioning service persists snapshots at configurable intervals, compacting CRDT metadata to control storage growth. A presence tracking system broadcasts cursor positions and active-user indicators.

Critically, permission checks occur at the gateway level before deltas are accepted. A user who loses edit access mid-session should not be able to push further changes, even if their WebSocket connection remains open.

[Diagram: Real-time editing pipeline with permission-gated delta propagation]

Pro tip: When discussing snapshot compaction in an interview, mention that compaction frequency is a trade-off. Frequent compaction reduces memory but increases write amplification. Infrequent compaction saves I/O but risks OOM (out-of-memory) conditions during long editing sessions with many collaborators.

Real-time editing produces a stream of state changes. Those changes often need to trigger downstream actions, which brings us to workflow engines.

Workflow engines and Jira as a distributed state machine#

Jira is best understood not as an issue tracker but as a programmable workflow engine. Each issue transitions through states, enforces field requirements at each transition, emits events on state change, and triggers automation rules. These workflows are customized per project, per issue type, and can be modified by administrators at any time.

Modeling workflows as declarative state machines#

In interviews, describe workflows as declarative state machines backed by an execution engine. Each workflow definition specifies:

  • States: The set of allowed statuses (e.g., Open, In Progress, In Review, Done).
  • Transitions: The valid moves between states, along with conditions, validators, and post-functions.
  • Guards: Preconditions that must be satisfied before a transition fires (e.g., “assignee must be set,” “all sub-tasks must be resolved”).
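A minimal declarative sketch of such a definition, with illustrative states and guard functions (not Jira's actual schema):

```python
WORKFLOW = {
    # Transition name -> (from_state, to_state, guards).
    # Each guard takes the issue dict and returns True/False.
    "start":  ("Open", "In Progress", [lambda i: i.get("assignee") is not None]),
    "review": ("In Progress", "In Review", []),
    "finish": ("In Review", "Done", [lambda i: all(i.get("subtasks_done", []))]),
}

def transition(issue, name, workflow=WORKFLOW):
    """Validate a transition synchronously; on success, mutate the issue and
    return the event that would be published for asynchronous side effects."""
    from_state, to_state, guards = workflow[name]
    if issue["status"] != from_state:
        raise ValueError(f"invalid transition {name!r} from {issue['status']!r}")
    if not all(guard(issue) for guard in guards):
        raise ValueError(f"guard failed for transition {name!r}")
    issue["status"] = to_state
    return {"type": "issue.transitioned", "to": to_state}  # publish to the bus
```

Because the definition is data, administrators can edit workflows without code changes, and the engine can version definitions per issue, which matters for the evolution discussion below.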

The execution engine validates transitions synchronously. This means the user gets an immediate success or failure response. However, side effects, such as sending notifications, updating search indexes, firing webhooks, and executing Marketplace app triggers, must be asynchronous and idempotent: producing the same result regardless of how many times an operation is executed, which is critical for safe retries after partial failures.

This separation is essential. If a webhook endpoint is slow or a notification service is temporarily down, the user’s transition should not hang or fail. The event is published to a durable message bus and processed independently.

Attention: A common interview pitfall is designing workflow transitions as synchronous chains of side effects. This creates cascading failure risk. If the email service is down, should that block a developer from moving a ticket to “Done”? Obviously not. Decouple validation from propagation.
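An idempotent consumer can be sketched by deduplicating on a unique event ID. In production the `processed` set would live in a durable store so that redelivery after a crash still finds the dedupe record:

```python
processed = set()  # in production: a durable store keyed by event ID

def handle_event(event, send_notification):
    """Idempotent event consumer: the message bus may redeliver an event
    after a partial failure, so side effects are keyed by event ID and
    skipped on replay."""
    if event["id"] in processed:
        return "skipped"
    send_notification(event)
    processed.add(event["id"])  # record only after the side effect succeeds
    return "handled"
```

Recording the ID only after the side effect succeeds gives at-least-once delivery of the notification; recording it before would give at-most-once. Which order is right depends on whether the side effect itself is safe to repeat.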

Handling workflow evolution#

Workflows change. An administrator might add a new required field to a transition or remove a status entirely. The system must handle in-flight issues gracefully.

Strong candidates discuss:

  • Workflow versioning: Each issue records which workflow version it was created under. Transitions are evaluated against the version active at the time of the transition, not the latest version.
  • Gradual migration: When a workflow changes, existing issues are migrated in batches with validation. Issues that cannot satisfy new constraints are flagged for manual resolution.
  • Rollback as a core operation: If a workflow change causes unexpected behavior, administrators can revert without data loss.

[Diagram: Jira-like workflow state machine with event-driven architecture]

The mention of Marketplace app triggers raises an important architectural challenge: how do you enable extensibility without letting third-party code compromise reliability?

Marketplace extensibility without chaos#

Atlassian’s Marketplace is both a competitive strength and an operational risk. Thousands of third-party apps listen to events, mutate workflows, inject UI components, and extend core functionality. The platform must remain safe, fast, and isolated even when extensions misbehave.

Isolation boundaries as a design principle#

In interviews, emphasize that Marketplace code should never execute inline with core workflows. The architecture enforces several isolation layers:

  • Event-driven consumption: Extensions receive events from a message bus, not direct function calls. This decouples execution timing.
  • Sandboxed execution: Third-party code runs in isolated environments (containers or serverless functions) with strict CPU, memory, and execution-time quotas.
  • Rate limiting per app per tenant: A misbehaving app in one tenant cannot exhaust shared resources or degrade other tenants’ experience.
  • Circuit breakers: If an app consistently fails or times out, the platform stops routing events to it and alerts the app developer.

Real-world context: Atlassian’s Forge platform enforces exactly these constraints. Apps run on Atlassian-managed infrastructure with strict invocation limits, eliminating the risk of customer data leaving controlled environments while preventing resource exhaustion.
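The circuit-breaker behavior described above can be sketched as follows. The threshold and cooldown values are illustrative, not Forge's actual limits:

```python
import time

class CircuitBreaker:
    """Stop routing events to an app after `threshold` consecutive failures;
    allow a probe attempt again after `cooldown` seconds (half-open state)."""

    def __init__(self, threshold=3, cooldown=60.0):
        self.threshold = threshold
        self.cooldown = cooldown
        self.failures = 0
        self.opened_at = None     # None means the breaker is closed

    def allow(self):
        if self.opened_at is None:
            return True
        if time.monotonic() - self.opened_at >= self.cooldown:
            self.opened_at = None   # half-open: let one attempt through
            self.failures = 0
            return True
        return False

    def record(self, success):
        if success:
            self.failures = 0       # any success resets the streak
        else:
            self.failures += 1
            if self.failures >= self.threshold:
                self.opened_at = time.monotonic()
```

The event router would keep one breaker per app per tenant, so a failing app is quarantined without touching its neighbors.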

What interviewers are really testing#

The question behind every extensibility prompt is: “Can you enable an ecosystem of third-party functionality without letting any single app compromise reliability or tenant isolation?” If your answer involves direct function calls or shared-process execution, you have missed the point.

The same tenant isolation concern applies far beyond the Marketplace, extending to every shared resource in the platform.

Tenant isolation, noisy-neighbor control, and billing boundaries#

Multi-tenancy is central to Atlassian’s cloud architecture. Each organization expects data isolation, fair resource allocation, and accurate billing. A “hot” tenant running a massive bulk import should not cause latency spikes for other tenants sharing the same infrastructure.

Designing for fairness#

Effective multi-tenant designs tag every request, event, background job, and metric with a tenant identifier. This enables:

  • Per-tenant quotas: Hard limits on API calls, storage, automation executions, and concurrent operations.
  • Budgeted resource pools: Background jobs like indexing and automation run on shared infrastructure but are scheduled with weighted fairness algorithms so no single tenant monopolizes compute.
  • Tenant-aware autoscaling: Monitoring systems track per-tenant resource consumption and trigger scaling decisions when specific tenants approach their allocation limits.

A strong interview answer sounds like: “Every request and event is tenant-scoped, rate-limited, and observable. If a tenant exceeds its budget, the system degrades gracefully for that tenant without affecting others.”
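Per-tenant quotas are often implemented as token buckets keyed by tenant ID. A minimal sketch, assuming one bucket per tenant with a shared refill rate (parameter values are illustrative):

```python
import time

class TenantRateLimiter:
    """Per-tenant token bucket: each tenant draws from its own bucket, so a
    hot tenant exhausts only its own budget, never a neighbor's."""

    def __init__(self, rate, burst):
        self.rate = rate      # tokens refilled per second
        self.burst = burst    # bucket capacity
        self.buckets = {}     # tenant_id -> (tokens, last_refill_time)

    def allow(self, tenant_id, now=None):
        now = time.monotonic() if now is None else now
        tokens, last = self.buckets.get(tenant_id, (self.burst, now))
        tokens = min(self.burst, tokens + (now - last) * self.rate)
        if tokens >= 1.0:
            self.buckets[tenant_id] = (tokens - 1.0, now)
            return True
        self.buckets[tenant_id] = (tokens, now)
        return False
```

The same per-tenant accounting that drives `allow` decisions can feed usage metrics, which is the isolation/billing alignment mentioned below.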

Pro tip: When discussing noisy-neighbor control, mention that billing accuracy depends on the same tenant-scoping infrastructure. If you can meter resource usage per tenant for isolation purposes, you can reuse those metrics for usage-based billing. This is not a coincidence. It is a deliberate architectural alignment.

Tenant isolation applies to data at rest as well. This naturally leads to questions about storage architecture.

SQL vs. NoSQL and storage trade-offs in multi-tenant SaaS#

Atlassian interviews frequently probe your reasoning about storage engine choices. This is not an abstract academic exercise. The choice between relational and non-relational storage has direct implications for permission enforcement, query flexibility, and tenant isolation.

When SQL fits#

Relational databases excel where:

  • Strong consistency matters: Permission definitions, workflow configurations, and billing records demand ACID guarantees.
  • Complex queries are common: Jira’s advanced search (JQL) involves joins across issues, projects, custom fields, and workflow states.
  • Schema enforcement prevents corruption: Enterprise data with strict validation rules benefits from relational constraints.

When NoSQL fits#

Non-relational stores make sense where:

  • Write throughput dominates: Audit logs, activity streams, and telemetry data arrive at high velocity with append-only patterns.
  • Schema flexibility is required: Marketplace app data varies wildly across apps and cannot conform to a single relational schema.
  • Horizontal scaling is non-negotiable: CRDT document storage and search indexes benefit from distributed, partition-tolerant architectures.

SQL vs NoSQL Database Comparison for Atlassian Use Cases

| Use Case | Preferred Engine | Consistency Model | Scaling Strategy | Key Trade-Off |
| --- | --- | --- | --- | --- |
| Permissions management | SQL | Strong (ACID) | Vertical | Ensures data integrity and complex relationships, but limited horizontal scalability |
| Workflow configuration | SQL | Strong (ACID) | Vertical | Structured schemas and transactional support, but struggles with rapidly changing data structures |
| Audit logs | NoSQL | Eventual (BASE) | Horizontal | Handles high-velocity append-only writes, but offers less query flexibility than SQL |
| CRDT storage | NoSQL | Eventual (BASE) | Horizontal | Handles concurrent updates efficiently, but risks temporary inconsistencies |
| Search index | NoSQL | Eventual (BASE) | Horizontal | Optimized for full-text search and large-scale indexing, but requires extra mechanisms for data freshness |

Attention: Interviewers are not looking for a blanket “use NoSQL for scale” answer. They want to see that you pick the right tool for each subsystem’s constraints. A system that stores permission definitions in a document database or CRDT snapshots in a normalized relational schema reveals a lack of architectural judgment.

In most mature Atlassian-scale systems, the answer is polyglot persistence: relational databases for transactional core data, document stores for flexible content, and specialized engines for search and analytics.

The mention of search brings us to one of the most deceptively difficult subsystems in Atlassian’s architecture.

Search, indexing, and permission-aware information retrieval#

Search at Atlassian scale is deceptively hard because it must satisfy three competing constraints simultaneously: it must be permissions-aware, near real-time, and multi-tenant. A user must never see a search result they cannot access, even if permissions changed moments ago.

Incremental indexing driven by event streams#

Strong designs use event-driven incremental indexing. Each content change, whether a Confluence page edit, a Jira issue transition, or a comment addition, emits an event to a durable stream. Index workers consume these events and update the search index asynchronously.

Permission filtering can be applied in two ways:

  • Index-time filtering: Bake permission metadata into the index document. When permissions change, reindex affected documents. This trades write amplification for fast query-time performance.
  • Query-time filtering: Apply permission checks at search time using cached permission sets. This avoids reindexing on permission changes but adds latency to every query.

Most production systems use a hybrid. Permission metadata is stored in the index for common cases (project-level access), while fine-grained checks (issue-level restrictions) are applied at query time.
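A toy version of this hybrid makes the two filtering stages explicit. The document fields and `can_access_issue` callable are hypothetical, and the linear scan stands in for a real inverted index:

```python
def search(query, user, index, can_access_issue):
    """Hybrid permission filtering sketch: coarse project-level ACL metadata
    is baked into each index document; fine-grained issue-level checks run
    at query time against the permission service."""
    results = []
    for doc in index:
        if query.lower() not in doc["text"].lower():
            continue                                     # relevance match
        if not (set(user["projects"]) & set(doc["project_acl"])):
            continue                                     # index-time metadata filter
        if not can_access_issue(user, doc["id"]):
            continue                                     # query-time fine-grained check
        results.append(doc["id"])
    return results
```

The ordering matters: the cheap index-time filter prunes most candidates before the (comparatively expensive) query-time permission call runs.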

Real-world context: Elasticsearch, which powers search in many Atlassian products, supports document-level security through role-based access filters. However, these filters must be kept in sync with the permission evaluation service. A stale filter is a data leak.

Handling index lag gracefully#

Index lag is inevitable in eventually consistent search systems. Rather than blocking interactions or showing stale data silently, well-designed systems surface subtle signals:

  • Newly created content appears immediately in the creator’s view (read-your-writes consistency) even if the index has not yet caught up.
  • Search results carry freshness indicators when lag exceeds a threshold.
  • Critical permission changes trigger synchronous index invalidation for affected documents, accepting higher latency for correctness.

[Diagram: Permission-aware search indexing pipeline with hybrid filtering]

Search is a read-heavy subsystem, but observability into its performance and correctness requires a dedicated monitoring strategy.

Logging, monitoring, and observability as a core concern#

Many candidates treat observability as an afterthought, something you mention in the last two minutes of an interview when asked about “what could go wrong.” At Atlassian, monitoring is a core architectural component because the platform’s SLA commitments depend on detecting degradation before customers do.

What to monitor and why#

An effective observability strategy covers three pillars:

  • Metrics: Quantitative measurements like request latency percentiles (p50, p95, p99), permission evaluation time, index lag, queue depth, and per-tenant resource consumption.
  • Logs: Structured event records for debugging. Every log entry should carry tenant ID, request ID, and user context for correlation.
  • Traces: Distributed traces that follow a request from the API gateway through permission evaluation, data retrieval, and response assembly. This is essential for diagnosing latency in multi-service architectures.

Alerting philosophy#

Alerts should fire on symptoms (elevated error rates, latency threshold breaches) rather than causes (high CPU). Cause-based alerts generate noise. Symptom-based alerts tell you that users are affected, which is what matters for SLA compliance.

Pro tip: In an interview, mention that per-tenant observability is critical for Atlassian’s model. If a single enterprise customer is experiencing degraded search performance due to a hot partition, global aggregate metrics might look fine. Per-tenant dashboards and alerts catch these cases.

$\text{Alert threshold} = \mu_{\text{latency}} + k \cdot \sigma_{\text{latency}}$

where $k$ is tuned based on SLA tolerance, typically $k = 3$ for p99 targets. This statistical approach avoids false positives from normal variance while catching genuine degradation.
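Assuming the latency samples are available in memory, the formula translates directly:

```python
import statistics

def alert_threshold(latencies_ms, k=3.0):
    """Symptom-based alert threshold: mean latency plus k standard deviations
    (population standard deviation over the observed window)."""
    mu = statistics.mean(latencies_ms)
    sigma = statistics.pstdev(latencies_ms)
    return mu + k * sigma
```

In a real pipeline the mean and deviation would be computed over a rolling window per tenant and per endpoint, so the threshold adapts to each workload's normal variance.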

Observability tells you when something is wrong. The next question is how you evolve the system safely when you need to fix or improve it.

Migration safety and zero-downtime evolution#

Atlassian systems evolve continuously, but customer data is long-lived. A Confluence instance may contain a decade of institutional knowledge. A Jira project may have hundreds of thousands of issues with custom fields accumulated over years. Migrations must be reversible, observable, and safe at scale.

The expand-and-contract pattern#

The standard approach for schema evolution at Atlassian scale is the expand-and-contract migration: a two-phase schema change strategy where the new schema is deployed alongside the old one (expand), traffic is gradually shifted, and the old schema is removed only after validation (contract).

The process works as follows:

  1. Expand: Deploy the new column, table, or field alongside the existing one. Both old and new code paths coexist. New writes populate both locations.
  2. Migrate: Backfill existing data from old to new format. This runs incrementally with backpressure (a flow control mechanism that slows down producers when consumers cannot keep up, preventing queue overflow and resource exhaustion) to avoid overwhelming the database.
  3. Validate: Verify that the new format is correct and complete. Run consistency checks comparing old and new representations.
  4. Contract: Remove the old column or code path. This only happens after confidence is established, often weeks later.

Historical note: Atlassian’s migration from Server to Cloud involved moving millions of customer instances to a multi-tenant architecture. The expand-and-contract pattern, combined with gradual traffic shifting and per-tenant rollback capability, was essential to executing this without data loss or prolonged outages.
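The expand phase can be sketched as flag-guarded dual writes. `FLAGS`, the store dicts, and the field names are all illustrative:

```python
FLAGS = {"write_new_schema": True, "read_new_schema": False}

def save_title(issue_id, title, old_store, new_store, flags=FLAGS):
    """Expand-phase write path: writes land in both schemas, but the old
    store remains authoritative until the read flag flips."""
    old_store[issue_id] = title                    # authoritative during expand
    if flags["write_new_schema"]:
        new_store[issue_id] = {"title": title}     # hypothetical new shape

def load_title(issue_id, old_store, new_store, flags=FLAGS):
    """Read path controlled by a feature flag: reverting a bad migration is
    a flag flip, not a data operation."""
    if flags["read_new_schema"]:
        return new_store[issue_id]["title"]
    return old_store[issue_id]
```

Only after backfill and validation does `read_new_schema` flip, and only weeks after that does the contract step delete the old column.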

Rollbacks are tested paths, not emergency scripts#

In interviews, explicitly state that rollback is a core operation. If a migration introduces a bug, the system must be able to revert without manual intervention. This means:

  • Old and new schemas coexist long enough to validate.
  • No destructive operations (dropping columns, deleting data) occur until the new path is proven.
  • Feature flags control which code path is active, enabling instant reversion.

This approach trades rollout velocity for predictability, a trade-off Atlassian explicitly embraces for enterprise customers.

The pattern of safe evolution also applies to how the system handles unexpected failures in production.

Handling failures the Atlassian way#

Failures are inevitable in collaborative systems operating at Atlassian’s scale. The question is never “will something fail?” but “how gracefully does the system degrade when it does?”

Failure as narrative, not checklist#

Instead of listing failure modes, Atlassian interviewers want you to narrate degradation scenarios. For example:

Search indexing lags behind. Users may see stale results for recently updated content. The system continues serving reads from the primary data store for the content detail view while the index catches up. A dashboard metric tracks index lag per tenant. If lag exceeds a threshold (e.g., 30 seconds), an alert fires and the system can temporarily redirect searches to a slower but authoritative query path.

A Marketplace app times out. The circuit breaker trips after three consecutive failures. Events for that app are queued with exponential backoff. The core workflow (issue transition, page save) is unaffected because app execution is asynchronous. The app developer receives a notification about the failure pattern.

Permission cache becomes stale after a network partition. The system falls back to authoritative permission evaluation from the database. Latency increases but correctness is preserved. Once the partition heals, the cache warms incrementally rather than all at once to avoid a thundering herd.

Real-world context: Atlassian’s Statuspage product exists precisely because transparent failure communication is a company value. When designing systems in an interview, showing that you think about how failures are surfaced to users and operators, not just how they are technically handled, demonstrates alignment with Atlassian’s engineering culture.

[Diagram: Graceful degradation patterns across three failure scenarios]

With a solid grasp of failure handling, let us look at some additional system design prompts that Atlassian interviews commonly explore.

Additional system design prompts and how to approach them#

Beyond the Confluence real-time editor prompt covered earlier, Atlassian interviews draw from several other design scenarios. Here is how to reason about three frequently asked ones.

Design a real-time analytics platform for Atlassian tools#

This prompt tests your ability to handle high-volume event ingestion, stream processing, and tenant-isolated aggregation. Key considerations:

  • Events (page views, issue transitions, build completions) arrive at high velocity and must be ingested into a durable stream like Apache Kafka.
  • Stream processors compute real-time aggregates (active users, trending projects, SLA compliance) per tenant.
  • Dashboards query a time-series store optimized for range queries and rollups.
  • Tenant isolation is critical. One tenant’s analytics load should not delay another’s dashboard rendering.
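
The tenant-isolated aggregation step can be sketched as below. The event shape, the one-minute tumbling window, and the composite key are assumptions for illustration, not a real Atlassian schema.

```typescript
// Sketch: windowed aggregation keyed by (tenantId, window, eventType),
// so one tenant's event volume never touches another tenant's counters.
interface AnalyticsEvent {
  tenantId: string;
  type: "page_view" | "issue_transition" | "build_completed";
  timestampMs: number;
}

const WINDOW_MS = 60_000; // one-minute tumbling windows

function windowStart(timestampMs: number): number {
  return Math.floor(timestampMs / WINDOW_MS) * WINDOW_MS;
}

// Aggregates event counts into tenant-isolated buckets; in production this
// would run inside a stream processor consuming from the durable log.
function aggregate(events: AnalyticsEvent[]): Map<string, number> {
  const counts = new Map<string, number>();
  for (const e of events) {
    const key = `${e.tenantId}:${windowStart(e.timestampMs)}:${e.type}`;
    counts.set(key, (counts.get(key) ?? 0) + 1);
  }
  return counts;
}
```

Because the tenant ID is part of every aggregation key, a noisy tenant can only inflate its own buckets; isolation falls out of the data model rather than requiring separate infrastructure per tenant.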

Design a logging and monitoring system for SLA observability#

This prompt tests whether you treat observability as architecture. Discuss structured log ingestion, log correlation via request IDs, metric aggregation at multiple granularities (per-request, per-tenant, per-service), and alerting pipelines that distinguish symptoms from causes.
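
A minimal sketch of the correlation piece follows. The log entry fields are illustrative; the point is that a request ID stitched through every service turns scattered lines into a single cross-service trace.

```typescript
// Sketch: structured log entries carrying a request ID so one request can
// be traced across services. Field names are assumptions for illustration.
interface LogEntry {
  requestId: string;
  tenantId: string;
  service: string;
  level: "info" | "warn" | "error";
  message: string;
}

// Reassemble the cross-service trace for a single request.
function correlate(entries: LogEntry[], requestId: string): LogEntry[] {
  return entries.filter((e) => e.requestId === requestId);
}

// Symptom vs. cause: the user-visible error in one service often traces
// back to a warning logged moments earlier by another service on the
// same request ID.
```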

Design a user authentication and authorization system with SSO#

This prompt bridges identity management with the permission model. Discuss IdP (identity provider) integration via SAML or OIDC, token issuance and validation at the API gateway, session management across products, and how deprovisioning events from the IdP trigger permission cache invalidation across all downstream services.
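
The deprovisioning fan-out mentioned above can be sketched like this. The event shape and client interface are hypothetical; a production version would publish to an event bus and track acknowledgements asynchronously rather than calling services in a loop.

```typescript
// Sketch: an IdP deprovisioning event fans out permission-cache
// invalidations to every downstream service and records which services
// were reached, forming an audit trail.
interface DeprovisionEvent {
  userId: string;
  source: "saml" | "oidc" | "scim";
}

interface PermissionCacheClient {
  serviceName: string;
  invalidateUser(userId: string): void;
}

function onDeprovision(
  event: DeprovisionEvent,
  caches: PermissionCacheClient[]
): string[] {
  const invalidated: string[] = [];
  for (const cache of caches) {
    cache.invalidateUser(event.userId); // every service drops its entries
    invalidated.push(cache.serviceName);
  }
  return invalidated;
}
```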

Pro tip: For any of these prompts, start by stating the non-functional requirements that matter most (consistency, latency, isolation, correctness) before jumping into component design. Atlassian interviewers care more about your prioritization of constraints than about the completeness of your box diagram.

Each of these prompts ultimately tests the same meta-skill: can you balance competing architectural forces in a system that must serve both small teams and massive enterprises?

Monolith vs. microservices at Atlassian scale#

Atlassian’s own architecture has evolved from monolithic applications (the original Jira Server) to a cloud-native, service-oriented platform. Interviewers sometimes probe your reasoning about this transition directly.

When modular monoliths make sense#

For early-stage products or bounded domains with tight coupling (e.g., a single workflow engine), a modular monolith reduces operational overhead. Deployments are simpler. Debugging is easier. Latency between components is negligible because communication is in-process.

When microservices become necessary#

As the system grows, service boundaries emerge naturally along tenant isolation lines, consistency domain boundaries, and team ownership areas. Microservices make sense when:

  • Independent scaling is required (search indexing scales differently from real-time editing).
  • Fault isolation matters (a bug in the notification service should not crash the workflow engine).
  • Teams need deployment autonomy.

The key trade-off#

Microservices introduce distributed system complexity: network partitions, serialization overhead, distributed tracing requirements, and eventual consistency between services. In an interview, acknowledge this trade-off explicitly rather than defaulting to “microservices are always better.”

```typescript
// Consistency levels applied at each service boundary
enum Consistency {
  STRONG = "strong",     // synchronous, linearizable
  EVENTUAL = "eventual", // async event-driven
  CAUSAL = "causal",     // ordered but not immediate
}

interface ServiceCall {
  from: string;
  to: string;
  type: "sync" | "async"; // solid arrow = sync, dashed = async
  consistency: Consistency;
  description: string;
}

// Core service nodes in the dependency map
const services = [
  "APIGateway",
  "PermissionService",
  "WorkflowEngine",
  "CollaborationService",
  "SearchIndexer",
  "NotificationService",
  "MarketplaceEventBus",
] as const;

type ServiceName = (typeof services)[number];

// Dependency edges: annotated with arrow type and consistency requirement
const dependencyMap: ServiceCall[] = [
  // API Gateway synchronously enforces permissions on every inbound request
  {
    from: "APIGateway",
    to: "PermissionService",
    type: "sync",
    consistency: Consistency.STRONG,
    description: "Authz check before routing; must be linearizable",
  },
  // API Gateway synchronously triggers workflow transitions
  {
    from: "APIGateway",
    to: "WorkflowEngine",
    type: "sync",
    consistency: Consistency.STRONG,
    description: "State transition must complete before response",
  },
  // Workflow engine asynchronously notifies collaboration layer of state changes
  {
    from: "WorkflowEngine",
    to: "CollaborationService",
    type: "async",
    consistency: Consistency.CAUSAL,
    description: "Ordered delivery ensures edit conflicts are resolved in sequence",
  },
  // Collaboration edits fan out asynchronously to search indexer
  {
    from: "CollaborationService",
    to: "SearchIndexer",
    type: "async",
    consistency: Consistency.EVENTUAL,
    description: "Index lag acceptable; search reflects near-real-time state",
  },
  // Workflow engine publishes domain events to the marketplace bus
  {
    from: "WorkflowEngine",
    to: "MarketplaceEventBus",
    type: "async",
    consistency: Consistency.EVENTUAL,
    description: "Third-party app integrations tolerate eventual delivery",
  },
  // Marketplace bus fans out to notification service for user alerts
  {
    from: "MarketplaceEventBus",
    to: "NotificationService",
    type: "async",
    consistency: Consistency.EVENTUAL,
    description: "Notifications are best-effort; duplicates deduplicated client-side",
  },
  // Permission changes propagate asynchronously to collaboration service cache
  {
    from: "PermissionService",
    to: "CollaborationService",
    type: "async",
    consistency: Consistency.CAUSAL,
    description: "ACL cache invalidation must respect causal order of grants/revokes",
  },
  // Notification service synchronously reads permissions before dispatching
  {
    from: "NotificationService",
    to: "PermissionService",
    type: "sync",
    consistency: Consistency.STRONG,
    description: "Verify recipient still has access before sending sensitive content",
  },
];

// Render a human-readable dependency report grouped by consistency level
function printDependencyReport(calls: ServiceCall[]): void {
  const arrowSymbol = (type: "sync" | "async") => (type === "sync" ? "──►" : "- -►");
  console.log("=== Atlassian-Style Service Dependency Map ===\n");
  for (const level of Object.values(Consistency)) {
    const group = calls.filter((c) => c.consistency === level);
    if (group.length === 0) continue;
    console.log(`[${level.toUpperCase()} consistency]`);
    for (const call of group) {
      // Print each edge with its arrow style and annotation
      console.log(
        `  ${call.from} ${arrowSymbol(call.type)} ${call.to}` +
          ` [${call.type}] // ${call.description}`
      );
    }
    console.log();
  }
}

// Validate no service references an unknown node
function validateMap(calls: ServiceCall[], knownServices: readonly ServiceName[]): void {
  for (const call of calls) {
    if (!knownServices.includes(call.from as ServiceName)) {
      throw new Error(`Unknown source service: ${call.from}`);
    }
    if (!knownServices.includes(call.to as ServiceName)) {
      throw new Error(`Unknown target service: ${call.to}`);
    }
  }
  console.log("Dependency map validation passed.\n");
}

validateMap(dependencyMap, services);
printDependencyReport(dependencyMap);
```

Whichever architecture you choose, the real test is applying these principles end to end. Let us close with the example design prompt that ties everything together.

Example interview prompt in depth: Design Confluence’s real-time editor#

This is the most frequently referenced Atlassian system design prompt, and it synthesizes nearly every theme we have discussed. A strong answer weaves together collaboration, permissions, consistency, extensibility, and failure handling into a coherent narrative.

Start with constraints, not components#

Begin by stating what matters most:

  • Correctness: Two users editing the same paragraph must converge to a consistent document state.
  • Permission enforcement: A user who loses edit access mid-session must be blocked from pushing further changes.
  • Graceful degradation: If the collaboration service is temporarily unavailable, users should be able to continue editing locally and reconcile when connectivity returns.

Walk through the data flow#

  1. The user’s browser establishes a WebSocket connection to the collaboration gateway, which validates the JWT and checks edit permissions.
  2. Local edits are applied immediately to the user’s CRDT replica. Deltas are sent over the WebSocket.
  3. The merge engine receives deltas, applies CRDT merge rules, and broadcasts the merged state to all connected clients.
  4. Periodically, the versioning service snapshots the CRDT state, compacting metadata to control storage growth.
  5. The presence tracker broadcasts cursor positions and user activity indicators.
  6. On save or at intervals, an event is emitted to the indexing pipeline so search results reflect the latest content.
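
Steps 1-3 can be sketched at the gateway boundary. This is a simplified illustration under assumed names (`handleDelta`, `PermissionCheck`); the key design point is that edit permission is re-checked per delta, not only at connection time, which is what blocks a user whose access is revoked mid-session.

```typescript
// Sketch: gateway-side handling of one inbound edit delta.
interface EditDelta {
  docId: string;
  userId: string;
  payload: string; // opaque CRDT delta produced by the client replica
}

type PermissionCheck = (userId: string, docId: string) => boolean;

function handleDelta(
  delta: EditDelta,
  canEdit: PermissionCheck,
  broadcast: (delta: EditDelta) => void
): "accepted" | "rejected" {
  if (!canEdit(delta.userId, delta.docId)) {
    return "rejected"; // access revoked mid-session: delta never reaches peers
  }
  // Merge engine applies CRDT merge rules and fans out merged state.
  broadcast(delta);
  return "accepted";
}
```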

Address the hard edges#

  • Offline reconciliation: If a user edits offline and reconnects, their local CRDT state merges with the server state without conflicts. Discuss how vector clocks or Lamport timestamps enable this.
  • Snapshot compaction: Explain that CRDT metadata grows with every operation. Periodic compaction creates a clean baseline, discarding tombstones and redundant operation history.
  • Macro and table content: Rich content types require custom CRDT implementations beyond simple text sequences. Tables need row/column CRDTs. Macros may need to be treated as atomic blocks.

Attention: Do not present this as a simple “WebSocket plus database” design. Interviewers want to see that you understand why CRDTs are chosen over OT, where the costs appear (metadata growth, compaction overhead), and how permissions intersect with real-time state propagation.
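
The offline-reconciliation bullet above rests on vector-clock comparison, which is small enough to sketch directly. This is the standard partial-order comparison, not a Confluence-specific implementation.

```typescript
// Sketch: compare two vector clocks to decide whether one replica's state
// strictly precedes the other's, or the two are concurrent and must be
// merged by CRDT rules.
type VectorClock = Record<string, number>;

function compare(a: VectorClock, b: VectorClock): "before" | "after" | "concurrent" | "equal" {
  let aLess = false; // a is behind b on some replica
  let bLess = false; // b is behind a on some replica
  const keys = new Set([...Object.keys(a), ...Object.keys(b)]);
  for (const k of keys) {
    const av = a[k] ?? 0; // missing entry means "never seen an op from k"
    const bv = b[k] ?? 0;
    if (av < bv) aLess = true;
    if (bv < av) bLess = true;
  }
  if (aLess && bLess) return "concurrent"; // neither happened-before: merge
  if (aLess) return "before";
  if (bLess) return "after";
  return "equal";
}

console.log(compare({ alice: 2, bob: 1 }, { alice: 2, bob: 3 })); // "before"
console.log(compare({ alice: 3, bob: 1 }, { alice: 2, bob: 3 })); // "concurrent"
```

The "concurrent" branch is exactly where CRDT merge rules take over: neither replica's edits dominate, so both must be incorporated deterministically.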

This prompt, done well, demonstrates the full range of Atlassian system design thinking.

Conclusion#

Three themes dominate Atlassian system design interviews, and they are deeply interconnected. Permissions are the substrate: every operation depends on fast, correct access evaluation that degrades safely under organizational change. Collaboration demands nuanced consistency: real-time editing requires CRDTs and convergence guarantees, but not every subsystem needs the same rigor, and recognizing where eventual consistency is acceptable separates strong candidates from over-engineers. Safe evolution is non-negotiable: expand-and-contract migrations, rollbacks as a core capability, and backward-compatible APIs protect the long-lived enterprise data that Atlassian customers depend on for years.

Looking ahead, Atlassian’s shift toward AI-powered features (intelligent issue triaging, automated workflow suggestions, semantic search across products) will add new dimensions to system design interviews. Expect questions about ML pipeline integration, embedding-based retrieval, and how AI suggestions interact with the same permission and tenant isolation constraints discussed here.

Design for collaboration. Design for correctness. Design for the enterprise that will still be running your system five years from now. That is the Atlassian way.


Written By:
Zarish Khalid