Goldman Sachs System Design interview

This blog explains how to approach the Goldman Sachs system design interview by focusing on low latency, financial correctness, auditability, and resilience while clearly reasoning through trade-offs and failure scenarios.

17 mins read
Feb 10, 2026

Preparing for the system design interview at Goldman Sachs means stepping into a domain where “mostly correct” is not a thing you’re allowed to ship. You’re designing systems that move money, route orders, consume market data, and calculate risk while markets are volatile and regulators expect a complete trail of what happened and why. The engineering bar is high because the consequences of mistakes are high: financial loss, operational risk, reputational damage, and regulatory exposure.

A lot of candidates approach Goldman like a typical large-scale tech interview: they talk about microservices, caches, and horizontal scaling. That’s necessary but not sufficient. What makes Goldman-specific questions different is the way they force you to balance latency, financial correctness, auditability, and resilience at the same time. You can’t optimize one dimension by hand-waving away the others. A low-latency trading path that can’t be audited is a non-starter. A perfectly durable payment workflow that takes seconds to respond is also a non-starter. The interview is designed to see whether you can reason inside those constraints and still produce a design that behaves predictably when it’s under stress.

Interview heuristic:
At Goldman, a “good design” is one that remains explainable under failure. If you can’t describe how your system behaves when a feed lags, a service times out, or a region degrades, you haven’t finished the design.

This blog walks you through what Goldman interviewers evaluate, how to steer the conversation so it stays crisp and domain-appropriate, and how to answer the most common archetypes (trading, market data, risk, payments) with the level of specificity senior candidates are expected to show.

What Goldman evaluates and why it feels different

Goldman isn’t hiring you to design an app. They’re hiring you to design financial infrastructure. That means the evaluation criteria are not generic; they are tied to the realities of operating systems that interact with exchanges, broker-dealers, custodians, clearing houses, and internal control functions.

When an interviewer asks you to “design a trading system” or “design market data ingestion,” they’re not testing whether you remember a canonical architecture. They’re testing whether you can identify the dominant constraints and build around them: time-sensitive decision points, strict state transitions, and the need to prove what happened after the fact.

A useful way to organize your thinking is to map system areas to Goldman’s core concerns. You’ll reference this table implicitly throughout the interview when you justify trade-offs.

| System area | Latency | Correctness | Auditability | Resilience |
| --- | --- | --- | --- | --- |
| Order entry and validation (e.g., FIX gateway) | Very high | Very high | High | High |
| Pre-trade risk checks | High | Very high | High | High |
| Matching / execution path | Extremely high | Very high | Medium (via emitted events) | High |
| Market data ingestion and normalization | High | High | Medium | Very high |
| Post-trade processing (allocation, confirmation) | Medium | Very high | Very high | High |
| Settlement / payments / cash movement | Medium | Extremely high | Extremely high | Very high |
| Analytics and reporting | Medium/low | High | Very high | High |

Notice the pattern: latency matters most in the execution path, but correctness and auditability never drop below “high.” Goldman’s systems have to be fast, but they also have to be provably correct and traceable.

What great looks like:
You consistently say what is time-critical, what is correctness-critical, and how you preserve auditability without turning everything into a slow synchronous pipeline.

The interview flow and how to steer it

Most Goldman system design interviews are 45–60 minutes. The structure is familiar—requirements, architecture, deep dives, trade-offs, failures—but the emphasis is different. The interviewer will usually push hard on operational realism: What’s the p99 latency target? What happens when feeds drift? How do you reconcile? How do you control access?

The best way to steer the interview is to keep two threads running in parallel:

  1. the functional story (orders, prices, risk checks, trades, settlement), and

  2. the control story (audit trail, permissions, failure modes, operational readiness).

If you only tell the functional story, your design looks like a demo. If you only tell the control story, your design looks like bureaucracy. Goldman expects you to do both and to explain how they interact.

At this point, the interviewer may ask something like: “What are your SLAs?” Don’t freeze. Give ranges, state your assumptions, and explain how those assumptions influence architecture. The interviewer wants to see you reason, not recite.

Interviewer prompt simulation:
“Where exactly do you draw the line between the low-latency execution path and the durable audit trail?”

A strong answer sounds like: you keep the execution path lean, but every state transition emits an immutable event that becomes the audit spine and the source of truth for downstream systems.

Two short pushback dialogues to practice

Here’s the kind of pushback you should expect, and how to respond without sounding defensive.

Dialogue 1: market data lag

Interviewer: “What breaks if the market data feed lags by 500 ms?”
You: “Two things: pricing decisions and risk decisions. I would treat feed freshness as a first-class signal and gate behavior based on it. If freshness drops below a threshold, the system degrades into a safer mode—tighter limits, widened spreads, or even a trading halt for impacted symbols—while still keeping the audit trail intact.”
Interviewer: “So you’re willing to reject trades?”
You: “Yes. In finance, rejecting is safer than executing on stale data. The design needs explicit policies for that, and those policies should be observable and auditable.”
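
The freshness gate from that exchange can be sketched as a tiny, auditable policy function. The thresholds and mode names below are illustrative assumptions, not real desk policy:

```python
# Hypothetical staleness thresholds; real values come from desk-level policy.
FRESH_MS = 200      # full trading allowed
DEGRADED_MS = 500   # tighter limits, widened spreads

def feed_mode(last_tick_ts_ms: float, now_ms: float) -> str:
    """Map feed staleness to an explicit trading mode, so every
    degradation decision is a named, loggable state."""
    age = now_ms - last_tick_ts_ms
    if age <= FRESH_MS:
        return "NORMAL"
    if age <= DEGRADED_MS:
        return "DEGRADED"   # e.g., tighter limits, wider spreads
    return "HALTED"         # reject new orders for this symbol
```

Because the mode is an explicit value rather than an implicit side effect, it can be emitted into the audit trail alongside every order decision it influenced.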

Dialogue 2: risk engine timeout

Interviewer: “What if the risk engine times out? Do you fail open or fail closed?”
You: “Fail closed for anything that can create unbounded exposure. If we need availability, we introduce a bounded fallback: cached limits with strict TTL, plus a kill switch. The key is that fallback is measurable and explicitly logged so we can explain every decision later.”
Interviewer: “Isn’t that too conservative?”
You: “It’s intentionally conservative. Goldman optimizes for controlled risk, not maximal throughput at any cost.”
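
The fail-closed fallback can be made concrete. This is a sketch under assumed names (notional limits, a fixed TTL); the point is the shape of the decision and the logged reason, not the specific numbers:

```python
from typing import Optional

class RiskDecision:
    def __init__(self, approved: bool, reason: str):
        self.approved = approved
        self.reason = reason   # every decision carries a loggable reason

def check_order(order_notional: float,
                live_limit: Optional[float],
                cached_limit: Optional[float],
                cache_age_s: float,
                cache_ttl_s: float = 5.0) -> RiskDecision:
    """Fail closed: approve only against a live limit, or a cached one
    still within its TTL. Never approve with no limit at all."""
    if live_limit is not None:
        return RiskDecision(order_notional <= live_limit, "live_limit")
    # Risk engine timed out: bounded fallback, never unbounded exposure.
    if cached_limit is not None and cache_age_s <= cache_ttl_s:
        return RiskDecision(order_notional <= cached_limit,
                            "cached_limit_within_ttl")
    return RiskDecision(False, "fail_closed_no_fresh_limit")
```

The `reason` field is what makes the fallback explainable later: reconciliation can count exactly how many orders were approved on cached limits during the outage.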

These exchanges show a Goldman-appropriate mindset: explicit policies, bounded fallbacks, and traceable decisions.

Back-of-the-envelope assumptions that make your design concrete

Goldman interviewers don’t require exact numbers, but they do expect you to anchor your design with realistic ranges and show how those ranges drive choices. The mistake junior candidates make is to throw out “millions per second” without understanding the implications. The mistake mid-level candidates make is to avoid numbers entirely.

You can use a small set of assumptions to ground the discussion. For example:

  • Order entry: from tens to hundreds of orders per second for many desks; potentially much higher for certain electronic flows.

  • Market data: thousands to hundreds of thousands of updates per second depending on venue coverage and instrument set.

  • Latency targets: p99 in the low milliseconds for many user-facing paths; tighter budgets for internal execution components depending on context.

  • Downstream consumers: risk, compliance, reporting, surveillance, PnL, and reconciliations—often many independent consumers of the same event stream.

The important move is not the numbers themselves. It’s how you connect them to architecture. High update rates push you toward streaming pipelines and efficient normalization. Tight latency pushes you toward in-memory structures and fewer synchronous hops. Auditability pushes you toward immutable event logs and deterministic state transitions.
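
A quick worked pass shows how the numbers force choices. All figures here are illustrative assumptions, chosen only to demonstrate the arithmetic you would narrate aloud:

```python
# Illustrative market data volume (not firm figures).
updates_per_sec = 200_000          # busy multi-venue feed
msg_bytes = 100                    # normalized tick after compact encoding
ingest_mb_per_sec = updates_per_sec * msg_bytes / 1e6   # sustained ingest rate

trading_secs_per_day = 6.5 * 3600  # one US equities session
ticks_per_day = updates_per_sec * trading_secs_per_day  # billions of events/day

# An assumed 2 ms p99 budget across 4 synchronous hops leaves little per stage,
# which is exactly why you minimize synchronous dependencies.
p99_budget_us = 2_000
sync_hops = 4
per_hop_us = p99_budget_us / sync_hops
```

Even with modest per-message sizes, the daily event count lands in the billions, which is the argument for streaming ingestion with replay rather than synchronous per-tick database writes.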

Interview heuristic:
Numbers aren’t there to impress. They’re there to force design choices and expose what you’re trading away.

A real-time trade flow walkthrough (end-to-end)

Goldman interviews often include trading because it compresses everything: latency, correctness, risk, and audit. The fastest way to demonstrate seniority is to walk through one trade end-to-end and narrate where you spend latency budget and where you refuse to compromise correctness.

Imagine an order enters the system through a FIX (Financial Information eXchange) gateway. The first responsibility is validation: schema validation, entitlement checks (is this user allowed to trade this product?), basic sanity checks (quantity bounds), and deduplication of retransmits. This is a common place where naive candidates do too much synchronously. You keep this stage fast and deterministic: accept or reject quickly, and emit an event capturing the decision.
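
A minimal sketch of that validation stage is below. The field names are illustrative, not a real FIX tag mapping, and the reject reasons are the events you would emit:

```python
def validate_order(order: dict,
                   entitled_products: set,
                   seen_ids: set,
                   max_qty: int = 1_000_000) -> tuple:
    """Fast, deterministic accept/reject. Every outcome (including the
    reason string) becomes an audit event downstream."""
    oid = order.get("client_order_id")
    if oid is None:
        return False, "missing_client_order_id"
    if oid in seen_ids:
        return False, "duplicate_retransmit"     # dedupe resent orders
    if order.get("symbol") not in entitled_products:
        return False, "not_entitled"             # entitlement check
    qty = order.get("qty", 0)
    if not 0 < qty <= max_qty:
        return False, "qty_out_of_bounds"        # basic sanity check
    seen_ids.add(oid)
    return True, "accepted"
```

Note what this stage does not do: no database round-trips, no downstream calls. It is pure in-memory checks plus an event, which is what keeps it fast and deterministic.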

Next comes pre-trade risk. The system evaluates limits: credit limits, position limits, concentration limits, and potentially real-time risk checks for specific products. This stage is correctness-critical. It must be consistent with the firm’s risk policy and must leave a trail. Strong candidates talk about the risk check as a decision service with explicit timeouts, explicit fallbacks, and explicit logging. Weak candidates say “call the risk service” and move on.

If the order passes risk, it enters the execution path: routing to a venue or internal matching engine depending on the business context. Latency is most sensitive here. If you’re discussing a matching engine, you describe an in-memory order book keyed by symbol with price-time priority. You talk about single-writer per instrument partitions to avoid locks, or careful concurrency control to preserve determinism. You also describe the output: executions produce trade events that are immutable facts, not “updates.”
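
A toy single-instrument book illustrates price-time priority. It is deliberately simplified: one writer per instrument, and fills happen at the ask price rather than the resting order's price as a real engine would do:

```python
import heapq
import itertools

_seq = itertools.count()  # arrival order breaks price ties (time priority)

class Book:
    """Minimal single-instrument order book sketch."""
    def __init__(self):
        self.bids = []  # max-heap via negated price
        self.asks = []  # min-heap

    def add(self, side: str, price: float, qty: int) -> list:
        h = self.bids if side == "buy" else self.asks
        key = -price if side == "buy" else price
        heapq.heappush(h, (key, next(_seq), price, qty))
        return self._match()

    def _match(self) -> list:
        trades = []
        while self.bids and self.asks and -self.bids[0][0] >= self.asks[0][0]:
            _, _, bid_px, bq = heapq.heappop(self.bids)
            _, _, ask_px, aq = heapq.heappop(self.asks)
            qty = min(bq, aq)
            trades.append((ask_px, qty))  # emitted as an immutable trade fact
            if bq > qty:  # push remainder back at top of book
                heapq.heappush(self.bids, (-bid_px, next(_seq), bid_px, bq - qty))
            if aq > qty:
                heapq.heappush(self.asks, (ask_px, next(_seq), ask_px, aq - qty))
        return trades
```

The arrival-sequence tiebreaker is the point: "which order got filled first" is decided by data, not by scheduler luck, which is what makes the engine's behavior defensible.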

Once a trade event exists, everything else becomes downstream consumption. Risk recalculations, PnL updates, compliance surveillance, confirmations, and settlement workflows consume the same event spine. This is where streaming platforms fit: they decouple producers from consumers and provide replay for recovery. But you don’t name Kafka (or any tool) as a buzzword; you name it as an “event spine” because replay and ordered consumption are operational requirements in finance.

Finally, audit logging is not a separate afterthought. Audit is the trail of decision points: validation results, risk decision outputs, routing decisions, and execution results. The best way to talk about this is that you capture immutable events and store them with retention and integrity guarantees. The audit system is not the execution path, but it is fed by the execution path.
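
One way to make "integrity guarantees" concrete is a hash-chained append-only log, where each record commits to everything before it. This is a sketch of the idea, not a real audit store:

```python
import hashlib
import json

class AuditTrail:
    """Append-only trail; each record chains the previous hash, so
    tampering with any historical entry is detectable on verify()."""
    GENESIS = "0" * 64

    def __init__(self):
        self.records = []
        self.last_hash = self.GENESIS

    def append(self, event: dict) -> str:
        payload = json.dumps(event, sort_keys=True)
        digest = hashlib.sha256((self.last_hash + payload).encode()).hexdigest()
        self.records.append({"event": event, "hash": digest})
        self.last_hash = digest
        return digest

    def verify(self) -> bool:
        prev = self.GENESIS
        for rec in self.records:
            payload = json.dumps(rec["event"], sort_keys=True)
            if hashlib.sha256((prev + payload).encode()).hexdigest() != rec["hash"]:
                return False
            prev = rec["hash"]
        return True
```

The execution path only ever appends; verification and retention live entirely off the hot path, which is the separation the walkthrough describes.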

What great looks like:
You identify the commit points: when an order becomes real, when a risk decision becomes binding, when a trade becomes an immutable fact, and how those facts can be replayed and reconciled.

The system archetypes you’ll see and how to answer them

Goldman questions often fall into a few archetypes. The trick is to answer them in a Goldman-specific way: emphasize financial correctness, controlled risk, and operational readiness. For each archetype, it helps to contrast what a naive candidate says with how a strong candidate reframes.

Trading and matching systems

A naive candidate says: “We’ll build a matching engine, store orders in a database, and use a queue for events.” The failure is that storing every order mutation in a database synchronously destroys latency and creates contention. It also doesn’t address determinism: in finance, “which order got filled first” must be defensible.

A strong candidate reframes: execution is an in-memory, deterministic state machine with strict ordering per instrument. Durability and audit are handled through immutable event emission and snapshotting, not by turning the matching loop into a transactional DB workload. You also explicitly define where you can tolerate eventual consistency (downstream analytics) and where you cannot (execution state).

Common pitfall:
Treating the matching engine like a CRUD app. Execution is a state machine with strict ordering, not a set of independent updates.

Market data ingestion pipelines

A naive candidate says: “We’ll ingest market data and store it in a database for consumers.” The failure is that market data is high-volume, bursty, and often out-of-order across venues. If you attempt to durably store all raw ticks synchronously, you create backpressure and lag at the worst possible time: market volatility.

A strong candidate reframes: you normalize feeds into a canonical schema, attach sequence metadata, deduplicate, and publish into a streaming layer that supports fan-out and replay. You explicitly separate fast path (fresh quotes for trading) from cold path (historical storage for analytics). You also define correctness in terms of freshness and ordering semantics, not “every message stored.”
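
The normalization step might look like the sketch below. The raw field names (`seq`, `sym`, `px`, `qty`) are invented for illustration; real venue formats differ widely:

```python
from typing import Optional

def normalize(raw: dict, venue: str, seen: dict) -> Optional[dict]:
    """Turn a venue-specific tick into a canonical record.
    Drops duplicates and sequence regressions per (venue, symbol)."""
    seq, sym = raw["seq"], raw["sym"]
    if seq <= seen.get((venue, sym), -1):
        return None                      # duplicate or stale retransmit
    seen[(venue, sym)] = seq
    return {
        "symbol": sym,
        "venue": venue,
        "price": float(raw["px"]),
        "size": int(raw["qty"]),
        "venue_seq": seq,                # kept for downstream gap detection
    }
```

Keeping the venue sequence number in the canonical record matters: downstream consumers can detect gaps themselves rather than trusting that ingestion was lossless.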

Risk engines and real-time monitoring

A naive candidate says: “We’ll compute VaR.” VaR (Value at Risk) is an important concept, but dropping the acronym without explaining the latency and correctness implications doesn’t help. Risk systems often involve a mix of real-time checks (pre-trade limits) and heavier computations (portfolio risk aggregation) with different time tolerances.

A strong candidate reframes: pre-trade risk is a low-latency policy decision with bounded timeouts and clear fallback rules. Post-trade risk monitoring is often streaming and eventually consistent, but it must be accurate and reconcilable. You talk about freshness, caching of reference data, snapshotting, and how to recover when inputs are corrupted.

Payments and money movement

A naive candidate says: “We process payments with a database transaction.” The failure is that payments are workflows, not single transactions. They involve external systems, partial failures, retries, and strict audit requirements. You need idempotency, reconciliation, and an immutable ledger-like record.

A strong candidate reframes: money movement is a state machine with explicit transitions, idempotency keys, and a durable audit trail. You design for reversals, chargebacks, and delayed confirmations. You also treat security and entitlements as first-class constraints: who can initiate, approve, and reconcile.
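
A compact sketch of that state machine, with idempotency keys and an append-only ledger. The states and transitions here are illustrative, not a real payments workflow:

```python
# Illustrative transition map; reversals are new transitions, never edits.
ALLOWED = {
    "INITIATED": {"APPROVED", "REJECTED"},
    "APPROVED": {"SENT"},
    "SENT": {"CONFIRMED", "FAILED"},
    "FAILED": {"SENT"},          # explicit retry path
    "CONFIRMED": {"REVERSED"},
}

class Payments:
    def __init__(self):
        self.state = {}    # payment_id -> current state
        self.ledger = []   # append-only transition records

    def apply(self, payment_id: str, new_state: str, idem_key: str) -> str:
        if any(r["idem_key"] == idem_key for r in self.ledger):
            return "duplicate_ignored"        # idempotent retry: no-op
        cur = self.state.get(payment_id)
        if cur is None and new_state != "INITIATED":
            return "rejected_bad_transition"
        if cur is not None and new_state not in ALLOWED.get(cur, set()):
            return "rejected_bad_transition"  # would also be logged in practice
        self.state[payment_id] = new_state
        self.ledger.append({"payment_id": payment_id, "to": new_state,
                            "idem_key": idem_key})
        return "applied"
```

Retries with the same idempotency key are absorbed silently, and illegal transitions are refused rather than "fixed up", which is exactly the behavior reconciliation depends on.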

Consistency in finance: what must be ACID vs what can be eventually consistent

You will score points at Goldman by being explicit about consistency. ACID (Atomicity, Consistency, Isolation, Durability) is not a buzzword here; it’s a tool you apply selectively.

Execution state and booked trades are the kinds of records that often demand strong guarantees. If a trade is “booked,” you need to know it is durable and cannot disappear. Payment ledger entries similarly demand strong durability and auditability. In contrast, many analytics views—dashboards, aggregated risk metrics, monitoring panels—can be eventually consistent as long as they converge and you can prove their lineage.

The naive mistake is to declare “we’ll use ACID everywhere,” which usually translates into slow, fragile systems. The opposite naive mistake is to declare “eventual consistency is fine,” which is unacceptable for trade booking and money movement.

This is a Goldman-specific skill: drawing the boundary and defending it.

Interview heuristic:
Treat “eventual consistency” as a contract: you must say what can be stale, for how long, and how you detect and correct drift.

How to drive a latency-first conversation without sounding hand-wavy

Goldman interviews often steer into latency because low latency is a real differentiator in finance. The trap is that candidates talk about “microseconds” without showing how latency budgets are managed. A latency-first conversation is not about bravado; it’s about clarity.

Start by decomposing latency into budgets: network hops, serialization, validation, risk check call, execution loop, and event publication. Then identify what you can do to reduce tail latency: fewer synchronous dependencies, in-memory hot paths, partitioning to avoid locks, and backpressure to prevent overload.
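
A budget decomposition can literally be a table you sum aloud. Every number below is an assumption for illustration, not a real target:

```python
# Illustrative p99 budget for an order-entry acknowledgement path (microseconds).
budget_us = {
    "gateway_parse_validate": 300,
    "pre_trade_risk_check": 800,    # the dominant synchronous dependency
    "execution_loop": 200,
    "event_publish": 100,           # can move off the ack path if made async
}
total_us = sum(budget_us.values())
p99_target_us = 2_000
headroom_us = p99_target_us - total_us  # slack for network and GC jitter
```

Narrating it this way makes the trade-offs visible: the risk check dominates the budget, so that is where timeouts, caching, and fallback policy earn their keep.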

When you propose something that improves latency, always say what you risk. In-memory state risks loss on crash. You mitigate it with snapshotting and replay from an event log. Asynchronous publication risks delayed visibility downstream. You mitigate by defining what downstream systems can tolerate and by exposing freshness metrics.

That’s what makes the conversation credible: you trade performance for specific risks, then you mitigate those risks explicitly.

Data contracts, schemas, and versioning in market data and trading pipelines

Senior candidates often overlook one of Goldman’s day-to-day realities: systems evolve. Feeds add fields. Risk models change. Regulatory reporting requirements expand. If you don’t design for schema evolution, your “perfect” architecture will break the first time a feed changes.

In market data, you receive heterogeneous messages from multiple venues. You normalize them into a canonical schema. That schema becomes a contract for consumers: trading, risk, analytics, surveillance. If you change it casually, you break the firm. In interviews, you don’t need to implement a schema registry, but you should describe the principle: versioned schemas, backward compatibility, and explicit migration paths.

In trading pipelines, the same applies to order and trade events. You want immutable events so you can replay and reconcile, but you also need to evolve event formats over time. A strong answer explains how consumers handle multiple versions, how you test schema changes, and how you roll out changes safely.
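
Consumer-side version handling can be as simple as an upgrade function that lifts old events to the current shape. The versions and fields here are hypothetical:

```python
def decode_trade_event(event: dict) -> dict:
    """Upgrade older trade-event versions to the current canonical shape.
    Hypothetical example: v2 added 'leaves_qty'; v1 events imply fully filled."""
    v = event.get("schema_version", 1)
    out = dict(event)                    # never mutate the stored event
    if v == 1:
        out.setdefault("leaves_qty", 0)  # backfill the field v2 introduced
        out["schema_version"] = 2
    if out["schema_version"] != 2:
        raise ValueError(f"unsupported schema_version: {v}")
    return out
```

Because the stored events are immutable, the upgrade happens on read; replaying years of v1 events through current consumers still works, which is the compatibility guarantee the paragraph above asks for.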

What great looks like:
You treat schemas as products: versioned, validated, and rolled out with compatibility guarantees, not as incidental JSON blobs.

Security and entitlements: permissions, segregation of duties, and audit trails

Goldman is a financial institution, not just an engineering organization. That means access control is not optional; it is fundamental. Interviewers often look for whether you naturally include entitlements and audit.

Entitlements are about who can do what: who can trade which instruments, who can view sensitive positions, who can approve a large transfer, who can modify limits. Segregation of duties is about preventing a single actor from initiating and approving risky actions. Audit trails are about proving that controls existed and were applied.

The naive approach is to say “we’ll use authentication and RBAC.” That’s too shallow. A stronger approach explains that entitlements are enforced at decision points: order entry, limit changes, payment approvals, and data access. You also log entitlement decisions as part of the audit trail, because regulators and internal control teams will ask.

Failure modes that separate seniors from juniors

Goldman interviews often pivot into failure handling because it’s where senior judgment shows up. It’s easy to design a happy path. It’s harder to design something that fails safely under stress.

  • Consider exchange disconnects. If you lose connectivity to a venue, you must decide what to do with outstanding orders: cancel, reroute, or hold. The correct answer depends on business context, but the senior move is to say that this is policy-driven and observable. You don’t silently “retry forever.” You expose venue health, trigger automated controls, and record decisions.

  • Consider out-of-order data. Market data can arrive out of sequence across feeds. If your system assumes ordering, you’ll compute wrong prices or trigger wrong risk signals. A senior answer discusses sequence numbers, event-time versus processing-time, and bounded reordering windows. You also describe how you detect corruption: sanity checks, cross-feed validation, and quarantine pipelines.

  • Consider replay and reconciliation. In finance, replay is not a nice-to-have. If you can’t replay the event stream, you can’t recover cleanly after outages or prove what happened. A senior answer describes how the system can rebuild state from immutable events and how reconciliation compares derived states against authoritative records.

  • Consider partial outages. If a downstream risk analytics system is down, do you stop trading? Not necessarily. You separate pre-trade controls (must be available, fail closed with bounded fallback) from post-trade analytics (can lag, must recover). You define graceful degradation as “continue with bounded risk,” not “continue at any cost.”
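
The bounded reordering window mentioned in the out-of-order bullet can be sketched as a small buffer that releases messages in sequence and surfaces gaps for reconciliation instead of silently skipping them. The window size is an illustrative assumption:

```python
import heapq

class ReorderBuffer:
    """Hold out-of-sequence messages up to a bounded window, release in
    order, and emit explicit GAP records rather than hiding losses."""
    def __init__(self, window: int = 5):
        self.window = window
        self.next_seq = 1
        self.heap = []

    def push(self, seq: int, msg) -> list:
        heapq.heappush(self.heap, (seq, msg))
        out = []
        while self.heap:
            top_seq = self.heap[0][0]
            if top_seq == self.next_seq:
                out.append(heapq.heappop(self.heap)[1])
                self.next_seq += 1
            elif top_seq - self.next_seq > self.window:
                # Gap too old to keep waiting for: record it for reconciliation.
                out.append(("GAP", self.next_seq, top_seq - 1))
                self.next_seq = top_seq
            else:
                break   # wait, the missing message may still arrive
        return out
```

The GAP record is the senior move: downstream consumers know exactly which sequence range to reconcile against the venue's recovery feed, instead of discovering the hole weeks later.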

Interview heuristic:
“Graceful degradation” in finance means you reduce activity to preserve safety and auditability, not that you keep everything running no matter what.

Trade-offs you must articulate (and how to talk about them)

Goldman interviews reward candidates who can articulate trade-offs crisply: what you gain, what you risk, and how you mitigate. You can summarize the most important ones like this:

| Trade-off | What you gain | What you risk | How you mitigate |
| --- | --- | --- | --- |
| Latency vs durability | Faster execution path | Loss on crash | Snapshot + replay from immutable log |
| Strong consistency vs availability | Correct booking and limits | More rejects during outages | Bounded fallback + explicit policies |
| Streaming vs micro-batching | Lower latency, fresher signals | Higher operational complexity | Backpressure, observability, replay |
| Synchronous vs async propagation | Faster response times | Downstream lag | Freshness SLAs, consumer catch-up |
| Memory vs throughput | Faster access | Memory pressure, GC pauses | Partitioning, fixed-size structures, tuning |
The table is useful, but what matters is how you narrate it. Two paragraphs of explanation beat ten bullets. When you bring up a trade-off, tie it back to Goldman’s priorities: correctness, auditability, and controlled risk. Then say exactly how you will measure the risk you introduced: freshness metrics, reconciliation jobs, and explicit SLOs (Service Level Objectives).

Operational readiness: runbooks, SLOs, and incident playbooks

This is one of the easiest ways to stand out at Goldman because many candidates stop at architecture. Senior candidates talk about how the system is operated.

Operational readiness means you can answer: what do you monitor, what are your SLOs, and what is your plan at 3 a.m. when things go wrong? For latency-sensitive systems, you monitor p50/p95/p99 latency by stage (gateway, risk check, execution, publish). For data pipelines, you monitor lag, freshness, and drop rates. For financial correctness, you monitor reconciliation mismatches and idempotency collisions.

Runbooks matter because finance has strong operational controls. If a feed is corrupted, you need a playbook: quarantine the feed, switch to backup, widen controls, notify stakeholders. If risk checks time out, you need a playbook: fail closed, enable bounded fallback, escalate. In interviews, you don’t need to write the runbook; you need to show that you think in runbooks.

What great looks like:
You treat operations as part of the design: explicit SLOs, observable policies, and documented recovery steps.

A small, interview-friendly “how to answer” framework

You don’t want to narrate a rigid checklist, but you do want a repeatable structure that keeps you calm. After you’ve explained the problem with a few paragraphs, you can use a tiny recap list to keep the conversation crisp.

Before you use bullets in an interview, you should have already explained your reasoning in prose. Then the bullets become a summary, not the content.

  • State the dominant constraints (latency, correctness, audit, resilience) and pick a scope.

  • Walk one end-to-end flow and name the commit points.

  • Stress-test the design with failure modes and define graceful degradation policies.

That’s it. Short, memorable, and Goldman-specific.

Final thoughts

The Goldman Sachs system design interview is not a test of how many systems you’ve memorized. It’s a test of whether you can design financial infrastructure that behaves predictably under volatility, remains correct under failure, and can be audited after the fact. The strongest answers sound like a senior engineer who has lived through incidents: you talk about policies, limits, idempotency, replay, and operational controls as naturally as you talk about latency.

If you keep your conversation grounded in end-to-end flows, define commit points clearly, and treat audit and resilience as first-class design requirements—not add-ons—you’ll come across as the kind of engineer Goldman trusts with high-stakes systems.

Happy learning!


Written By:
Zarish Khalid