Goldman Sachs System Design interview

This blog explains how to approach the Goldman Sachs system design interview by focusing on low latency, financial correctness, auditability, and resilience while clearly reasoning through trade-offs and failure scenarios.

17 mins read
Feb 10, 2026

Preparing for the system design interview at Goldman Sachs means stepping into a domain where “mostly correct” is not a thing you’re allowed to ship. You’re designing systems that move money, route orders, consume market data, and calculate risk while markets are volatile and regulators expect a complete trail of what happened and why. The engineering bar is high because the consequences of mistakes are high: financial loss, operational risk, reputational damage, and regulatory exposure.

A lot of candidates approach Goldman like a typical large-scale tech interview: they talk about microservices, caches, and horizontal scaling. That’s necessary but not sufficient. What makes Goldman-specific questions different is the way they force you to balance latency, financial correctness, auditability, and resilience at the same time. You can’t optimize one dimension by hand-waving away the others. A low-latency trading path that can’t be audited is a non-starter. A perfectly durable payment workflow that takes seconds to respond is also a non-starter. The interview is designed to see whether you can reason inside those constraints and still produce a design that behaves predictably when it’s under stress.

Interview heuristic:
At Goldman, a “good design” is one that remains explainable under failure. If you can’t describe how your system behaves when a feed lags, a service times out, or a region degrades, you haven’t finished the design.

This blog walks you through what Goldman interviewers evaluate, how to steer the conversation so it stays crisp and domain-appropriate, and how to answer the most common archetypes (trading, market data, risk, payments) with the level of specificity senior candidates are expected to show.

What Goldman evaluates and why it feels different

Goldman isn’t hiring you to design an app. They’re hiring you to design financial infrastructure. That means the evaluation criteria are not generic; they are tied to the realities of operating systems that interact with exchanges, broker-dealers, custodians, clearing houses, and internal control functions.

When an interviewer asks you to “design a trading system” or “design market data ingestion,” they’re not testing whether you remember a canonical architecture. They’re testing whether you can identify the dominant constraints and build around them: time-sensitive decision points, strict state transitions, and the need to prove what happened after the fact.

A useful way to organize your thinking is to map system areas to Goldman’s core concerns. You’ll reference this table implicitly throughout the interview when you justify trade-offs.

| System area | Latency | Correctness | Auditability | Resilience |
| --- | --- | --- | --- | --- |
| Order entry and validation (e.g., FIX gateway) | Very high | Very high | High | High |
| Pre-trade risk checks | High | Very high | High | High |
| Matching / execution path | Extremely high | Very high | Medium (via emitted events) | High |
| Market data ingestion and normalization | High | High | Medium | Very high |
| Post-trade processing (allocation, confirmation) | Medium | Very high | Very high | High |
| Settlement / payments / cash movement | Medium | Extremely high | Extremely high | Very high |
| Analytics and reporting | Medium/low | High | Very high | High |

Notice the pattern: latency matters most in the execution path, but correctness and auditability never drop below “high.” Goldman’s systems have to be fast, but they also have to be provably correct and traceable.

What great looks like:
You consistently say what is time-critical, what is correctness-critical, and how you preserve auditability without turning everything into a slow synchronous pipeline.

The interview flow and how to steer it

Most Goldman system design interviews are 45–60 minutes. The structure is familiar—requirements, architecture, deep dives, trade-offs, failures—but the emphasis is different. The interviewer will usually push hard on operational realism: What’s the p99 latency target? What happens when feeds drift? How do you reconcile? How do you control access?

The best way to steer the interview is to keep two threads running in parallel:

  1. the functional story (orders, prices, risk checks, trades, settlement), and

  2. the control story (audit trail, permissions, failure modes, operational readiness).

If you only tell the functional story, your design looks like a demo. If you only tell the control story, your design looks like bureaucracy. Goldman expects you to do both and to explain how they interact.

At this point, the interviewer may ask something like: “What are your SLAs?” Don’t freeze. Give ranges, state your assumptions, and explain how those assumptions influence architecture. The interviewer wants to see you reason, not recite.

Interviewer prompt simulation:
“Where exactly do you draw the line between the low-latency execution path and the durable audit trail?”

A strong answer sounds like: you keep the execution path lean, but every state transition emits an immutable event that becomes the audit spine and the source of truth for downstream systems.

Two short pushback dialogues to practice

Here’s the kind of pushback you should expect, and how to respond without sounding defensive.

Dialogue 1: market data lag

Interviewer: “What breaks if the market data feed lags by 500 ms?”
You: “Two things: pricing decisions and risk decisions. I would treat feed freshness as a first-class signal and gate behavior based on it. If freshness drops below a threshold, the system degrades into a safer mode—tighter limits, widened spreads, or even a trading halt for impacted symbols—while still keeping the audit trail intact.”
Interviewer: “So you’re willing to reject trades?”
You: “Yes. In finance, rejecting is safer than executing on stale data. The design needs explicit policies for that, and those policies should be observable and auditable.”
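
The freshness gate from that exchange can be sketched as a tiny, auditable policy function. The thresholds and mode names below are illustrative assumptions, not real desk policy:

```python
# Hypothetical staleness thresholds; real values come from desk-level policy.
FRESH_MS = 200      # full trading allowed
DEGRADED_MS = 500   # tighter limits, widened spreads

def feed_mode(last_tick_ts_ms: float, now_ms: float) -> str:
    """Map feed staleness to an explicit trading mode, so every
    degradation decision is a named, loggable state."""
    age = now_ms - last_tick_ts_ms
    if age <= FRESH_MS:
        return "NORMAL"
    if age <= DEGRADED_MS:
        return "DEGRADED"   # e.g., tighter limits, wider spreads
    return "HALTED"         # reject new orders for this symbol
```

Because the mode is an explicit value rather than an implicit side effect, it can be emitted into the audit trail alongside every order decision it influenced.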

Dialogue 2: risk engine timeout

Interviewer: “What if the risk engine times out? Do you fail open or fail closed?”
You: “Fail closed for anything that can create unbounded exposure. If we need availability, we introduce a bounded fallback: cached limits with strict TTL, plus a kill switch. The key is that fallback is measurable and explicitly logged so we can explain every decision later.”
Interviewer: “Isn’t that too conservative?”
You: “It’s intentionally conservative. Goldman optimizes for controlled risk, not maximal throughput at any cost.”
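
The fail-closed fallback can be made concrete. This is a sketch under assumed names (notional limits, a fixed TTL); the point is the shape of the decision and the logged reason, not the specific numbers:

```python
from typing import Optional

class RiskDecision:
    def __init__(self, approved: bool, reason: str):
        self.approved = approved
        self.reason = reason   # every decision carries a loggable reason

def check_order(order_notional: float,
                live_limit: Optional[float],
                cached_limit: Optional[float],
                cache_age_s: float,
                cache_ttl_s: float = 5.0) -> RiskDecision:
    """Fail closed: approve only against a live limit, or a cached one
    still within its TTL. Never approve with no limit at all."""
    if live_limit is not None:
        return RiskDecision(order_notional <= live_limit, "live_limit")
    # Risk engine timed out: bounded fallback, never unbounded exposure.
    if cached_limit is not None and cache_age_s <= cache_ttl_s:
        return RiskDecision(order_notional <= cached_limit,
                            "cached_limit_within_ttl")
    return RiskDecision(False, "fail_closed_no_fresh_limit")
```

The `reason` field is what makes the fallback explainable later: reconciliation can count exactly how many orders were approved on cached limits during the outage.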

These exchanges show a Goldman-appropriate mindset: explicit policies, bounded fallbacks, and traceable decisions.

Back-of-the-envelope assumptions that make your design concrete

Goldman interviewers don’t require exact numbers, but they do expect you to anchor your design with realistic ranges and show how those ranges drive choices. The mistake junior candidates make is to throw out “millions per second” without understanding the implications. The mistake mid-level candidates make is to avoid numbers entirely.

You can use a small set of assumptions to ground the discussion. For example:

  • Order entry: from tens to hundreds of orders per second for many desks; potentially much higher for certain electronic flows.

  • Market data: thousands to hundreds of thousands of updates per second depending on venue coverage and instrument set.

  • Latency targets: p99 in the low milliseconds for many user-facing paths; tighter budgets for internal execution components depending on context.

  • Downstream consumers: risk, compliance, reporting, surveillance, PnL, and reconciliations—often many independent consumers of the same event stream.

The important move is not the numbers themselves. It’s how you connect them to architecture. High update rates push you toward streaming pipelines and efficient normalization. Tight latency pushes you toward in-memory structures and fewer synchronous hops. Auditability pushes you toward immutable event logs and deterministic state transitions.
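
A quick worked pass shows how the numbers force choices. All figures here are illustrative assumptions, chosen only to demonstrate the arithmetic you would narrate aloud:

```python
# Illustrative market data volume (not firm figures).
updates_per_sec = 200_000          # busy multi-venue feed
msg_bytes = 100                    # normalized tick after compact encoding
ingest_mb_per_sec = updates_per_sec * msg_bytes / 1e6   # sustained ingest rate

trading_secs_per_day = 6.5 * 3600  # one US equities session
ticks_per_day = updates_per_sec * trading_secs_per_day  # billions of events/day

# An assumed 2 ms p99 budget across 4 synchronous hops leaves little per stage,
# which is exactly why you minimize synchronous dependencies.
p99_budget_us = 2_000
sync_hops = 4
per_hop_us = p99_budget_us / sync_hops
```

Even with modest per-message sizes, the daily event count lands in the billions, which is the argument for streaming ingestion with replay rather than synchronous per-tick database writes.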

Interview heuristic:
Numbers aren’t there to impress. They’re there to force design choices and expose what you’re trading away.

A real-time trade flow walkthrough (end-to-end)

Goldman interviews often include trading because it compresses everything: latency, correctness, risk, and audit. The fastest way to demonstrate seniority is to walk through one trade end-to-end and narrate where you spend latency budget and where you refuse to compromise correctness.

Imagine an order enters the system through a FIX (Financial Information eXchange) gateway. The first responsibility is validation: schema validation, entitlement checks (is this user allowed to trade this product?), basic sanity checks (quantity bounds), and deduplication of retransmits. This is a common place where naive candidates do too much synchronously. You keep this stage fast and deterministic: accept or reject quickly, and emit an event capturing the decision.
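
A minimal sketch of that validation stage is below. The field names are illustrative, not a real FIX tag mapping, and the reject reasons are the events you would emit:

```python
def validate_order(order: dict,
                   entitled_products: set,
                   seen_ids: set,
                   max_qty: int = 1_000_000) -> tuple:
    """Fast, deterministic accept/reject. Every outcome (including the
    reason string) becomes an audit event downstream."""
    oid = order.get("client_order_id")
    if oid is None:
        return False, "missing_client_order_id"
    if oid in seen_ids:
        return False, "duplicate_retransmit"     # dedupe resent orders
    if order.get("symbol") not in entitled_products:
        return False, "not_entitled"             # entitlement check
    qty = order.get("qty", 0)
    if not 0 < qty <= max_qty:
        return False, "qty_out_of_bounds"        # basic sanity check
    seen_ids.add(oid)
    return True, "accepted"
```

Note what this stage does not do: no database round-trips, no downstream calls. It is pure in-memory checks plus an event, which is what keeps it fast and deterministic.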

Next comes pre-trade risk. The system evaluates limits: credit limits, position limits, concentration limits, and potentially real-time risk checks for specific products. This stage is correctness-critical. It must be consistent with the firm’s risk policy and must leave a trail. Strong candidates talk about the risk check as a decision service with explicit timeouts, explicit fallbacks, and explicit logging. Weak candidates say “call the risk service” and move on.

If the order passes risk, it enters the execution path: routing to a venue or internal matching engine depending on the business context. Latency is most sensitive here. If you’re discussing a matching engine, you describe an in-memory order book keyed by symbol with price-time priority. You talk about single-writer per instrument partitions to avoid locks, or careful concurrency control to preserve determinism. You also describe the output: executions produce trade events that are immutable facts, not “updates.”
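
A toy single-instrument book illustrates price-time priority. It is deliberately simplified: one writer per instrument, and fills happen at the ask price rather than the resting order's price as a real engine would do:

```python
import heapq
import itertools

_seq = itertools.count()  # arrival order breaks price ties (time priority)

class Book:
    """Minimal single-instrument order book sketch."""
    def __init__(self):
        self.bids = []  # max-heap via negated price
        self.asks = []  # min-heap

    def add(self, side: str, price: float, qty: int) -> list:
        h = self.bids if side == "buy" else self.asks
        key = -price if side == "buy" else price
        heapq.heappush(h, (key, next(_seq), price, qty))
        return self._match()

    def _match(self) -> list:
        trades = []
        while self.bids and self.asks and -self.bids[0][0] >= self.asks[0][0]:
            _, _, bid_px, bq = heapq.heappop(self.bids)
            _, _, ask_px, aq = heapq.heappop(self.asks)
            qty = min(bq, aq)
            trades.append((ask_px, qty))  # emitted as an immutable trade fact
            if bq > qty:  # push remainder back at top of book
                heapq.heappush(self.bids, (-bid_px, next(_seq), bid_px, bq - qty))
            if aq > qty:
                heapq.heappush(self.asks, (ask_px, next(_seq), ask_px, aq - qty))
        return trades
```

The arrival-sequence tiebreaker is the point: "which order got filled first" is decided by data, not by scheduler luck, which is what makes the engine's behavior defensible.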

Once a trade event exists, everything else becomes downstream consumption. Risk recalculations, PnL updates, compliance surveillance, confirmations, and settlement workflows consume the same event spine. This is where streaming platforms fit: they decouple producers from consumers and provide replay for recovery. But you don’t name Kafka (or any tool) as a buzzword; you name it as an “event spine” because replay and ordered consumption are operational requirements in finance.

Finally, audit logging is not a separate afterthought. Audit is the trail of decision points: validation results, risk decision outputs, routing decisions, and execution results. The best way to talk about this is that you capture immutable events and store them with retention and integrity guarantees. The audit system is not the execution path, but it is fed by the execution path.
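
One way to make "integrity guarantees" concrete is a hash-chained append-only log, where each record commits to everything before it. This is a sketch of the idea, not a real audit store:

```python
import hashlib
import json

class AuditTrail:
    """Append-only trail; each record chains the previous hash, so
    tampering with any historical entry is detectable on verify()."""
    GENESIS = "0" * 64

    def __init__(self):
        self.records = []
        self.last_hash = self.GENESIS

    def append(self, event: dict) -> str:
        payload = json.dumps(event, sort_keys=True)
        digest = hashlib.sha256((self.last_hash + payload).encode()).hexdigest()
        self.records.append({"event": event, "hash": digest})
        self.last_hash = digest
        return digest

    def verify(self) -> bool:
        prev = self.GENESIS
        for rec in self.records:
            payload = json.dumps(rec["event"], sort_keys=True)
            if hashlib.sha256((prev + payload).encode()).hexdigest() != rec["hash"]:
                return False
            prev = rec["hash"]
        return True
```

The execution path only ever appends; verification and retention live entirely off the hot path, which is the separation the walkthrough describes.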

What great looks like:
You identify the commit points: when an order becomes real, when a risk decision becomes binding, when a trade becomes an immutable fact, and how those facts can be replayed and reconciled.

The system archetypes you’ll see and how to answer them

Goldman questions often fall into a few archetypes. The trick is to answer them in a Goldman-specific way: emphasize financial correctness, controlled risk, and operational readiness. For each archetype, it helps to contrast what a naive candidate says with how a strong candidate reframes.

Trading and matching systems

A naive candidate says: “We’ll build a matching engine, store orders in a database, and use a queue for events.” The failure is that storing every order mutation in a database synchronously destroys latency and creates contention. It also doesn’t address determinism: in finance, “which order got filled first” must be defensible.

A strong candidate reframes: execution is an in-memory, deterministic state machine with strict ordering per instrument. Durability and audit are handled through immutable event emission and snapshotting, not by turning the matching loop into a transactional DB workload. You also explicitly define where you can tolerate eventual consistency (downstream analytics) and where you cannot (execution state).

Common pitfall:
Treating the matching engine like a CRUD app. Execution is a state machine with strict ordering, not a set of independent updates.

Market data ingestion pipelines

A naive candidate says: “We’ll ingest market data and store it in a database for consumers.” The failure is that market data is high-volume, bursty, and often out-of-order across venues. If you attempt to durably store all raw ticks synchronously, you create backpressure and lag at the worst possible time: market volatility.

A strong candidate reframes: you normalize feeds into a canonical schema, attach sequence metadata, deduplicate, and publish into a streaming layer that supports fan-out and replay. You explicitly separate fast path (fresh quotes for trading) from cold path (historical storage for analytics). You also define correctness in terms of freshness and ordering semantics, not “every message stored.”
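
The normalization step might look like the sketch below. The raw field names (`seq`, `sym`, `px`, `qty`) are invented for illustration; real venue formats differ widely:

```python
from typing import Optional

def normalize(raw: dict, venue: str, seen: dict) -> Optional[dict]:
    """Turn a venue-specific tick into a canonical record.
    Drops duplicates and sequence regressions per (venue, symbol)."""
    seq, sym = raw["seq"], raw["sym"]
    if seq <= seen.get((venue, sym), -1):
        return None                      # duplicate or stale retransmit
    seen[(venue, sym)] = seq
    return {
        "symbol": sym,
        "venue": venue,
        "price": float(raw["px"]),
        "size": int(raw["qty"]),
        "venue_seq": seq,                # kept for downstream gap detection
    }
```

Keeping the venue sequence number in the canonical record matters: downstream consumers can detect gaps themselves rather than trusting that ingestion was lossless.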

Risk engines and real-time monitoring

A naive candidate says: “We’ll compute VaR.” VaR (Value at Risk) is an important concept, but dropping the acronym without explaining the latency and correctness implications doesn’t help. Risk systems often involve a mix of real-time checks (pre-trade limits) and heavier computations (portfolio risk aggregation) with different time tolerances.

A strong candidate reframes: pre-trade risk is a low-latency policy decision with bounded timeouts and clear fallback rules. Post-trade risk monitoring is often streaming and eventually consistent, but it must be accurate and reconcilable. You talk about freshness, caching of reference data, snapshotting, and how to recover when inputs are corrupted.

Payments and money movement

A naive candidate says: “We process payments with a database transaction.” The failure is that payments are workflows, not single transactions. They involve external systems, partial failures, retries, and strict audit requirements. You need idempotency, reconciliation, and an immutable ledger-like record.

A strong candidate reframes: money movement is a state machine with explicit transitions, idempotency keys, and a durable audit trail. You design for reversals, chargebacks, and delayed confirmations. You also treat security and entitlements as first-class constraints: who can initiate, approve, and reconcile.
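
A compact sketch of that state machine, with idempotency keys and an append-only ledger. The states and transitions here are illustrative, not a real payments workflow:

```python
# Illustrative transition map; reversals are new transitions, never edits.
ALLOWED = {
    "INITIATED": {"APPROVED", "REJECTED"},
    "APPROVED": {"SENT"},
    "SENT": {"CONFIRMED", "FAILED"},
    "FAILED": {"SENT"},          # explicit retry path
    "CONFIRMED": {"REVERSED"},
}

class Payments:
    def __init__(self):
        self.state = {}    # payment_id -> current state
        self.ledger = []   # append-only transition records

    def apply(self, payment_id: str, new_state: str, idem_key: str) -> str:
        if any(r["idem_key"] == idem_key for r in self.ledger):
            return "duplicate_ignored"        # idempotent retry: no-op
        cur = self.state.get(payment_id)
        if cur is None and new_state != "INITIATED":
            return "rejected_bad_transition"
        if cur is not None and new_state not in ALLOWED.get(cur, set()):
            return "rejected_bad_transition"  # would also be logged in practice
        self.state[payment_id] = new_state
        self.ledger.append({"payment_id": payment_id, "to": new_state,
                            "idem_key": idem_key})
        return "applied"
```

Retries with the same idempotency key are absorbed silently, and illegal transitions are refused rather than "fixed up", which is exactly the behavior reconciliation depends on.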

Consistency in finance: what must be ACID vs what can be eventually consistent

You will score points at Goldman by being explicit about consistency. ACID (Atomicity, Consistency, Isolation, Durability) is not a buzzword here; it’s a tool you apply selectively.

Execution state and booked trades are the kinds of records that often demand strong guarantees. If a trade is “booked,” you need to know it is durable and cannot disappear. Payment ledger entries similarly demand strong durability and auditability. In contrast, many analytics views—dashboards, aggregated risk metrics, monitoring panels—can be eventually consistent as long as they converge and you can prove their lineage.

The naive mistake is to declare “we’ll use ACID everywhere,” which usually translates into slow, fragile systems. The opposite naive mistake is to declare “eventual consistency is fine,” which is unacceptable for trade booking and money movement.

This is a Goldman-specific skill: drawing the boundary and defending it.

Interview heuristic:
Treat “eventual consistency” as a contract: you must say what can be stale, for how long, and how you detect and correct drift.

How to drive a latency-first conversation without sounding hand-wavy

Goldman interviews often steer into latency because low latency is a real differentiator in finance. The trap is that candidates talk about “microseconds” without showing how latency budgets are managed. A latency-first conversation is not about bravado; it’s about clarity.

Start by decomposing latency into budgets: network hops, serialization, validation, risk check call, execution loop, and event publication. Then identify what you can do to reduce tail latency: fewer synchronous dependencies, in-memory hot paths, partitioning to avoid locks, and backpressure to prevent overload.
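
A budget decomposition can literally be a table you sum aloud. Every number below is an assumption for illustration, not a real target:

```python
# Illustrative p99 budget for an order-entry acknowledgement path (microseconds).
budget_us = {
    "gateway_parse_validate": 300,
    "pre_trade_risk_check": 800,    # the dominant synchronous dependency
    "execution_loop": 200,
    "event_publish": 100,           # can move off the ack path if made async
}
total_us = sum(budget_us.values())
p99_target_us = 2_000
headroom_us = p99_target_us - total_us  # slack for network and GC jitter
```

Narrating it this way makes the trade-offs visible: the risk check dominates the budget, so that is where timeouts, caching, and fallback policy earn their keep.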

When you propose something that improves latency, always say what you risk. In-memory state risks loss on crash. You mitigate it with snapshotting and replay from an event log. Asynchronous publication risks delayed visibility downstream. You mitigate by defining what downstream systems can tolerate and by exposing freshness metrics.

That’s what makes the conversation credible: you trade performance for specific risks, then you mitigate those risks explicitly.

Data contracts, schemas, and versioning in market data and trading pipelines

Senior candidates often overlook one of Goldman’s day-to-day realities: systems evolve. Feeds add fields. Risk models change. Regulatory reporting requirements expand. If you don’t design for schema evolution, your “perfect” architecture will break the first time a feed changes.

In market data, you receive heterogeneous messages from multiple venues. You normalize them into a canonical schema. That schema becomes a contract for consumers: trading, risk, analytics, surveillance. If you change it casually, you break the firm. In interviews, you don’t need to implement a schema registry, but you should describe the principle: versioned schemas, backward compatibility, and explicit migration paths.

In trading pipelines, the same applies to order and trade events. You want immutable events so you can replay and reconcile, but you also need to evolve event formats over time. A strong answer explains how consumers handle multiple versions, how you test schema changes, and how you roll out changes safely.
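
Consumer-side version handling can be as simple as an upgrade function that lifts old events to the current shape. The versions and fields here are hypothetical:

```python
def decode_trade_event(event: dict) -> dict:
    """Upgrade older trade-event versions to the current canonical shape.
    Hypothetical example: v2 added 'leaves_qty'; v1 events imply fully filled."""
    v = event.get("schema_version", 1)
    out = dict(event)                    # never mutate the stored event
    if v == 1:
        out.setdefault("leaves_qty", 0)  # backfill the field v2 introduced
        out["schema_version"] = 2
    if out["schema_version"] != 2:
        raise ValueError(f"unsupported schema_version: {v}")
    return out
```

Because the stored events are immutable, the upgrade happens on read; replaying years of v1 events through current consumers still works, which is the compatibility guarantee the paragraph above asks for.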

What great looks like:
You treat schemas as products: versioned, validated, and rolled out with compatibility guarantees, not as incidental JSON blobs.

Security and entitlements: permissions, segregation of duties, and audit trails

Goldman is a financial institution, not just an engineering organization. That means access control is not optional; it is fundamental. Interviewers often look for whether you naturally include entitlements and audit.

Entitlements are about who can do what: who can trade which instruments, who can view sensitive positions, who can approve a large transfer, who can modify limits. Segregation of duties is about preventing a single actor from initiating and approving risky actions. Audit trails are about proving that controls existed and were applied.

The naive approach is to say “we’ll use authentication and RBAC.” That’s too shallow. A stronger approach explains that entitlements are enforced at decision points: order entry, limit changes, payment approvals, and data access. You also log entitlement decisions as part of the audit trail, because regulators and internal control teams will ask.

Failure modes that separate seniors from juniors

Goldman interviews often pivot into failure handling because it’s where senior judgment shows up. It’s easy to design a happy path. It’s harder to design something that fails safely under stress.

  • Consider exchange disconnects. If you lose connectivity to a venue, you must decide what to do with outstanding orders: cancel, reroute, or hold. The correct answer depends on business context, but the senior move is to say that this is policy-driven and observable. You don’t silently “retry forever.” You expose venue health, trigger automated controls, and record decisions.

  • Consider out-of-order data. Market data can arrive out of sequence across feeds. If your system assumes ordering, you’ll compute wrong prices or trigger wrong risk signals. A senior answer discusses sequence numbers, event-time versus processing-time, and bounded reordering windows. You also describe how you detect corruption: sanity checks, cross-feed validation, and quarantine pipelines.

  • Consider replay and reconciliation. In finance, replay is not a nice-to-have. If you can’t replay the event stream, you can’t recover cleanly after outages or prove what happened. A senior answer describes how the system can rebuild state from immutable events and how reconciliation compares derived states against authoritative records.

  • Consider partial outages. If a downstream risk analytics system is down, do you stop trading? Not necessarily. You separate pre-trade controls (must be available, fail closed with bounded fallback) from post-trade analytics (can lag, must recover). You define graceful degradation as “continue with bounded risk,” not “continue at any cost.”
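
The bounded reordering window mentioned in the out-of-order bullet can be sketched as a small buffer that releases messages in sequence and surfaces gaps for reconciliation instead of silently skipping them. The window size is an illustrative assumption:

```python
import heapq

class ReorderBuffer:
    """Hold out-of-sequence messages up to a bounded window, release in
    order, and emit explicit GAP records rather than hiding losses."""
    def __init__(self, window: int = 5):
        self.window = window
        self.next_seq = 1
        self.heap = []

    def push(self, seq: int, msg) -> list:
        heapq.heappush(self.heap, (seq, msg))
        out = []
        while self.heap:
            top_seq = self.heap[0][0]
            if top_seq == self.next_seq:
                out.append(heapq.heappop(self.heap)[1])
                self.next_seq += 1
            elif top_seq - self.next_seq > self.window:
                # Gap too old to keep waiting for: record it for reconciliation.
                out.append(("GAP", self.next_seq, top_seq - 1))
                self.next_seq = top_seq
            else:
                break   # wait, the missing message may still arrive
        return out
```

The GAP record is the senior move: downstream consumers know exactly which sequence range to reconcile against the venue's recovery feed, instead of discovering the hole weeks later.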

Interview heuristic:
“Graceful degradation” in finance means you reduce activity to preserve safety and auditability, not that you keep everything running no matter what.

Trade-offs you must articulate (and how to talk about them)

Goldman interviews reward candidates who can articulate trade-offs crisply: what you gain, what you risk, and how you mitigate. You can summarize the most important ones like this:

| Trade-off | What you gain | What you risk | How you mitigate |
| --- | --- | --- | --- |
| Latency vs durability | Faster execution path | Loss on crash | Snapshot + replay from immutable log |
| Strong consistency vs availability | Correct booking and limits | More rejects during outages | Bounded fallback + explicit policies |
| Streaming vs micro-batching | Lower latency, fresher signals | Higher operational complexity | Backpressure, observability, replay |
| Synchronous vs async propagation | Faster response times | Downstream lag | Freshness SLAs, consumer catch-up |
| Memory vs throughput | Faster access | Memory pressure, GC pauses | Partitioning, fixed-size structures, tuning |
The table is useful, but what matters is how you narrate it. Two paragraphs of explanation beat ten bullets. When you bring up a trade-off, tie it back to Goldman’s priorities: correctness, auditability, and controlled risk. Then say exactly how you will measure the risk you introduced: freshness metrics, reconciliation jobs, and explicit SLOs (Service Level Objectives).

Operational readiness: runbooks, SLOs, and incident playbooks

This is one of the easiest ways to stand out at Goldman because many candidates stop at architecture. Senior candidates talk about how the system is operated.

Operational readiness means you can answer: what do you monitor, what are your SLOs, and what is your plan at 3 a.m. when things go wrong? For latency-sensitive systems, you monitor p50/p95/p99 latency by stage (gateway, risk check, execution, publish). For data pipelines, you monitor lag, freshness, and drop rates. For financial correctness, you monitor reconciliation mismatches and idempotency collisions.

Runbooks matter because finance has strong operational controls. If a feed is corrupted, you need a playbook: quarantine the feed, switch to backup, widen controls, notify stakeholders. If risk checks time out, you need a playbook: fail closed, enable bounded fallback, escalate. In interviews, you don’t need to write the runbook; you need to show that you think in runbooks.

What great looks like:
You treat operations as part of the design: explicit SLOs, observable policies, and documented recovery steps.

A small, interview-friendly “how to answer” framework

You don’t want to narrate a rigid checklist, but you do want a repeatable structure that keeps you calm. After you’ve explained the problem with a few paragraphs, you can use a tiny recap list to keep the conversation crisp.

Before you use bullets in an interview, you should have already explained your reasoning in prose. Then the bullets become a summary, not the content.

  • State the dominant constraints (latency, correctness, audit, resilience) and pick a scope.

  • Walk one end-to-end flow and name the commit points.

  • Stress-test the design with failure modes and define graceful degradation policies.

That’s it. Short, memorable, and Goldman-specific.

Final thoughts

The Goldman Sachs system design interview is not a test of how many systems you’ve memorized. It’s a test of whether you can design financial infrastructure that behaves predictably under volatility, remains correct under failure, and can be audited after the fact. The strongest answers sound like a senior engineer who has lived through incidents: you talk about policies, limits, idempotency, replay, and operational controls as naturally as you talk about latency.

If you keep your conversation grounded in end-to-end flows, define commit points clearly, and treat audit and resilience as first-class design requirements—not add-ons—you’ll come across as the kind of engineer Goldman trusts with high-stakes systems.

Happy learning!


Written By:
Zarish Khalid