An ATM network looks simple from the user’s perspective: insert card, enter PIN, withdraw cash. But from a System Design interview perspective, it’s one of the cleanest tests of “real-world correctness.” You’re building a distributed system that touches a core ledger and physical hardware, and you don’t get to hand-wave failure cases away. If the system dispenses cash, the account must be debited. If cash is not dispensed, the account must not be debited (or any debit already applied must be reversed). Everything else is implementation detail.
This is also why ATM design is different from most web systems. In many consumer applications, eventual consistency is acceptable and failures are mostly “retry later.” In an ATM, the failure modes are messier: the network can drop after authorization, the cash dispenser can jam, the ATM can reboot mid-transaction, and foreign cards add multiple external dependencies. A strong answer shows you understand how to design for safe outcomes even when the world is unreliable.
In this post, we’ll build an interview-ready ATM architecture, explain how transactions stay ACID-correct, introduce a withdrawal state machine (with reversal and reconciliation), cover interbank routing and settlement, and show how to think about fraud prevention and observability.
Interview signal: ATM design is not about microservices diagrams. It’s about transaction correctness under hardware and network ambiguity.
Clarify requirements the way interviewers expect
In ATM interviews, start by clarifying scope and then lock down the guarantees. The scope is usually a “bank-operated ATM network” with a core banking system behind it, but you should explicitly confirm which features are in scope: withdrawals, balance inquiry, deposits, receipts, and foreign card support. You can mention deposits briefly (they often settle asynchronously), but the main focus should be withdrawals and balance inquiries because they stress correctness.
Once scope is clear, define the invariants. For withdrawals, the invariant is: a customer should not lose money without receiving cash, and the bank should not dispense cash without debiting the account. The hard part is that the ATM is a physical device and your backend doesn’t directly observe “cash delivered.” It receives a confirmation signal that can be delayed, lost, or incorrect. That’s why the withdrawal workflow must be modeled as a state machine with reconciliation.
Also, be explicit about non-functional constraints: PCI compliance, encryption, HSM usage for PIN verification, strict auditability, and extremely conservative failure handling. A good answer doesn’t claim “five nines everywhere.” Instead, it identifies which parts must be strongly consistent (ledger writes) and which can degrade gracefully (receipt printing, UI hints).
Category | Requirement | Notes (what interviewers listen for) |
Functional | Authenticate user (card + PIN) | PIN verification via HSM, never plaintext |
Functional | Balance inquiry | Read path can be optimized, but any staleness must be bounded and explicit |
Functional | Cash withdrawal | Two-phase workflow with hold/commit and dispense confirmation |
Functional | Deposits | Often asynchronous credit with later verification |
Functional | Foreign card support | Requires network routing + auth vs settlement separation |
Non-functional | ACID correctness | Ledger is the source of truth |
Non-functional | Security + compliance | PCI DSS, encryption, tamper resistance |
Non-functional | Auditability | Immutable logs + dispute workflows |
Non-functional | Fault tolerance | Safe handling of network and hardware failures |
Common pitfall: Treating “withdrawal” as a single database update. In reality, the cash dispenser forces a multi-step transaction with uncertain outcomes.
Summary:
Confirm scope (withdrawal + balance inquiry are the core).
State invariants (no cash without debit, no debit without cash).
Call out compliance and auditability as first-class requirements.
Treat hardware and network ambiguity as expected, not rare.
High-level architecture: what components exist and why
An ATM system is a layered network: edge devices (ATMs) connect to a bank switch, which routes requests to authorization and transaction processing, which then interacts with the core ledger. The architecture is intentionally conservative: the ledger is centralized or single-writer to preserve correctness, while the edge can be distributed to scale.
At the edge, the ATM is effectively a specialized client with hardware peripherals: card reader, PIN pad, receipt printer, cash dispenser, and sensors. It must encrypt sensitive inputs and operate safely under timeouts. The ATM does not “decide” financial truth; it requests authorization and follows backend instructions.
In the middle, you typically have an ATM switch (or network gateway) that terminates device connections, applies routing and basic policy, and forwards to internal services. For foreign cards, the switch routes to external card networks. Behind the switch sits the Transaction Processing System (TPS) which owns the withdrawal workflow state machine and coordinates with the core ledger.
Component | Responsibility | Why it matters |
ATM device | UI + card/PIN capture + dispense + sensors | Hardware boundary; must fail safely |
Secure comms module | Encrypt PIN block + session keys | Protect secrets over hostile networks |
ATM switch / gateway | Routing, throttling, protocol translation | Central control point for edge traffic |
Authentication service / HSM | PIN verification, key management | Compliance + cryptographic trust anchor |
Transaction Processing System (TPS) | Orchestrate withdrawals, idempotency, state machine | Prevent double-debits and handle ambiguity |
Core ledger / banking DB | Authoritative balances and postings | ACID correctness and auditability |
Reconciliation service | Resolve uncertain outcomes, post reversals | Hardware/network failures are normal |
Fraud/risk engine | Score and block suspicious withdrawals | Prevent losses and abuse |
Audit log pipeline | Immutable event recording | Disputes, compliance, forensics |
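The switch's routing decision can be sketched in a few lines. This is a minimal illustration, assuming the switch routes on the card's leading digits (the BIN); the prefixes in `ON_US_BINS` and the downstream names `internal_tps` and `card_network` are hypothetical placeholders, not real values.

```python
from dataclasses import dataclass

# Hypothetical BIN prefixes issued by this bank (illustrative values only).
ON_US_BINS = {"411111", "522222"}


@dataclass(frozen=True)
class Request:
    card_number: str
    kind: str  # "withdrawal" or "balance_inquiry"


def route(request: Request) -> str:
    """Pick the downstream for a request; the switch never touches balances."""
    bin_prefix = request.card_number[:6]
    if bin_prefix in ON_US_BINS:
        return "internal_tps"   # our own cardholder: go to the TPS and ledger
    return "card_network"       # foreign card: the issuer must authorize
```

Note that the function only chooses a destination. Keeping it free of any balance logic mirrors the separation between the routing plane and the money plane.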
Interview signal: Strong candidates separate the “routing plane” (switch) from the “money plane” (ledger + TPS), and they never let the ATM directly mutate balances.
Summary:
Keep the core ledger strongly consistent and authoritative.
Put orchestration and idempotency in TPS, not in the ATM.
Use an ATM switch for routing and external network integration.
Treat reconciliation and audit logging as core services.
Data model: ledger-first thinking (not “balance as a field”)
The safest way to think about bank accounts is: the ledger is the truth, and balance is a derived value (or a carefully maintained cached value) from ledger postings. In interviews, you don’t need to design a full accounting system, but you should demonstrate ledger-first reasoning: every withdrawal produces entries that are auditable and immutable.
For ATM withdrawals, it’s also common to model “holds” (authorizations) separately from “posted” transactions. A hold reduces available balance immediately, preventing double-withdrawal, while the final posting occurs after dispense confirmation. This split is essential when hardware is involved.
You also need idempotency keys and transaction identifiers. The ATM generates a txn_id that stays the same across retries of one withdrawal attempt. The TPS uses it as the idempotency key, so a retried request returns the original result rather than creating a second withdrawal.
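The retry behavior described above can be sketched as follows. This is a minimal in-memory illustration: the class name `IdempotentProcessor`, the `executions` counter, and the dictionary standing in for a durable store are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WithdrawalResult:
    txn_id: str
    status: str
    amount_cents: int


class IdempotentProcessor:
    """Returns the stored result when the same txn_id is seen again."""

    def __init__(self) -> None:
        # txn_id -> result; a real TPS would keep this in durable storage.
        self._results: dict[str, WithdrawalResult] = {}
        self.executions = 0  # counts how often the real work actually ran

    def handle(self, txn_id: str, amount_cents: int) -> WithdrawalResult:
        previous = self._results.get(txn_id)
        if previous is not None:
            if previous.amount_cents != amount_cents:
                # Same key with a different payload is a client bug, not a retry.
                raise ValueError("txn_id reused with a different amount")
            return previous
        self.executions += 1
        result = WithdrawalResult(txn_id, "FUNDS_HELD", amount_cents)
        self._results[txn_id] = result
        return result
```

Rejecting a reused key with a different amount is a design choice for this sketch; the important property is that a genuine retry never creates a second hold.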
Entity | Key fields | Purpose |
Account | account_id, status, available_balance_cache | Account metadata + fast reads |
Ledger entry | entry_id, account_id, amount, type, timestamp | Immutable postings for audit |
Authorization hold | hold_id, account_id, amount, state, expires_at | Reserve funds before dispense |
ATM transaction | txn_id, atm_id, account_id, amount, state | State machine + idempotency anchor |
Dispenser event | txn_id, sensor_status, cash_presented | Hardware evidence for reconciliation |
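To make "balance is a derived value" concrete, here is a small sketch of two of the entities above as immutable records, with balances computed from them. The sign convention (negative amounts are debits), the integer-cents representation, and the hold state names are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class LedgerEntry:
    entry_id: str
    account_id: str
    amount_cents: int  # assumed convention: negative = debit, positive = credit
    type: str
    timestamp: str


@dataclass(frozen=True)
class Hold:
    hold_id: str
    account_id: str
    amount_cents: int
    state: str  # assumed states: "ACTIVE", "RELEASED", "CAPTURED"


def current_balance(entries: Iterable[LedgerEntry], account_id: str) -> int:
    """Sum of posted entries; nothing is ever updated in place."""
    return sum(e.amount_cents for e in entries if e.account_id == account_id)


def available_balance(entries: Iterable[LedgerEntry],
                      holds: Iterable[Hold],
                      account_id: str) -> int:
    """Posted balance minus funds reserved by still-active holds."""
    held = sum(h.amount_cents for h in holds
               if h.account_id == account_id and h.state == "ACTIVE")
    return current_balance(entries, account_id) - held
```

A production system would maintain a cached balance rather than re-summing entries on every read, but the cache should always be reproducible from the entries.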
Common pitfall: Only storing “balance” and updating it in place. Interviewers want to hear “ledger entries + holds + immutable audit trail.”
Summary:
Use ledger entries for auditability and correctness.
Use holds to reserve funds before dispensing cash.
Use stable transaction IDs for idempotency across retries.
Walkthrough 1: balance inquiry (step-by-step)
A balance inquiry looks simple, but it’s a great place to demonstrate safe read design. The ATM needs a quick response, but the bank must not show a balance that’s wildly wrong. In most systems, balance inquiry reads from a strongly consistent source (or a read replica with bounded staleness if the bank accepts that trade-off).
The flow starts when the ATM reads the card and prompts for PIN. The PIN is encrypted on the device and sent to the bank’s authentication service (often backed by an HSM). After successful authentication, the ATM sends a balance inquiry request with the authenticated session token and account identifier.
The TPS verifies authorization, checks account status (active, not blocked), and fetches the balance. This may come from a cached available balance maintained by ledger postings, or computed from ledger entries. The TPS returns the balance plus metadata like “available” vs “current,” which matters when holds exist.
Step | Component | Action | Outcome |
1 | ATM | Capture card + PIN, encrypt PIN block | Secure auth request prepared |
2 | Switch | Route request to auth service | Correct internal routing |
3 | Auth/HSM | Verify PIN | Authenticated session or denial |
4 | TPS | Validate session + account status | Authorization enforced |
5 | Ledger | Read available/current balance | Correct balance computed |
6 | ATM | Display/print balance | User receives result |
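Steps 4 and 5 of the table can be sketched as a single function. This is a simplified illustration: the dictionary shapes for `session` and `account`, the reason codes, and the use of integer cents are assumptions, and PIN verification is taken to have already happened upstream.

```python
from typing import Iterable


def balance_inquiry(session: dict,
                    account: dict,
                    posted_cents: Iterable[int],
                    active_hold_cents: Iterable[int]) -> dict:
    """Validate the session and account, then report current vs available."""
    if (not session.get("authenticated")
            or session.get("account_id") != account["account_id"]):
        return {"ok": False, "reason": "UNAUTHORIZED"}
    if account["status"] != "ACTIVE":
        return {"ok": False, "reason": "ACCOUNT_BLOCKED"}
    current = sum(posted_cents)                  # posted ledger entries only
    available = current - sum(active_hold_cents)  # minus funds under hold
    return {"ok": True, "current_cents": current, "available_cents": available}
```

Returning both figures lets the ATM show the customer why an amount that looks affordable against the current balance may still be declined.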
What to say in the interview: “Balance inquiry is read-heavy, but I still treat it as sensitive. I validate the session and account status, and I return available vs current balance to reflect holds.”
Summary:
Authenticate first, then read balance.
Return available/current balance to account for holds.
Prefer correctness over aggressive caching for financial reads.
Walkthrough 2: cash withdrawal (step-by-step)
Cash withdrawal is the heart of the ATM question because it combines a ledger update with a physical action. The key design insight is that you cannot commit the final debit until you have strong evidence that cash was actually dispensed. But you also can’t dispense cash without reserving funds first, or you risk overdrawing the account.
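The correct mental model is: authorize and hold funds → attempt dispense → confirm dispense → commit posting. If the dispense definitively fails after funds are held, you reverse the hold. If the outcome is unknown (for example, the confirmation never arrives), you keep the hold and send the transaction to reconciliation rather than guessing. This avoids “customer lost money without cash,” which is the worst-case outcome, without handing out cash for free.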
The withdrawal flow begins similarly with authentication. The user requests an amount, and the TPS checks limits (daily withdrawal limit, ATM limits, available balance). If approved, the TPS creates a hold and returns a dispense authorization to the ATM. The ATM attempts to dispense cash and uses sensors to confirm whether cash was presented and taken. Only then does the ATM send a dispense confirmation back to the TPS, which commits the final debit and clears the hold.
Step | Component | Action | Outcome |
1 | ATM | Auth + withdrawal request | Start transaction |
2 | TPS | Check limits + available balance | Prevent overdraft |
3 | TPS/Ledger | Create hold (reserve funds) | Funds held, not posted |
4 | ATM | Dispense attempt | Hardware action |
5 | ATM sensors | Confirm cash presented/taken | Evidence for commit |
6 | TPS/Ledger | Post final debit + release hold | Money moves permanently |
7 | ATM | Print receipt | Non-critical side effect |
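The ordering in the table can be sketched as one orchestration function. This is a toy in-memory model, not a real ledger: the `Account` class, the `dispense` callable, and its three outcome strings (`"DISPENSED"`, `"NOT_DISPENSED"`, anything else meaning unknown) are assumptions for illustration.

```python
from typing import Callable


class Account:
    """Toy account: posted balance plus the total currently under hold."""

    def __init__(self, posted_cents: int) -> None:
        self.posted_cents = posted_cents
        self.held_cents = 0

    @property
    def available_cents(self) -> int:
        return self.posted_cents - self.held_cents


def withdraw(account: Account,
             amount_cents: int,
             dispense: Callable[[int], str]) -> str:
    """Hold, dispense, then commit, reverse, or defer to reconciliation."""
    if amount_cents <= 0 or amount_cents > account.available_cents:
        return "DECLINED"                      # step 2: limits and balance
    account.held_cents += amount_cents         # step 3: reserve funds
    outcome = dispense(amount_cents)           # steps 4-5: hardware + sensors
    if outcome == "DISPENSED":
        account.posted_cents -= amount_cents   # step 6: post the final debit
        account.held_cents -= amount_cents     # and clear the hold
        return "COMMITTED"
    if outcome == "NOT_DISPENSED":
        account.held_cents -= amount_cents     # definite failure: release hold
        return "REVERSED"
    # Unknown outcome: keep the hold in place and let reconciliation decide.
    return "NEEDS_RECONCILIATION"
```

In the unknown case the funds stay reserved, so the customer cannot spend them twice while the bank works out whether cash actually left the machine.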
Interview signal: The phrase “commit only after dispense confirmation” is a strong indicator you understand hardware-driven transaction design.
Summary:
Reserve funds first using a hold.
Dispense cash, then confirm via sensors.
Commit the debit only after confirmation.
Reverse holds when the dispense definitively fails; reconcile when the outcome is uncertain.
Withdrawal state machine and reconciliation logic
The withdrawal state machine is where you demonstrate senior-level rigor. The system must behave correctly under retries, timeouts, partial failures, and ATM crashes. A state machine makes these cases explicit and auditable.
A good state machine begins after authentication. Once the user requests a withdrawal, the TPS creates a transaction record keyed by the stable txn_id. It then transitions through “funds held,” “dispense in progress,” “dispense confirmed,” and “committed.” The reversal path exists for any failure after funds are held but before commit. This reversal is not a best-effort cleanup; it is a first-class transaction outcome.
The most important point is why “cash dispense confirmation” changes commit behavior. If you commit the debit before dispense confirmation, you risk charging the customer even if the dispenser jams or the ATM loses power. If you dispense before holding funds, you risk giving cash without debit. The state machine enforces the safe ordering.
State | Meaning | Key data | Next states |
AUTHENTICATED | User verified | session_id, account_id | → WITHDRAWAL_REQUESTED |
WITHDRAWAL_REQUESTED | Amount chosen | txn_id, amount | → FUNDS_HELD, DECLINED |
FUNDS_HELD | Funds reserved | hold_id, hold_amount | → DISPENSE_REQUESTED, REVERSED |
DISPENSE_REQUESTED | ATM instructed to dispense | atm_command_id | → DISPENSING, REVERSED |
DISPENSING | Hardware in progress | dispenser_status | → DISPENSE_CONFIRMED, DISPENSE_FAILED |
DISPENSE_CONFIRMED | Cash confirmed delivered | sensor_evidence | → COMMITTED |
COMMITTED | Final debit posted | ledger_entry_id | terminal |
REVERSED | Hold released / debit reversed | reversal_entry_id | terminal |
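The transition table maps naturally onto an explicit lookup that rejects illegal moves. The sketch below follows the rows above; the outgoing edges for `DISPENSE_FAILED`, `DECLINED`, and the extra `NEEDS_RECONCILIATION` state are assumptions, since the table names some of these states without giving them their own rows.

```python
from enum import Enum, auto


class State(Enum):
    AUTHENTICATED = auto()
    WITHDRAWAL_REQUESTED = auto()
    FUNDS_HELD = auto()
    DISPENSE_REQUESTED = auto()
    DISPENSING = auto()
    DISPENSE_CONFIRMED = auto()
    DISPENSE_FAILED = auto()
    NEEDS_RECONCILIATION = auto()  # assumed state for ambiguous outcomes
    COMMITTED = auto()
    REVERSED = auto()
    DECLINED = auto()


S = State
TRANSITIONS = {
    S.AUTHENTICATED: {S.WITHDRAWAL_REQUESTED},
    S.WITHDRAWAL_REQUESTED: {S.FUNDS_HELD, S.DECLINED},
    S.FUNDS_HELD: {S.DISPENSE_REQUESTED, S.REVERSED},
    S.DISPENSE_REQUESTED: {S.DISPENSING, S.REVERSED},
    S.DISPENSING: {S.DISPENSE_CONFIRMED, S.DISPENSE_FAILED,
                   S.NEEDS_RECONCILIATION},
    S.DISPENSE_CONFIRMED: {S.COMMITTED},
    S.DISPENSE_FAILED: {S.REVERSED},                      # assumed edge
    S.NEEDS_RECONCILIATION: {S.COMMITTED, S.REVERSED},    # assumed edges
    S.COMMITTED: set(),   # terminal
    S.REVERSED: set(),    # terminal
    S.DECLINED: set(),    # terminal
}


def advance(current: State, target: State) -> State:
    """Return the new state, or raise if the transition is not allowed."""
    if target not in TRANSITIONS[current]:
        raise ValueError(f"illegal transition {current.name} -> {target.name}")
    return target
```

Because `COMMITTED` is only reachable from `DISPENSE_CONFIRMED` or from a reconciliation decision, the "commit only after evidence" rule is enforced by the structure of the table rather than by scattered checks.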
Common pitfall: Treating “timeout” as failure and immediately reversing, without considering that cash might have been dispensed. Strong designs reconcile using device logs and sensor evidence.
Reconciliation is the safety net for ambiguous outcomes. If the TPS doesn’t receive dispense confirmation, it marks the transaction as “needs reconciliation” rather than guessing. A reconciliation job periodically checks ATM device journals, dispenser counters, and any late confirmations. If it determines cash was dispensed, it commits. If it determines cash was not dispensed, it reverses. If it can’t determine, it escalates to manual dispute workflows.
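The three-way decision can be sketched as a pure function over the available evidence. The specific rules here are assumptions chosen for illustration (a real bank would define its own evidence policy): the journal and the dispenser counter must agree before the job commits or reverses on its own, and `None` means that piece of evidence is missing.

```python
from typing import Optional


def reconcile(amount_cents: int,
              journal_dispensed: Optional[bool],
              counter_delta_cents: Optional[int],
              late_confirmation: bool = False) -> str:
    """Decide COMMIT, REVERSE, or ESCALATE for an unresolved withdrawal."""
    if late_confirmation:
        return "COMMIT"      # the ATM's confirmation arrived after the timeout
    if journal_dispensed is True and counter_delta_cents == amount_cents:
        return "COMMIT"      # journal and cash counter both show the payout
    if journal_dispensed is False and counter_delta_cents == 0:
        return "REVERSE"     # both sources agree that no cash left the ATM
    return "ESCALATE"        # missing or conflicting evidence: human review
```

For example, a journal that reports a dispense while the counter shows no cash movement yields `ESCALATE`, which matches the rule that ambiguous cases go to the manual dispute workflow.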
Summary:
Model withdrawal as a persisted state machine with explicit transitions.
Hold funds before dispense; commit only after dispense confirmation.
Treat ambiguous outcomes as “reconcile,” not “guess.”
Reversal is a first-class outcome, not an afterthought.
Interbank routing and settlement
Foreign card processing adds a second distributed system: the card network and the issuing bank. When a user uses Bank B’s card at Bank A’s ATM, Bank A is the acquirer, Bank B is the issuer, and the network (Visa/Mastercard/etc.) routes authorization messages between them. This introduces latency, dependencies, and different phases of money movement.
In these systems, it’s critical to separate authorization from settlement. Authorization is a real-time decision: does the issuer approve the withdrawal and place a hold? Settlement is the later process of actually moving funds between banks and reconciling fees. Your ATM system must handle authorization synchronously (because the user is waiting) while settlement happens asynchronously in batch cycles.
Network dependencies also change failure handling. If the card network is down, you may need to decline foreign transactions, route to a backup network, or degrade with conservative limits. The key is to avoid dispensing cash when you cannot obtain a valid authorization response from the issuer.
Phase | What happens | Parties involved | Typical latency |
Authorization | Approve/decline + hold funds | ATM bank ↔ network ↔ issuer bank | Seconds |
Dispense + confirm | Cash delivery confirmation | ATM ↔ ATM bank TPS | Seconds |
Posting | Final debit/clearing hold | Issuer ledger | Seconds–minutes |
Settlement | Interbank fund transfer + fees | Banks + network settlement | Hours–days |
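The "decline safely" rule for foreign cards can be sketched as a fail-closed authorization call. This is a simplified illustration: each entry in `networks` is a hypothetical callable standing in for a card-network connection, and the `"APPROVED"` response string is an assumption.

```python
from typing import Callable, Sequence


def authorize_foreign(networks: Sequence[Callable[[dict], str]],
                      request: dict) -> bool:
    """Try the primary network, then backups; approve only on explicit approval."""
    for send in networks:
        try:
            response = send(request)
        except (TimeoutError, ConnectionError):
            continue                       # this route is unavailable: try next
        return response == "APPROVED"      # any other answer is a decline
    return False                           # no route answered: do not dispense
```

One caveat that the sketch omits: an attempt that timed out may still have placed a hold at the issuer, so a real switch would also need to send a reversal for it before or alongside trying a backup route.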
Interview signal: Saying “authorization is synchronous, settlement is asynchronous” shows you understand real payment rails and why the core ledger can’t be your only dependency.
Summary:
Route foreign transactions through networks to the issuer.
Separate authorization (real-time) from settlement (batch).
Decline safely if you can’t get issuer authorization.
Fraud and abuse prevention
Fraud prevention in ATM networks is both digital and physical. On the digital side, attackers try to brute-force PINs, exploit stolen cards, or automate withdrawals. On the physical side, skimmers, compromised ATMs, and cash-out attacks are real. A strong interview answer treats fraud controls as layered: edge controls, risk scoring, and circuit breakers.
Start with velocity controls: per-card withdrawal frequency, per-account daily limits, and per-ATM anomaly thresholds. Add geo and behavior signals: withdrawals in two distant locations within minutes, repeated declines, unusual withdrawal patterns, or abnormal ATM error rates. The risk engine can return “allow,” “deny,” or “step-up.” Step-up options at an ATM are limited, so in practice this often means requiring additional verification at a branch.
You also need circuit breakers. If an ATM is suspected of tampering or a region is under attack, you can disable withdrawals, reduce limits, or require online authorization only (no offline fallback). These controls protect the bank even if they temporarily degrade customer experience.
Control | Signal detected | Response |
Velocity checks | Too many withdrawals in short window | Decline or reduce limits |
Geo anomaly | Impossible travel pattern | Block + alert |
PIN brute force | Multiple incorrect PIN attempts | Card capture / temporary lock |
Skimming signals | ATM hardware tamper alerts | Disable ATM, dispatch service |
Risk scoring | Suspicious pattern across accounts | Require manual review or deny |
Circuit breaker | Network/issuer instability | Conservative declines to avoid loss |
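The velocity check in the first row can be sketched as a sliding-window counter per card. The class name `VelocityLimiter`, its parameters, and the choice to pass the clock in as `now` (seconds) are assumptions for illustration; a real risk engine would use shared storage rather than process memory.

```python
from collections import defaultdict, deque


class VelocityLimiter:
    """Allows at most max_withdrawals per card within a sliding time window."""

    def __init__(self, max_withdrawals: int, window_seconds: float) -> None:
        self.max_withdrawals = max_withdrawals
        self.window_seconds = window_seconds
        self._events: dict[str, deque] = defaultdict(deque)

    def allow(self, card_id: str, now: float) -> bool:
        recent = self._events[card_id]
        # Drop withdrawals that have aged out of the window.
        while recent and now - recent[0] >= self.window_seconds:
            recent.popleft()
        if len(recent) >= self.max_withdrawals:
            return False          # over the limit: decline (or reduce limits)
        recent.append(now)        # only allowed withdrawals count toward the limit
        return True
```

Passing `now` explicitly keeps the logic deterministic and easy to test, which matters when the same rule has to be replayed during a fraud investigation.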
What to say in the interview: “Fraud controls are layered. I enforce limits early, score risk centrally, and use circuit breakers to protect the system during attacks.”
Summary:
Use velocity and anomaly detection to prevent cash-out attacks.
Add tamper signals and ATM health monitoring for skimming.
Apply circuit breakers for suspected compromise or instability.
Consistency vs availability: realistic trade-offs
ATM systems strongly prefer consistency for ledger writes, but you still need to discuss availability realistically. A common architecture is a single-writer core ledger (or a strongly consistent cluster) that processes all withdrawals for an account. This avoids split-brain balances and simplifies ACID guarantees. The trade-off is that multi-region active-active writes are difficult without heavy coordination.
For global banks, the common compromise is: route transactions to the “home region” for the account or use a partitioned ledger where each account has a single authoritative shard. Reads can be served from replicas, but writes must go to the authoritative shard. If a region is down, you may deny withdrawals rather than risk inconsistency. That’s a business decision, and in interviews, you should state it clearly.
You can also mention limited offline modes (allowing small withdrawals without online authorization), but those are risky and heavily constrained. Most interview settings assume online authorization is required.
Approach | Consistency | Availability | Notes |
Single-writer ledger | Strong | Lower during regional outages | Common for correctness |
Multi-region active-active | Hard to guarantee | Higher | Rare, complex |
Read replicas for balance | Bounded staleness (reads only) | Higher | Common optimization |
Offline withdrawals | Weak | Higher | Rare, tightly limited |
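The "single authoritative shard per account" idea, together with the deny-on-outage choice, can be sketched as below. Hash-based placement is only one possible scheme and is an assumption here (a bank might instead pin accounts to a home region by directory lookup); `shard_count` and `healthy_shards` are illustrative parameters.

```python
import hashlib
from typing import Optional, Set


def shard_for(account_id: str, shard_count: int) -> int:
    """Map an account to exactly one authoritative shard, deterministically."""
    digest = hashlib.sha256(account_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


def route_write(account_id: str,
                shard_count: int,
                healthy_shards: Set[int]) -> Optional[int]:
    """Return the shard that must take the write, or None to deny."""
    shard = shard_for(account_id, shard_count)
    if shard not in healthy_shards:
        # Never fall back to another shard: that would create a second writer.
        return None
    return shard
```

Returning `None` instead of picking a different shard is the code-level expression of the business decision to deny withdrawals rather than risk inconsistent balances.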
Common pitfall: Saying “active-active multi-region with strong consistency” without explaining coordination and failure modes. Interviewers will push on split brain and double-dispense risk.
Summary:
Prefer single-writer or strongly consistent ledger for withdrawals.
Use replicas for reads, but keep writes authoritative.
Be explicit about the business choice: deny vs risk inconsistency.
Reliability, observability, and auditability
ATM systems must be observable and auditable because disputes are inevitable. Customers will claim “I didn’t get the cash,” or “I was charged twice,” and regulators will demand traceability. That means every step must produce immutable logs: requests, responses, holds, dispense commands, sensor confirmations, reversals, and reconciliation outcomes.
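One way to make a log tamper-evident is to chain each record to the hash of the previous one. The sketch below is an assumption about how "immutable logs" could be approximated in application code, not a description of any particular bank's pipeline; the `AuditLog` class and its record layout are hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder hash that precedes the first record


class AuditLog:
    """Append-only log in which each record commits to its predecessor."""

    def __init__(self) -> None:
        self._records: list[dict] = []

    def append(self, event: dict) -> str:
        prev = self._records[-1]["hash"] if self._records else GENESIS
        payload = json.dumps(event, sort_keys=True)  # stable serialization
        digest = hashlib.sha256((prev + payload).encode("utf-8")).hexdigest()
        self._records.append({"prev": prev, "payload": payload, "hash": digest})
        return digest

    def verify(self) -> bool:
        """Recompute the chain; any edited or removed record breaks it."""
        prev = GENESIS
        for record in self._records:
            expected = hashlib.sha256(
                (prev + record["payload"]).encode("utf-8")).hexdigest()
            if record["prev"] != prev or record["hash"] != expected:
                return False
            prev = record["hash"]
        return True
```

A hash chain detects tampering but does not prevent it, so it is usually paired with write-once storage and restricted access.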
Observability also protects operations. You monitor ATM health (cash levels, dispenser errors), backend latency, authorization failure rates, fraud blocks, and reconciliation queue sizes. For the ledger and TPS, you monitor lock contention, transaction timeouts, and idempotency hit rates (retries). For the network, you monitor routing failures and third-party network availability.
A strong answer includes SLOs that reflect user experience and safety: p95 authorization latency, dispense-confirm latency, reconciliation completion time, and fraud false positive rate. You also want “audit completeness” metrics: every committed debit should have a corresponding dispense confirmation or a reconciliation decision.
Metric | Why it matters | Target / action |
p95 auth latency | Customer wait time | < 2–3 seconds |
Withdrawal success rate | Core availability | > 99.9% (excluding fraud declines) |
Reconciliation backlog | Risk of unresolved disputes | Near zero steady-state |
Idempotency replay rate | Network instability indicator | Track and alert on spikes |
Dispense failure rate | Hardware reliability | Alert per ATM model/region |
Fraud block rate + FP rate | Safety vs customer friction | Monitor drift over time |
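Two of these signals are simple enough to sketch directly: a nearest-rank p95 over latency samples, and an "audit completeness" check that lists committed debits with neither a dispense confirmation nor a reconciliation decision. The function names and the use of plain ID collections are assumptions for illustration.

```python
from typing import Iterable, List, Sequence


def p95(samples: Sequence[float]) -> float:
    """Nearest-rank 95th percentile; requires at least one sample."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = (95 * len(ordered) + 99) // 100   # integer ceil(0.95 * n)
    return ordered[rank - 1]


def audit_gaps(committed: Iterable[str],
               confirmed: Iterable[str],
               reconciled: Iterable[str]) -> List[str]:
    """Committed txn_ids that lack both kinds of supporting evidence."""
    return sorted(set(committed) - set(confirmed) - set(reconciled))
```

In steady state `audit_gaps` should return an empty list; any entry is a debit the bank could not defend in a dispute and is worth alerting on.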
Interview signal: “Immutable logs + reconciliation jobs + dispute workflows” is what turns your design into a bank-grade system rather than a demo.
Summary:
Log every step immutably for audit and disputes.
Monitor both software and hardware health.
Track reconciliation backlog and idempotency retries as key safety signals.
How interviewers evaluate your ATM design
Interviewers are looking for more than a component diagram. They want to see that you prioritize correctness under ambiguity and that you understand why hardware changes transaction semantics. Your strongest signals are: using holds, committing only after dispense confirmation, modeling a state machine, and having reconciliation logic for uncertain outcomes.
They also expect you to take security seriously without drifting into vague statements. Mention HSMs for PIN verification, encryption in transit, PCI compliance constraints, and strict access controls. For foreign cards, they expect you to describe authorization vs settlement and external network dependencies.
Finally, they want trade-offs. If you claim everything is strongly consistent and highly available across regions, they’ll ask how you avoid double withdrawals during partitions. A strong answer chooses safety and explains why.
What strong answers sound like: “I treat withdrawal as a state machine with holds and commit-after-dispense. I design for at-least-once messaging with idempotency, and I reconcile ambiguous outcomes using device journals and immutable logs.”
Summary:
Lead with correctness: holds, commit-after-dispense, reconciliation.
Show security depth: HSM, encryption, compliance.
Explain interbank routing and settlement phases.
Make realistic consistency/availability trade-offs.
Final takeaway
ATM System Design forces you to design a distributed system where mistakes are expensive and visible. The right approach is conservative: strong ledger correctness, idempotent transaction handling, explicit state machines, and safe failure recovery when hardware is involved. If you can explain why dispense confirmation gates commit, how reversals and reconciliation work, and how you audit every step for disputes, you’ll give a Staff-level answer.
Happy learning!