ATM System Design


In this blog, learn how to ace the ATM System Design interview by designing for ACID correctness, safe cash withdrawal workflows, and robust failure handling with reconciliation, interbank routing, and fraud prevention.

Jan 29, 2026

An ATM network looks simple from the user’s perspective: insert card, enter PIN, withdraw cash. But from a System Design interview perspective, it’s one of the cleanest tests of “real-world correctness.” You’re building a distributed system that touches a core ledger and physical hardware, and you don’t get to hand-wave failure cases away. If the system dispenses cash, the account must be debited. If cash is not dispensed, the account must not be debited (or must be reversed). Everything else is implementation detail.

This is also why ATM design is different from most web systems. In many consumer applications, eventual consistency is acceptable and failures are mostly “retry later.” In an ATM, the failure modes are messier: the network can drop after authorization, the cash dispenser can jam, the ATM can reboot mid-transaction, and foreign cards add multiple external dependencies. A strong answer shows you understand how to design for safe outcomes even when the world is unreliable.

In this blog, we’ll build an interview-ready ATM architecture, explain how transactions stay ACID-correct, introduce a withdrawal state machine (with reversal and reconciliation), cover interbank routing and settlement, and show how to think about fraud prevention and observability.

Interview signal: ATM design is not about microservices diagrams. It’s about transaction correctness under hardware and network ambiguity.

Clarify requirements the way interviewers expect

In ATM interviews, you want to start by clarifying scope and then locking the guarantees. The scope is usually a “bank-operated ATM network” with a core banking system behind it, but you should explicitly confirm which features are in scope: withdrawals, balance inquiry, deposits, receipts, and foreign card support. You can mention deposits briefly (they often settle asynchronously), but the main focus should be withdrawals and balance inquiries because they stress correctness.

Once scope is clear, define the invariants. For withdrawals, the invariant is: a customer should not lose money without receiving cash, and the bank should not dispense cash without debiting the account. The hard part is that the ATM is a physical device and your backend doesn’t directly observe “cash delivered.” It receives a confirmation signal that can be delayed, lost, or incorrect. That’s why the withdrawal workflow must be modeled as a state machine with reconciliation.

Also, be explicit about non-functional constraints: PCI compliance, encryption, HSM usage for PIN verification, strict auditability, and extremely conservative failure handling. A good answer doesn’t claim “five nines everywhere.” Instead, it identifies which parts must be strongly consistent (ledger writes) and which can degrade gracefully (receipt printing, UI hints).

| Category | Requirement | Notes (what interviewers listen for) |
| --- | --- | --- |
| Functional | Authenticate user (card + PIN) | PIN verification via HSM, never plaintext |
| Functional | Balance inquiry | Read path can be optimized, but must be correct enough |
| Functional | Cash withdrawal | Two-phase workflow with hold/commit and dispense confirmation |
| Functional | Deposits | Often asynchronous credit with later verification |
| Functional | Foreign card support | Requires network routing + auth vs settlement separation |
| Non-functional | ACID correctness | Ledger is the source of truth |
| Non-functional | Security + compliance | PCI DSS, encryption, tamper resistance |
| Non-functional | Auditability | Immutable logs + dispute workflows |
| Non-functional | Fault tolerance | Safe handling of network and hardware failures |

Common pitfall: Treating “withdrawal” as a single database update. In reality, the cash dispenser forces a multi-step transaction with uncertain outcomes.

Summary:

  • Confirm scope (withdrawal + balance inquiry are the core).

  • State invariants (no cash without debit, no debit without cash).

  • Call out compliance and auditability as first-class requirements.

  • Treat hardware and network ambiguity as expected, not rare.

High-level architecture: what components exist and why

An ATM system is a layered network: edge devices (ATMs) connect to a bank switch, which routes requests to authorization and transaction processing, which then interacts with the core ledger. The architecture is intentionally conservative: the ledger is centralized or single-writer to preserve correctness, while the edge can be distributed to scale.

At the edge, the ATM is effectively a specialized client with hardware peripherals: card reader, PIN pad, receipt printer, cash dispenser, and sensors. It must encrypt sensitive inputs and operate safely under timeouts. The ATM does not “decide” financial truth; it requests authorization and follows backend instructions.

In the middle, you typically have an ATM switch (or network gateway) that terminates device connections, applies routing and basic policy, and forwards to internal services. For foreign cards, the switch routes to external card networks. Behind the switch sits the Transaction Processing System (TPS) which owns the withdrawal workflow state machine and coordinates with the core ledger.

| Component | Responsibility | Why it matters |
| --- | --- | --- |
| ATM device | UI + card/PIN capture + dispense + sensors | Hardware boundary; must fail safely |
| Secure comms module | Encrypt PIN block + session keys | Protect secrets over hostile networks |
| ATM switch / gateway | Routing, throttling, protocol translation | Central control point for edge traffic |
| Authentication service / HSM | PIN verification, key management | Compliance + cryptographic trust anchor |
| Transaction Processing System (TPS) | Orchestrate withdrawals, idempotency, state machine | Prevent double-debits and handle ambiguity |
| Core ledger / banking DB | Authoritative balances and postings | ACID correctness and auditability |
| Reconciliation service | Resolve uncertain outcomes, post reversals | Hardware/network failures are normal |
| Fraud/risk engine | Score and block suspicious withdrawals | Prevent losses and abuse |
| Audit log pipeline | Immutable event recording | Disputes, compliance, forensics |

Interview signal: Strong candidates separate the “routing plane” (switch) from the “money plane” (ledger + TPS), and they never let the ATM directly mutate balances.

Summary:

  • Keep the core ledger strongly consistent and authoritative.

  • Put orchestration and idempotency in TPS, not in the ATM.

  • Use an ATM switch for routing and external network integration.

  • Treat reconciliation and audit logging as core services.

Data model: ledger-first thinking (not “balance as a field”)

The safest way to think about bank accounts is: the ledger is the truth, and balance is a derived value (or a carefully maintained cached value) from ledger postings. In interviews, you don’t need to design a full accounting system, but you should demonstrate ledger-first reasoning: every withdrawal produces entries that are auditable and immutable.

For ATM withdrawals, it’s also common to model “holds” (authorizations) separately from “posted” transactions. A hold reduces available balance immediately, preventing double-withdrawal, while the final posting occurs after dispense confirmation. This split is essential when hardware is involved.

You also need idempotency keys and transaction identifiers. The ATM generates a transaction ID (txn_id) that is stable across retries. The TPS uses it as the idempotency key, so a retried request returns the original result rather than creating a second withdrawal.
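To make the retry behavior concrete, here is a minimal in-memory sketch of idempotent request handling. The class name `IdempotentTPS`, the status strings, and the use of integer minor units (cents) are assumptions for illustration, not part of any real banking API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WithdrawalResult:
    txn_id: str
    status: str   # "APPROVED" or "DECLINED" (hypothetical status values)
    amount: int   # minor units, e.g. cents


class IdempotentTPS:
    """Hypothetical TPS front door that deduplicates requests by txn_id."""

    def __init__(self, available: int) -> None:
        self.available = available
        self._results: dict[str, WithdrawalResult] = {}

    def request_withdrawal(self, txn_id: str, amount: int) -> WithdrawalResult:
        # A retry with a known txn_id replays the stored result unchanged.
        if txn_id in self._results:
            return self._results[txn_id]
        if amount <= 0 or amount > self.available:
            result = WithdrawalResult(txn_id, "DECLINED", amount)
        else:
            self.available -= amount  # funds are reserved exactly once
            result = WithdrawalResult(txn_id, "APPROVED", amount)
        self._results[txn_id] = result
        return result
```

In a real system the lookup and the reservation would have to happen inside one database transaction (for example, behind a unique constraint on txn_id); the dictionary here only stands in for that.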

| Entity | Key fields | Purpose |
| --- | --- | --- |
| Account | account_id, status, available_balance_cache | Account metadata + fast reads |
| Ledger entry | entry_id, account_id, amount, type, timestamp | Immutable postings for audit |
| Authorization hold | hold_id, account_id, amount, state, expires_at | Reserve funds before dispense |
| ATM transaction | txn_id, atm_id, account_id, amount, state | State machine + idempotency anchor |
| Dispenser event | txn_id, sensor_status, cash_presented | Hardware evidence for reconciliation |

Common pitfall: Only storing “balance” and updating it in place. Interviewers want to hear “ledger entries + holds + immutable audit trail.”
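As a rough sketch of ledger-first thinking, the snippet below derives both balances from immutable ledger entries and active holds instead of mutating a balance field. The hold state names and the signed-amount convention are assumptions; the timestamp and expires_at fields are left out to keep it short.

```python
from dataclasses import dataclass
from enum import Enum


class HoldState(Enum):
    # Hypothetical state names; the article only says a hold has a "state".
    ACTIVE = "ACTIVE"
    CAPTURED = "CAPTURED"   # converted into a posted debit
    RELEASED = "RELEASED"   # reversed or expired


@dataclass(frozen=True)  # frozen: a posted entry is never edited in place
class LedgerEntry:
    entry_id: str
    account_id: str
    amount: int   # signed minor units: credits > 0, debits < 0
    type: str


@dataclass
class Hold:
    hold_id: str
    account_id: str
    amount: int
    state: HoldState


def current_balance(entries: list[LedgerEntry], account_id: str) -> int:
    """Sum of posted entries for the account."""
    return sum(e.amount for e in entries if e.account_id == account_id)


def available_balance(entries: list[LedgerEntry], holds: list[Hold], account_id: str) -> int:
    """Current balance minus funds reserved by active holds."""
    held = sum(
        h.amount
        for h in holds
        if h.account_id == account_id and h.state is HoldState.ACTIVE
    )
    return current_balance(entries, account_id) - held
```

A production ledger would not re-sum every entry on each read; the cached available balance in the Account row exists for exactly that reason, but it should always be reproducible from the entries.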

Summary:

  • Use ledger entries for auditability and correctness.

  • Use holds to reserve funds before dispensing cash.

  • Use stable transaction IDs for idempotency across retries.

Walkthrough 1: balance inquiry (step-by-step)

A balance inquiry looks simple, but it’s a great place to demonstrate safe read design. The ATM needs a quick response, but the bank must not show a balance that’s wildly wrong. In most systems, balance inquiry reads from a strongly consistent source (or a read replica with bounded staleness if the bank accepts that trade-off).

The flow starts when the ATM reads the card and prompts for PIN. The PIN is encrypted on the device and sent to the bank’s authentication service (often backed by an HSM). After successful authentication, the ATM sends a balance inquiry request with the authenticated session token and account identifier.

The TPS verifies authorization, checks account status (active, not blocked), and fetches the balance. This may come from a cached available balance maintained by ledger postings, or computed from ledger entries. The TPS returns the balance plus metadata like “available” vs “current,” which matters when holds exist.

| Step | Component | Action | Outcome |
| --- | --- | --- | --- |
| 1 | ATM | Capture card + PIN, encrypt PIN block | Secure auth request prepared |
| 2 | Switch | Route request to auth service | Correct internal routing |
| 3 | Auth/HSM | Verify PIN | Authenticated session or denial |
| 4 | TPS | Validate session + account status | Authorization enforced |
| 5 | Ledger | Read available/current balance | Correct balance computed |
| 6 | ATM | Display/print balance | User receives result |

What to say in the interview: “Balance inquiry is read-heavy, but I still treat it as sensitive. I validate session, account status, and return available vs current balance to reflect holds.”
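The TPS side of this walkthrough (steps 4 and 5) can be sketched as a small handler. The session map, the status values, and the `InquiryError` type are hypothetical stand-ins for whatever the real authentication service and ledger expose.

```python
from dataclasses import dataclass


@dataclass
class Account:
    account_id: str
    status: str    # "ACTIVE" or "BLOCKED" (hypothetical values)
    current: int   # posted balance in minor units
    held: int      # funds reserved by active holds


class InquiryError(Exception):
    """Raised when the inquiry must be refused."""


def balance_inquiry(
    sessions: dict[str, str],       # session token -> account_id it was issued for
    accounts: dict[str, Account],
    session_token: str,
    account_id: str,
) -> dict[str, int]:
    # Step 4: the session must belong to the requested account.
    if sessions.get(session_token) != account_id:
        raise InquiryError("session not authorized for this account")
    account = accounts.get(account_id)
    if account is None or account.status != "ACTIVE":
        raise InquiryError("account unavailable")
    # Step 5: report both figures so that active holds are visible.
    return {"current": account.current, "available": account.current - account.held}
```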

Summary:

  • Authenticate first, then read balance.

  • Return available/current balance to account for holds.

  • Prefer correctness over aggressive caching for financial reads.

Walkthrough 2: cash withdrawal (step-by-step)

Cash withdrawal is the heart of the ATM question because it combines a ledger update with a physical action. The key design insight is that you cannot commit the final debit until you have strong evidence that cash was actually dispensed. But you also can’t dispense cash without reserving funds first, or you risk overdrawing the account.

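The correct mental model is: authorize and hold funds → attempt dispense → confirm dispense → commit posting. If a step definitively fails after funds are held but before dispense confirmation (for example, the dispenser reports a jam), you reverse the hold. If the outcome is unknown (for example, the confirmation never arrives), you keep the hold and reconcile before deciding. This avoids “customer lost money without cash,” which is the worst-case outcome.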

The withdrawal flow begins similarly with authentication. The user requests an amount, and the TPS checks limits (daily withdrawal limit, ATM limits, available balance). If approved, the TPS creates a hold and returns a dispense authorization to the ATM. The ATM attempts to dispense cash and uses sensors to confirm whether cash was presented and taken. Only then does the ATM send a dispense confirmation back to the TPS, which commits the final debit and clears the hold.

| Step | Component | Action | Outcome |
| --- | --- | --- | --- |
| 1 | ATM | Auth + withdrawal request | Start transaction |
| 2 | TPS | Check limits + available balance | Prevent overdraft |
| 3 | TPS/Ledger | Create hold (reserve funds) | Funds held, not posted |
| 4 | ATM | Dispense attempt | Hardware action |
| 5 | ATM sensors | Confirm cash presented/taken | Evidence for commit |
| 6 | TPS/Ledger | Post final debit + release hold | Money moves permanently |
| 7 | ATM | Print receipt | Non-critical side effect |

Interview signal: The phrase “commit only after dispense confirmation” is a strong indicator you understand hardware-driven transaction design.
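Here is a minimal sketch of the hold → dispense → confirm → commit ordering from the steps above. The `dispense` callable is a hypothetical stand-in for the ATM hardware: it returns True when sensors confirm the cash was presented and taken, returns False on a known failure such as a jam, and raises `TimeoutError` when no confirmation arrives. Limit checks and persistence are omitted.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Account:
    posted: int    # posted balance in minor units
    held: int = 0  # funds reserved by active holds
    entries: list = field(default_factory=list)

    @property
    def available(self) -> int:
        return self.posted - self.held


def withdraw(account: Account, amount: int, dispense: Callable[[int], bool]) -> str:
    # Step 2: check available balance before anything else.
    if amount <= 0 or amount > account.available:
        return "DECLINED"
    # Step 3: reserve funds; nothing is posted yet.
    account.held += amount
    # Steps 4-5: attempt the dispense and wait for sensor confirmation.
    try:
        cash_taken = dispense(amount)
    except TimeoutError:
        # Outcome unknown: keep the hold and let reconciliation decide.
        return "NEEDS_RECONCILIATION"
    if not cash_taken:
        account.held -= amount  # known failure: release the hold
        return "REVERSED"
    # Step 6: post the final debit and clear the hold together.
    account.posted -= amount
    account.held -= amount
    account.entries.append(("ATM_WITHDRAWAL", -amount))
    return "COMMITTED"
```

Note that a timeout leaves the hold in place. Releasing it immediately would let the customer spend money that may already have left the machine as cash.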

Summary:

  • Reserve funds first using a hold.

  • Dispense cash, then confirm via sensors.

  • Commit the debit only after confirmation.

  • Reverse holds on confirmed dispense failure; reconcile when the outcome is uncertain.

Withdrawal state machine and reconciliation logic

The withdrawal state machine is where you demonstrate senior-level rigor. The system must behave correctly under retries, timeouts, partial failures, and ATM crashes. A state machine makes these cases explicit and auditable.

A good state machine begins after authentication. Once the user requests a withdrawal, the TPS creates a transaction record keyed by a stable txn_id. It then transitions through “funds held,” “dispense in progress,” “dispense confirmed,” and “committed.” The reversal path covers any confirmed failure after funds are held but before commit, while outcomes that cannot be confirmed go to reconciliation. Reversal is not a best-effort cleanup; it is a first-class transaction outcome.

The most important point is why “cash dispense confirmation” changes commit behavior. If you commit the debit before dispense confirmation, you risk charging the customer even if the dispenser jams or the ATM loses power. If you dispense before holding funds, you risk giving cash without debit. The state machine enforces the safe ordering.

| State | Meaning | Key data | Next states |
| --- | --- | --- | --- |
| AUTHENTICATED | User verified | session_id, account_id | → WITHDRAWAL_REQUESTED |
| WITHDRAWAL_REQUESTED | Amount chosen | txn_id, amount | → FUNDS_HELD, DECLINED |
| FUNDS_HELD | Funds reserved | hold_id, hold_amount | → DISPENSE_REQUESTED, REVERSED |
| DISPENSE_REQUESTED | ATM instructed to dispense | atm_command_id | → DISPENSING, REVERSED |
| DISPENSING | Hardware in progress | dispenser_status | → DISPENSE_CONFIRMED, DISPENSE_FAILED |
| DISPENSE_CONFIRMED | Cash confirmed delivered | sensor_evidence | → COMMITTED |
| COMMITTED | Final debit posted | ledger_entry_id | terminal |
| REVERSED | Hold released / debit reversed | reversal_entry_id | terminal |
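The states listed above can be encoded as an explicit transition map, so that illegal jumps such as FUNDS_HELD → COMMITTED are rejected instead of silently accepted. This is a sketch, and two parts of it are assumptions rather than something the state list spells out: DISPENSE_FAILED is given a single exit to REVERSED, and a NEEDS_RECONCILIATION state is added for outcomes that cannot be confirmed.

```python
from enum import Enum, auto


class State(Enum):
    AUTHENTICATED = auto()
    WITHDRAWAL_REQUESTED = auto()
    DECLINED = auto()
    FUNDS_HELD = auto()
    DISPENSE_REQUESTED = auto()
    DISPENSING = auto()
    DISPENSE_CONFIRMED = auto()
    DISPENSE_FAILED = auto()
    NEEDS_RECONCILIATION = auto()  # assumption: explicit state for unknown outcomes
    COMMITTED = auto()
    REVERSED = auto()


S = State
TRANSITIONS: dict[State, frozenset[State]] = {
    S.AUTHENTICATED: frozenset({S.WITHDRAWAL_REQUESTED}),
    S.WITHDRAWAL_REQUESTED: frozenset({S.FUNDS_HELD, S.DECLINED}),
    S.FUNDS_HELD: frozenset({S.DISPENSE_REQUESTED, S.REVERSED}),
    S.DISPENSE_REQUESTED: frozenset({S.DISPENSING, S.REVERSED}),
    S.DISPENSING: frozenset({S.DISPENSE_CONFIRMED, S.DISPENSE_FAILED, S.NEEDS_RECONCILIATION}),
    S.DISPENSE_CONFIRMED: frozenset({S.COMMITTED}),
    S.DISPENSE_FAILED: frozenset({S.REVERSED}),  # assumption: a known failure always reverses
    S.NEEDS_RECONCILIATION: frozenset({S.COMMITTED, S.REVERSED}),
    S.DECLINED: frozenset(),   # terminal
    S.COMMITTED: frozenset(),  # terminal
    S.REVERSED: frozenset(),   # terminal
}


class InvalidTransition(Exception):
    pass


def advance(current: State, target: State) -> State:
    """Return the new state, or raise if the move is not allowed."""
    if target not in TRANSITIONS[current]:
        raise InvalidTransition(f"{current.name} -> {target.name}")
    return target
```

Persisting the current state alongside the txn_id, and only ever changing it through a function like `advance`, is what makes the workflow auditable after a crash.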

Common pitfall: Treating “timeout” as failure and immediately reversing, without considering that cash might have been dispensed. Strong designs reconcile using device logs and sensor evidence.

Reconciliation is the safety net for ambiguous outcomes. If the TPS doesn’t receive dispense confirmation, it marks the transaction as “needs reconciliation” rather than guessing. A reconciliation job periodically checks ATM device journals, dispenser counters, and any late confirmations. If it determines cash was dispensed, it commits. If it determines cash was not dispensed, it reverses. If it can’t determine, it escalates to manual dispute workflows.
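The decision rule in that paragraph can be sketched as a pure function over the evidence a reconciliation job collects. The `Evidence` fields and the specific rule used here (journal and dispenser counter must agree before the job acts on its own) are assumptions for illustration; a real bank would define its own evidence hierarchy.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Evidence:
    # None means the source could not be read for this transaction.
    journal_dispensed: Optional[bool]  # device journal: was cash presented and taken?
    counter_delta: Optional[int]       # cash that left the dispenser for this txn (minor units)
    late_confirmation: bool            # a delayed dispense confirmation arrived


def reconcile(amount: int, ev: Evidence) -> str:
    """Return "COMMIT", "REVERSE", or "ESCALATE" for an ambiguous withdrawal."""
    if ev.late_confirmation:
        return "COMMIT"
    # Act automatically only when independent sources agree.
    if ev.journal_dispensed is True and ev.counter_delta == amount:
        return "COMMIT"
    if ev.journal_dispensed is False and ev.counter_delta == 0:
        return "REVERSE"
    # Missing or conflicting evidence goes to the manual dispute workflow.
    return "ESCALATE"
```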

Summary:

  • Model withdrawal as a persisted state machine with explicit transitions.

  • Hold funds before dispense; commit only after dispense confirmation.

  • Treat ambiguous outcomes as “reconcile,” not “guess.”

  • Reversal is a first-class outcome, not an afterthought.

Interbank routing and settlement

Foreign card processing adds a second distributed system: the card network and the issuing bank. When a user uses Bank B’s card at Bank A’s ATM, Bank A is the acquirer, Bank B is the issuer, and the network (Visa/Mastercard/etc.) routes authorization messages between them. This introduces latency, dependencies, and different phases of money movement.

In these systems, it’s critical to separate authorization from settlement. Authorization is a real-time decision: does the issuer approve the withdrawal and place a hold? Settlement is the later process of actually moving funds between banks and reconciling fees. Your ATM system must handle authorization synchronously (because the user is waiting) while settlement happens asynchronously in batch cycles.

Network dependencies also change failure handling. If the card network is down, you may need to decline foreign transactions, route to a backup network, or degrade with conservative limits. The key is to avoid dispensing cash when you cannot obtain a valid authorization response from the issuer.

| Phase | What happens | Parties involved | Typical timing |
| --- | --- | --- | --- |
| Authorization | Approve/decline + hold funds | ATM bank ↔ network ↔ issuer bank | Seconds |
| Dispense + confirm | Cash delivery confirmation | ATM ↔ ATM bank TPS | Seconds |
| Posting | Final debit/clearing hold | Issuer ledger | Seconds–minutes |
| Settlement | Interbank fund transfer + fees | Banks + network settlement | Hours–days |

Interview signal: Saying “authorization is synchronous, settlement is asynchronous” shows you understand real payment rails and why the core ledger can’t be your only dependency.
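A fail-closed authorization call for foreign cards might look like the sketch below. Each entry in `networks` is a hypothetical client function for one card network, tried in order (primary first, then backups). The error conventions are assumptions: `ConnectionError` means the request was never delivered, while `TimeoutError` means it may have reached the issuer.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical network client: takes (card_ref, amount) and returns "APPROVED" or "DECLINED".
NetworkCall = Callable[[str, int], str]


@dataclass(frozen=True)
class AuthOutcome:
    decision: str          # "APPROVED" or "DECLINED"
    reversal_needed: bool  # True if the issuer may hold funds we will never use


def authorize_foreign(card_ref: str, amount: int, networks: list[NetworkCall]) -> AuthOutcome:
    for send in networks:
        try:
            response = send(card_ref, amount)
        except ConnectionError:
            continue  # request never left: safe to try the backup route
        except TimeoutError:
            # The issuer may have placed a hold. Decline and flag a reversal
            # instead of retrying elsewhere and risking a second hold.
            return AuthOutcome("DECLINED", reversal_needed=True)
        if response in ("APPROVED", "DECLINED"):
            return AuthOutcome(response, reversal_needed=False)
        # Unintelligible response after delivery: treat like a timeout.
        return AuthOutcome("DECLINED", reversal_needed=True)
    # No route reachable: never dispense without a valid issuer response.
    return AuthOutcome("DECLINED", reversal_needed=False)
```

The design choice worth calling out is that every path without a clear issuer approval ends in a decline, so the ATM never dispenses on a guess.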

Summary:

  • Route foreign transactions through networks to the issuer.

  • Separate authorization (real-time) from settlement (batch).

  • Decline safely if you can’t get issuer authorization.

Fraud and abuse prevention

Fraud prevention in ATM networks is both digital and physical. On the digital side, attackers try to brute-force PINs, exploit stolen cards, or automate withdrawals. On the physical side, skimmers, compromised ATMs, and cash-out attacks are real. A strong interview answer treats fraud controls as layered: edge controls, risk scoring, and circuit breakers.

Start with velocity controls: per-card withdrawal frequency, per-account daily limits, and per-ATM anomaly thresholds. Add geo and behavior signals: withdrawals in two distant locations within minutes, repeated declines, unusual withdrawal patterns, or abnormal ATM error rates. The risk engine can return “allow,” “deny,” or “step-up” (for example, requiring additional verification at a branch, because an ATM has few ways to challenge the user further).

You also need circuit breakers. If an ATM is suspected of tampering or a region is under attack, you can disable withdrawals, reduce limits, or require online authorization only (no offline fallback). These controls protect the bank even if they temporarily degrade customer experience.

| Control | Signal | Response |
| --- | --- | --- |
| Velocity checks | Too many withdrawals in short window | Decline or reduce limits |
| Geo anomaly | Impossible travel pattern | Block + alert |
| PIN brute force | Multiple incorrect PIN attempts | Card capture / temporary lock |
| Skimming signals | ATM hardware tamper alerts | Disable ATM, dispatch service |
| Risk scoring | Suspicious pattern across accounts | Require manual review or deny |
| Circuit breaker | Network/issuer instability | Conservative declines to avoid loss |

What to say in the interview: “Fraud controls are layered. I enforce limits early, score risk centrally, and use circuit breakers to protect the system during attacks.”
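Two of these controls are simple enough to sketch directly: a sliding-window velocity check and an "impossible travel" speed estimate. The class and function names are hypothetical, and the thresholds you would compare against (events per window, maximum plausible speed) are policy choices the sketch does not prescribe.

```python
import math
from collections import deque


class VelocityLimiter:
    """Allow at most max_events per card within a sliding time window."""

    def __init__(self, max_events: int, window_seconds: float) -> None:
        self.max_events = max_events
        self.window_seconds = window_seconds
        self._events: dict[str, deque[float]] = {}

    def allow(self, card_id: str, now: float) -> bool:
        recent = self._events.setdefault(card_id, deque())
        # Drop timestamps that have aged out of the window.
        while recent and now - recent[0] >= self.window_seconds:
            recent.popleft()
        if len(recent) >= self.max_events:
            return False
        recent.append(now)
        return True


def implied_speed_kmh(lat1: float, lon1: float, t1: float,
                      lat2: float, lon2: float, t2: float) -> float:
    """Speed needed to travel between two ATM locations (times in seconds)."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    # Haversine great-circle distance.
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    distance_km = 2 * earth_radius_km * math.asin(math.sqrt(a))
    if distance_km == 0:
        return 0.0
    hours = abs(t2 - t1) / 3600
    return math.inf if hours == 0 else distance_km / hours
```

For example, two withdrawals ten minutes apart in London and New York imply a speed far beyond any aircraft, which is the kind of result a risk engine could turn into a "deny".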

Summary:

  • Use velocity and anomaly detection to prevent cash-out attacks.

  • Add tamper signals and ATM health monitoring for skimming.

  • Apply circuit breakers for suspected compromise or instability.

Consistency vs availability: realistic trade-offs

ATM systems strongly prefer consistency for ledger writes, but you still need to discuss availability realistically. A common architecture is a single-writer core ledger (or a strongly consistent cluster) that processes all withdrawals for an account. This avoids split-brain balances and simplifies ACID guarantees. The trade-off is that multi-region active-active writes are difficult without heavy coordination.

For global banks, the common compromise is: route transactions to the “home region” for the account or use a partitioned ledger where each account has a single authoritative shard. Reads can be served from replicas, but writes must go to the authoritative shard. If a region is down, you may deny withdrawals rather than risk inconsistency. That’s a business decision, and in interviews, you should state it clearly.

You can also mention limited offline modes (allowing small withdrawals without online authorization), but those are risky and heavily constrained. Most interview settings assume online authorization is required.

| Approach | Consistency | Availability | Notes |
| --- | --- | --- | --- |
| Single-writer ledger | Strong | Lower during regional outages | Common for correctness |
| Multi-region active-active | Hard to guarantee | Higher | Rare, complex |
| Read replicas for balance | Mostly strong (bounded staleness) | Higher | Common optimization |
| Offline withdrawals | Weak | Higher | Rare, tightly limited |

Common pitfall: Saying “active-active multi-region with strong consistency” without explaining coordination and failure modes. Interviewers will push on split brain and double-dispense risk.
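The "single authoritative shard per account" idea can be illustrated with a small router. Hash-based placement, the shard names, and the `-replica` suffix are assumptions made for the sketch; a real bank might instead pin accounts to a home region through a directory lookup.

```python
import hashlib


class ShardUnavailable(Exception):
    pass


class LedgerRouter:
    """Routes each account to exactly one authoritative ledger shard."""

    def __init__(self, shards: list[str]) -> None:
        self.shards = shards
        self.down: set[str] = set()  # shards currently marked unavailable

    def home_shard(self, account_id: str) -> str:
        # Stable hash so every router instance picks the same shard.
        digest = hashlib.sha256(account_id.encode("utf-8")).digest()
        return self.shards[int.from_bytes(digest[:8], "big") % len(self.shards)]

    def route_write(self, account_id: str) -> str:
        shard = self.home_shard(account_id)
        if shard in self.down:
            # Deny rather than write elsewhere and risk a split-brain balance.
            raise ShardUnavailable(f"{shard} is down; withdrawal denied")
        return shard

    def route_read(self, account_id: str) -> str:
        # Reads may be served by a replica of the home shard (bounded staleness).
        return f"{self.home_shard(account_id)}-replica"
```

Writes fail when the home shard is down, even though other shards are healthy. That is the "deny vs risk inconsistency" business choice made explicit in code.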

Summary:

  • Prefer single-writer or strongly consistent ledger for withdrawals.

  • Use replicas for reads, but keep writes authoritative.

  • Be explicit about the business choice: deny vs risk inconsistency.

Reliability, observability, and auditability

ATM systems must be observable and auditable because disputes are inevitable. Customers will claim “I didn’t get the cash,” or “I was charged twice,” and regulators will demand traceability. That means every step must produce immutable logs: requests, responses, holds, dispense commands, sensor confirmations, reversals, and reconciliation outcomes.
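One common way to make a log tamper-evident is hash chaining, where each record stores the hash of the previous one. The article does not prescribe this technique; the sketch below is just one possible approach, with hypothetical event fields.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first record


def _digest(prev_hash: str, event: dict) -> str:
    # Canonical JSON so the same event always hashes the same way.
    body = json.dumps(event, sort_keys=True)
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append_event(log: list[dict], event: dict) -> dict:
    """Append an event, linking it to the hash of the previous record."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    record = {"prev": prev_hash, "event": event, "hash": _digest(prev_hash, event)}
    log.append(record)
    return record


def verify_chain(log: list[dict]) -> bool:
    """Recompute every link; any edited or removed record breaks the chain."""
    prev_hash = GENESIS
    for record in log:
        if record["prev"] != prev_hash or record["hash"] != _digest(prev_hash, record["event"]):
            return False
        prev_hash = record["hash"]
    return True
```

A chain like this detects tampering but does not prevent it, so in practice it is paired with append-only storage and restricted access.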

Observability also protects operations. You monitor ATM health (cash levels, dispenser errors), backend latency, authorization failure rates, fraud blocks, and reconciliation queue sizes. For the ledger and TPS, you monitor lock contention, transaction timeouts, and idempotency hit rates (retries). For the network, you monitor routing failures and third-party network availability.

A strong answer includes SLOs that reflect user experience and safety: p95 authorization latency, dispense-confirm latency, reconciliation completion time, and fraud false positive rate. You also want “audit completeness” metrics: every committed debit should have a corresponding dispense confirmation or a reconciliation decision.

| Metric | Why it matters | Target / alerting |
| --- | --- | --- |
| p95 auth latency | Customer wait time | < 2–3 seconds |
| Withdrawal success rate | Core availability | > 99.9% (excluding fraud declines) |
| Reconciliation backlog | Risk of unresolved disputes | Near zero steady-state |
| Idempotency replay rate | Network instability indicator | Track and alert on spikes |
| Dispense failure rate | Hardware reliability | Alert per ATM model/region |
| Fraud block rate + FP rate | Safety vs customer friction | Monitor drift over time |

Interview signal: “Immutable logs + reconciliation jobs + dispute workflows” is what turns your design into a bank-grade system rather than a demo.
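The "audit completeness" metric mentioned earlier reduces to a set difference: committed debits that have neither a dispense confirmation nor a reconciliation decision. The function names are hypothetical, and in practice the three inputs would come from queries against the ledger and the audit log.

```python
def audit_gaps(
    committed_debits: set[str],
    dispense_confirmations: set[str],
    reconciliation_decisions: set[str],
) -> set[str]:
    """Return txn_ids of committed debits that lack any supporting evidence."""
    return committed_debits - dispense_confirmations - reconciliation_decisions


def audit_completeness(
    committed_debits: set[str],
    dispense_confirmations: set[str],
    reconciliation_decisions: set[str],
) -> float:
    """Fraction of committed debits that are backed by evidence (1.0 if none)."""
    if not committed_debits:
        return 1.0
    gaps = audit_gaps(committed_debits, dispense_confirmations, reconciliation_decisions)
    return 1.0 - len(gaps) / len(committed_debits)
```

Anything returned by `audit_gaps` is worth an alert, because it means money moved without a recorded reason.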

Summary:

  • Log every step immutably for audit and disputes.

  • Monitor both software and hardware health.

  • Track reconciliation backlog and idempotency retries as key safety signals.

How interviewers evaluate your ATM design

Interviewers are looking for more than a component diagram. They want to see that you prioritize correctness under ambiguity and that you understand why hardware changes transaction semantics. Your strongest signals are: using holds, committing only after dispense confirmation, modeling a state machine, and having reconciliation logic for uncertain outcomes.

They also expect you to take security seriously without drifting into vague statements. Mention HSMs for PIN verification, encryption in transit, PCI compliance constraints, and strict access controls. For foreign cards, they expect you to describe authorization vs settlement and external network dependencies.

Finally, they want trade-offs. If you claim everything is strongly consistent and highly available across regions, they’ll ask how you avoid double withdrawals during partitions. A strong answer chooses safety and explains why.

What strong answers sound like: “I treat withdrawal as a state machine with holds and commit-after-dispense. I design for at-least-once messaging with idempotency, and I reconcile ambiguous outcomes using device journals and immutable logs.”

Summary:

  • Lead with correctness: holds, commit-after-dispense, reconciliation.

  • Show security depth: HSM, encryption, compliance.

  • Explain interbank routing and settlement phases.

  • Make realistic consistency/availability trade-offs.

Final takeaway

ATM System Design forces you to design a distributed system where mistakes are expensive and visible. The right approach is conservative: strong ledger correctness, idempotent transaction handling, explicit state machines, and safe failure recovery when hardware is involved. If you can explain why dispense confirmation gates commit, how reversals and reconciliation work, and how you audit every step for disputes, you’ll give a Staff-level answer.

Happy learning!


Written By:
Zarish Khalid