ATM system design is the practice of architecting a distributed system where a physical edge device (the ATM) coordinates with a centralized core banking ledger to execute financial transactions that must remain correct even when hardware jams, networks drop, or power fails mid-operation. The central challenge is ensuring that cash is never dispensed without a corresponding account debit and that a customer is never debited without actually receiving cash, making it one of the purest tests of transaction correctness under real-world ambiguity.
Key takeaways
- Ledger-first architecture: Every financial operation must produce immutable, auditable ledger entries rather than simple in-place balance updates.
- Commit-after-dispense: The final account debit is committed only after the ATM’s hardware sensors confirm that cash was physically presented to the customer.
- State machine modeling: Withdrawals must be modeled as an explicit, persisted state machine with well-defined transitions for holds, dispense attempts, confirmations, reversals, and reconciliation.
- Authorization before settlement: Interbank (foreign card) transactions require synchronous authorization from the issuing bank while actual fund movement settles asynchronously in batch cycles.
- Reconciliation as a core concern: Ambiguous outcomes (timeouts, lost confirmations) are resolved through automated reconciliation jobs that cross-reference device journals and dispenser counters, not through guessing.
Most engineers think of an ATM as a vending machine for cash. Insert card, punch in a PIN, grab the bills, walk away. But beneath that 30-second interaction sits one of the most unforgiving distributed systems you will ever design. The ATM touches a core financial ledger, relies on mechanical hardware that can jam or lose power, communicates over networks that drop packets, and must interoperate with external card networks spanning multiple countries. Get any of these interactions wrong, and real money vanishes from a customer’s account or materializes from thin air. That makes ATM system design one of the sharpest tests of engineering judgment in any system design interview, and an excellent lens for understanding how safety-critical distributed systems actually work.
Why ATM design is different from typical web systems#
In most consumer-facing applications, eventual consistency is a perfectly reasonable trade-off. A social media feed that is a few seconds stale causes no lasting harm. A dropped API call can be retried transparently. But ATM transactions operate in a domain where “retry later” can mean a customer loses $500 or a bank hemorrhages cash through a jammed dispenser that the backend thinks succeeded.
The fundamental difference is the presence of a physical actuator. When a web service writes a row to a database, the write either succeeds or it does not. When an ATM commands its cash dispenser to eject five $20 bills, the outcome is uncertain. The motor might stall. The bills might jam. The power might cut before the sensors register delivery. Your backend cannot directly observe the physical world. It can only receive a confirmation signal, and that signal can be delayed, lost, or wrong.
This hardware-driven uncertainty is what transforms a seemingly simple CRUD operation into a multi-phase protocol with explicit failure states. Every design decision, from how you model accounts to how you handle timeouts, must account for the gap between “the system told the machine to dispense” and “the customer actually received cash.”
Real-world context: In 2012, a software glitch at a major U.S. bank caused ATMs to dispense cash without debiting accounts for several hours. The root cause was a failure in the hold-and-confirm workflow, exactly the kind of gap this blog addresses.
Understanding why ATMs demand this level of rigor is the first step. The next is knowing how to scope the problem correctly in an interview setting.
Clarifying requirements the way interviewers expect#
A strong ATM design answer starts the same way every good system design answer starts: by locking scope and defining invariants before drawing a single box. Interviewers use this phase to gauge whether you understand what matters and what can be deferred.
Functional scope#
You should explicitly confirm which operations are in play. The core set almost always includes:
- Cash withdrawal: The primary use case and the one that stresses correctness the most.
- Balance inquiry: A read-heavy operation that still requires authentication and bounded staleness.
- Deposits: Worth mentioning briefly, but deposits typically settle asynchronously (the bank verifies the envelope or scanned check later), so they are less interesting from a real-time correctness standpoint.
- Receipt printing and mini-statements: These are secondary features that can degrade gracefully without financial risk.
- Foreign card support: This introduces interbank routing and external dependencies, adding a meaningful layer of complexity.
System invariants#
Once scope is clear, state the two invariants that govern every design choice:
- A customer must never lose money without receiving cash (no phantom debits).
- The bank must never dispense cash without debiting the account (no free money).
These sound symmetric, but they are not equally easy to enforce. The first is more dangerous to the customer. The second is more dangerous to the bank. A conservative system prioritizes the customer’s safety and relies on reconciliation to protect the bank.
Non-functional requirements#
Beyond correctness, call out the constraints that shape the architecture:
- PCI DSS compliance for handling card data and PINs.
- HSM-backed PIN verification (PINs must never exist in plaintext outside tamper-resistant hardware).
- Strict auditability with immutable logs for every transaction step.
- Conservative failure handling that prefers denying a transaction over risking an inconsistent outcome.
Attention: A common interview mistake is claiming “five nines availability everywhere.” A stronger answer identifies which components must be strongly consistent (the ledger) and which can tolerate graceful degradation (receipt printing, UI animations).
With scope and invariants locked, the next step is mapping out the components that make this system work.
High-level architecture and component responsibilities#
An ATM system is a layered network with a clear separation between the edge (physical devices), the routing plane (network switching), and the money plane (transaction processing and the core ledger). This separation is intentional and deeply important. The edge handles human interaction and hardware. The routing plane handles connectivity and message forwarding. The money plane is the single source of financial truth.
The edge layer: ATM as a constrained client#
The ATM itself is a specialized embedded client with several hardware peripherals: a card reader, a PIN pad (which encrypts the PIN at the point of entry), a cash dispenser with motor and sensors, a receipt printer, and a screen. Critically, the ATM does not decide financial truth. It requests authorization from the backend and follows instructions. If the backend says “dispense $200,” the ATM attempts to do so and reports the outcome. If it cannot confirm the outcome, it reports ambiguity.
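The ATM runs local software that manages device state, enforces timeouts, and maintains a device journal: a local record of hardware events (dispense commands, sensor readings, retractions) that reconciliation later uses as evidence.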
The routing plane: ATM switch#
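Between the ATMs and the backend sits the ATM switch. It handles connectivity for the ATM fleet and forwards each message to the right destination: the bank’s own TPS for its own cards, or an external card network for foreign cards. The switch routes messages; it does not make financial decisions.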
The money plane: TPS and core ledger#
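Behind the switch, the transaction processing system (TPS) validates requests, applies limits, places holds, and decides whether to approve or decline. It records every outcome in the core ledger, which is the single source of financial truth.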
Pro tip: In your interview, explicitly separate the routing plane from the money plane. Strong candidates never allow the ATM to directly mutate balances. The ATM requests; the TPS decides; the ledger records.
The architecture defines where each responsibility lives. But to truly understand correctness, you need to look at how the data model enforces it.
Data model and ledger-first thinking#
One of the most common mistakes in ATM system design interviews is modeling an account as a row with a “balance” field that gets incremented or decremented. This approach is fragile, unauditable, and dangerous when concurrent operations occur. The correct approach is ledger-first: the balance is a derived value computed from an immutable sequence of ledger entries.
Ledger entries over balance fields#
Every financial action produces one or more ledger entries. A withdrawal creates a debit entry against the customer’s account and a corresponding credit entry (to the bank’s cash-on-hand or ATM float account, depending on the accounting model). These entries are append-only and immutable. The “balance” is the sum of all entries for an account, or more practically, a cached value that is updated atomically alongside each new entry.
This design gives you a complete, tamper-evident audit trail. If a dispute arises, you can reconstruct exactly what happened and when. If a bug causes an incorrect debit, you can trace the exact entry and its triggering event.
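As a rough illustration of the ledger-first idea, the sketch below keeps an append-only list of entries and derives the balance by summing them. The `Ledger` and `LedgerEntry` names, the account labels, and the signed-amount convention (negative for debits) are assumptions made for this example, not part of any real banking API.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)  # frozen: an entry cannot be mutated after creation
class LedgerEntry:
    txn_id: str
    account: str
    amount: Decimal  # assumption: negative = debit, positive = credit


class Ledger:
    def __init__(self):
        self._entries = []  # append-only: entries are never updated or deleted

    def post(self, txn_id, debit_account, credit_account, amount):
        """Append a balanced debit/credit pair for one financial action."""
        if amount <= 0:
            raise ValueError("amount must be positive")
        self._entries.append(LedgerEntry(txn_id, debit_account, -amount))
        self._entries.append(LedgerEntry(txn_id, credit_account, amount))

    def balance(self, account):
        """Derive the balance by summing every entry for the account."""
        return sum((e.amount for e in self._entries if e.account == account), Decimal("0"))

    def entries_for(self, txn_id):
        """Return the audit trail for one transaction."""
        return [e for e in self._entries if e.txn_id == txn_id]


ledger = Ledger()
ledger.post("open-1", "bank:funding", "cust:alice", Decimal("500"))  # opening deposit
ledger.post("txn-42", "cust:alice", "atm:float", Decimal("200"))     # ATM withdrawal
print(ledger.balance("cust:alice"))  # 300
```

Because every action is posted as a balanced pair, the sum of all entries across all accounts stays at zero, which gives a cheap integrity check. A production ledger would also persist entries durably and maintain the cached balance atomically with each append.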
Holds: reserving funds before dispensing#
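For ATM withdrawals, the data model must support holds: temporary reservations that reduce the funds available for new transactions without posting a debit to the ledger. A hold is later either converted into a posted debit or released.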
The available balance formula becomes:
$$\text{Available Balance} = \text{Current Balance} - \sum \text{Active Holds}$$
This ensures that if a customer has $500 and requests $200, the hold reduces the available balance to $300 immediately, preventing a second concurrent withdrawal from overdrawing the account.
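The $500 / $200 example can be worked through in a few lines. This is a single-threaded sketch; the `Account` class and its method names are hypothetical, and a real system would perform the check-and-hold step atomically inside a database transaction.

```python
from decimal import Decimal


class Account:
    def __init__(self, current_balance):
        self.current_balance = Decimal(current_balance)
        self.holds = {}  # txn_id -> held amount

    @property
    def available_balance(self):
        # Available Balance = Current Balance - sum of active holds
        return self.current_balance - sum(self.holds.values(), Decimal("0"))

    def place_hold(self, txn_id, amount):
        """Reserve funds; refuse if the available balance is too low."""
        amount = Decimal(amount)
        if amount > self.available_balance:
            return False
        self.holds[txn_id] = amount
        return True

    def release_hold(self, txn_id):
        """Reversal path: drop the hold without touching the balance."""
        self.holds.pop(txn_id, None)

    def commit_hold(self, txn_id):
        """Commit path: convert the hold into a posted debit."""
        self.current_balance -= self.holds.pop(txn_id)


acct = Account("500")
print(acct.place_hold("t1", "200"))  # True
print(acct.available_balance)        # 300
print(acct.place_hold("t2", "400"))  # False: only 300 is available
```

Note that the current balance stays at $500 until the hold is committed; only the available balance moves when the hold is placed.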
Idempotency keys#
The ATM generates a stable transaction_id for each withdrawal attempt. This ID is carried across retries. If the network drops after the TPS processes the request but before the ATM receives the response, the ATM retries with the same transaction_id. The TPS detects the duplicate and returns the original result instead of creating a second hold or debit. This is the core mechanism of idempotency: processing the same request more than once has the same effect as processing it once.
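A minimal sketch of the duplicate-detection step might look like the following. The `WithdrawalService` class and the in-memory dictionary are illustrative assumptions; a real TPS would store the first response durably, keyed by transaction_id, in the same transaction that creates the hold.

```python
class WithdrawalService:
    def __init__(self):
        self._responses = {}    # transaction_id -> first response returned
        self.holds_created = 0  # counter used here only to show the effect

    def authorize(self, transaction_id, amount):
        """Create a hold once per transaction_id; replay the result on retries."""
        previous = self._responses.get(transaction_id)
        if previous is not None:
            if previous["amount"] != amount:
                # Same key with different parameters is a client bug, not a retry.
                raise ValueError("transaction_id reused with a different amount")
            return previous
        self.holds_created += 1
        response = {"transaction_id": transaction_id, "status": "APPROVED", "amount": amount}
        self._responses[transaction_id] = response
        return response


service = WithdrawalService()
first = service.authorize("atm17-000123", 200)
retry = service.authorize("atm17-000123", 200)  # network retry, same ID
print(service.holds_created)  # 1
```

Rejecting a reused key with a different amount is a design choice in this sketch; it prevents a buggy client from silently receiving an approval for a request it never actually made.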
Balance-as-a-Field vs. Ledger-First Data Models
| Aspect | Balance-as-a-Field | Ledger-First |
| --- | --- | --- |
| Approach | Stores current balance directly in the account record; updated with each transaction | Records every transaction as an immutable ledger entry; balance is derived by aggregation |
| Auditability | Weak: historical balance changes are not inherently preserved | Strong: append-only structure provides a complete, verifiable transaction trail |
| Concurrency Safety | Complex: simultaneous updates risk race conditions, requiring robust locking mechanisms | Natural: appending new entries avoids modifying existing records, minimizing conflicts |
| Dispute Resolution | Difficult: limited transaction history complicates investigation of discrepancies | Efficient: full immutable record allows every transaction to be traced and verified |
| Complexity | Low initial complexity, but data integrity challenges grow in concurrent environments | Higher implementation complexity, but offers greater scalability and long-term reliability |
Historical note: Double-entry bookkeeping, the foundation of modern ledger design, was in use among Italian merchants by the late Middle Ages and was first described in print by Luca Pacioli in 1494. ATM systems inherit this principle directly. Every debit has a corresponding credit, and the books must always balance.
With the data model established, let’s walk through the simplest transaction first: a balance inquiry.
Walk-through 1: Balance inquiry step by step#
Balance inquiry is the simpler of the two core operations, but it still demonstrates important design principles around authentication, session management, and read consistency.
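The flow begins when the customer inserts their card. The ATM reads the card data (track data or chip) and prompts for a PIN. The PIN is encrypted immediately on the PIN pad using a key injected from an HSM (hardware security module). The encrypted PIN block travels to the backend, where it is verified inside the HSM, and a session is established on success.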
After successful authentication, the ATM sends a balance inquiry request with the authenticated session token and account identifier to the TPS. The TPS performs several checks:
- Session validation: Is the session token valid and not expired?
- Account status: Is the account active, not frozen, and not flagged?
- Balance computation: Fetch the available balance (current balance minus active holds) and the current (or “ledger”) balance.
The TPS returns both values. The distinction matters: if a customer has a $100 hold from a pending gas station authorization, their current balance might show $600 while their available balance shows $500. Displaying only one number leads to confusion and support calls.
Real-world context: Most major banks now display both “available” and “current” balances on ATM screens and in mobile apps. This transparency reduces disputes caused by holds from hotels, gas stations, and rental car agencies.
The read path for balance inquiry should be strongly consistent or at most use a read replica with bounded staleness. Financial reads that show a wildly incorrect balance (for example, omitting a withdrawal made seconds ago) erode customer trust and can cause overdraft decisions. The safe default is to read from the primary or from a replica that is guaranteed to be within a few seconds of the primary.
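One way to express the bounded-staleness rule is a small selection function that only uses a replica when its measured lag is within the bound and otherwise falls back to the primary. The function name, the replica names, and the 2-second default are assumptions for illustration only.

```python
def choose_read_source(replica_lag_seconds, max_staleness_seconds=2.0):
    """Pick the freshest replica within the staleness bound, else the primary.

    replica_lag_seconds maps a replica name to its current replication lag.
    """
    fresh = {
        name: lag
        for name, lag in replica_lag_seconds.items()
        if lag <= max_staleness_seconds
    }
    if not fresh:
        return "primary"  # safe default: no replica is fresh enough
    return min(fresh, key=fresh.get)


print(choose_read_source({"replica-a": 0.4, "replica-b": 5.0}))  # replica-a
print(choose_read_source({"replica-a": 9.0}))                    # primary
```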
Balance inquiry is read-only and comparatively safe. The real complexity begins when cash must leave the machine, and that is where the withdrawal workflow earns its reputation.
Walk-through 2: Cash withdrawal step by step#
Cash withdrawal is the heart of ATM system design. It combines a ledger mutation with a physical actuation, and neither the backend nor the ATM has full certainty about the other’s state at every moment. The design insight that separates strong answers from weak ones is this: you must never commit the final debit until you have strong evidence that cash was dispensed, and you must never dispense cash without first reserving funds.
The withdrawal sequence#
The flow proceeds through clearly ordered phases:
Phase 1: Authentication. Identical to balance inquiry. Card read, PIN encryption, HSM verification, session establishment.
Phase 2: Authorization and hold. The customer selects a withdrawal amount. The TPS validates the request against multiple constraints:
- Available balance (must be sufficient after holds).
- Per-transaction limits (ATM-specific and account-specific).
- Daily withdrawal limits (cumulative for the day).
- Fraud signals (velocity checks, geo-anomalies, which we will cover later).
If all checks pass, the TPS creates a hold for the requested amount, reducing the available balance immediately. It then returns a “dispense authorization” message to the ATM, including the transaction_id and the approved amount.
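The Phase 2 checks can be sketched as an ordered series of guards that each return a decline reason. The field names, the result strings, and the two limit constants are assumptions for this example; the fraud signal is reduced to a single boolean that a separate risk engine would supply.

```python
from dataclasses import dataclass
from decimal import Decimal

PER_TXN_LIMIT = Decimal("500")   # illustrative per-transaction limit
DAILY_LIMIT = Decimal("1000")    # illustrative cumulative daily limit


@dataclass
class WithdrawalRequest:
    amount: Decimal
    available_balance: Decimal  # current balance minus active holds
    withdrawn_today: Decimal    # cumulative withdrawals so far today
    risk_flagged: bool = False  # output of the (separate) risk engine


def authorize(req):
    """Return a decision string; only the final branch leads to a hold."""
    if req.amount <= 0:
        return "DECLINE_INVALID_AMOUNT"
    if req.amount > req.available_balance:
        return "DECLINE_INSUFFICIENT_FUNDS"
    if req.amount > PER_TXN_LIMIT:
        return "DECLINE_PER_TXN_LIMIT"
    if req.withdrawn_today + req.amount > DAILY_LIMIT:
        return "DECLINE_DAILY_LIMIT"
    if req.risk_flagged:
        return "DECLINE_RISK"
    return "APPROVE_AND_HOLD"


print(authorize(WithdrawalRequest(Decimal("200"), Decimal("500"), Decimal("0"))))
# APPROVE_AND_HOLD
```

In a real TPS the approval branch and the hold creation would happen in the same atomic step, so that two concurrent requests cannot both pass the balance check.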
Phase 3: Cash dispensing. The ATM commands its cash dispenser to eject the specified bills. The dispenser’s internal sensors count the bills as they pass through the mechanism. If the correct number of bills is detected at the exit slot, the ATM reports a successful dispense. If the count is wrong, the dispenser attempts to retract the bills and reports a failure or partial dispense.
Phase 4: Dispense confirmation. Only after the ATM’s sensors confirm that cash was presented (and ideally taken by the customer), does the ATM send a “dispense confirmed” message back to the TPS. The TPS then converts the hold into a posted debit entry on the ledger. The transaction is now committed.
Phase 5: Reversal path. If the ATM reports a definite failure (for example, the dispenser jams and the bills are retracted), the TPS reverses the hold and the customer’s funds are released. If the outcome is unknown because the ATM lost power or the confirmation message never arrived, the TPS neither commits nor reverses: after the confirmation timeout it moves the transaction to a “needs reconciliation” state and keeps the hold in place until reconciliation determines what actually happened.
Attention: Never treat a timeout as a definitive failure. Cash might have been dispensed just before the network dropped. Immediately reversing the hold in this case would give the customer free money. The correct response to ambiguity is reconciliation, not assumption.
The withdrawal sequence is clear in the happy path. But real systems must handle every path, and that is where the state machine becomes essential.
Withdrawal state machine and reconciliation logic#
The withdrawal state machine is where you demonstrate staff-level rigor. A state machine makes every possible transaction outcome explicit, auditable, and recoverable. It eliminates the ambiguity of ad-hoc if/else chains and gives operations teams a clear vocabulary for diagnosing problems.
State transitions#
The machine begins after authentication when the customer requests a withdrawal. Each transition corresponds to a specific event:
- INITIATED → Customer selected amount. TPS created a transaction record with a stable txn_id.
- FUNDS_HELD → TPS placed a hold on the account. Available balance reduced.
- DISPENSE_IN_PROGRESS → ATM received dispense authorization and commanded the hardware.
- DISPENSE_CONFIRMED → ATM sensors confirmed cash was presented. ATM sent confirmation to TPS.
- COMMITTED → TPS converted the hold to a posted debit. Transaction is final.
The reversal path branches from multiple states:
- From FUNDS_HELD: If the ATM reports a dispense failure (jam, retraction), the TPS moves to REVERSED and releases the hold.
- From DISPENSE_IN_PROGRESS: If a timeout occurs without confirmation, the TPS moves to RECONCILE_PENDING rather than guessing.
- From RECONCILE_PENDING: The reconciliation process determines the outcome and moves to either COMMITTED or REVERSED.
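The transitions listed above can be encoded as an explicit table so that any move not in the table is rejected. This is a minimal in-memory sketch; the `Withdrawal` class is hypothetical, and a real implementation would persist each transition before acting on it and would likely add states the list does not mention (such as a decline before any hold exists).

```python
from enum import Enum


class State(Enum):
    INITIATED = "INITIATED"
    FUNDS_HELD = "FUNDS_HELD"
    DISPENSE_IN_PROGRESS = "DISPENSE_IN_PROGRESS"
    DISPENSE_CONFIRMED = "DISPENSE_CONFIRMED"
    COMMITTED = "COMMITTED"
    RECONCILE_PENDING = "RECONCILE_PENDING"
    REVERSED = "REVERSED"


# Allowed transitions, taken from the lists above. Terminal states map to an empty set.
TRANSITIONS = {
    State.INITIATED: {State.FUNDS_HELD},
    State.FUNDS_HELD: {State.DISPENSE_IN_PROGRESS, State.REVERSED},
    State.DISPENSE_IN_PROGRESS: {State.DISPENSE_CONFIRMED, State.RECONCILE_PENDING},
    State.DISPENSE_CONFIRMED: {State.COMMITTED},
    State.RECONCILE_PENDING: {State.COMMITTED, State.REVERSED},
    State.COMMITTED: set(),
    State.REVERSED: set(),
}


class Withdrawal:
    def __init__(self, txn_id):
        self.txn_id = txn_id
        self.state = State.INITIATED
        self.history = [State.INITIATED]  # audit trail of every state visited

    def advance(self, new_state):
        """Move to new_state, or raise if the table forbids the transition."""
        if new_state not in TRANSITIONS[self.state]:
            raise ValueError(f"illegal transition {self.state.name} -> {new_state.name}")
        self.state = new_state
        self.history.append(new_state)


w = Withdrawal("txn-42")
for step in (State.FUNDS_HELD, State.DISPENSE_IN_PROGRESS, State.RECONCILE_PENDING):
    w.advance(step)
print(w.state.name)  # RECONCILE_PENDING: the timeout path, awaiting evidence
```

Keeping the table as data rather than scattered conditionals means the set of legal outcomes can be reviewed, tested, and audited in one place.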
Why commit-after-dispense matters#
This ordering is the single most important design decision in the entire system. Consider the alternatives:
- Commit before dispense: If the dispenser jams, the customer is debited without receiving cash. This is the worst-case outcome and violates invariant #1.
- Dispense before hold: If the hold fails (insufficient funds discovered after a race condition), the bank gives away cash it cannot recover. This violates invariant #2.
The hold-then-dispense-then-commit ordering is the only safe sequence. The hold protects the bank. The commit-after-dispense protects the customer. The reconciliation process handles the gray zone in between.
Reconciliation: the safety net for ambiguity#
Reconciliation is not a cleanup job that runs “if something goes wrong.” It is a core, continuously operating subsystem. Its job is to resolve every transaction that did not reach a terminal state (COMMITTED or REVERSED) within the expected time window.
The reconciliation process cross-references multiple evidence sources:
- TPS transaction log: What state did the TPS last record?
- ATM device journal: What did the ATM’s local sensors observe?
- Dispenser counter: How many bills did the dispenser actually eject? This is reconciled against the physical cash remaining in the cassette.
- Late-arriving confirmations: Network messages that were delayed but eventually arrive.
If the evidence shows cash was dispensed, the reconciliation job commits the debit. If the evidence shows cash was not dispensed (bills retracted or never ejected), it reverses the hold. If the evidence is inconclusive, it escalates to a manual dispute workflow where a human reviews camera footage and device logs.
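The decision rule in the previous paragraph can be sketched as a pure function over the collected evidence. The parameter names and the outcome strings (`PRESENTED`, `RETRACTED`, `NOT_DISPENSED`) are hypothetical stand-ins for whatever the device journal actually records, and `bills_delivered` is assumed to be the dispenser count net of retractions.

```python
def reconcile(tps_state, journal_outcome, bills_requested, bills_delivered):
    """Resolve one ambiguous withdrawal from the available evidence.

    journal_outcome: what the ATM device journal recorded, or None if missing.
    bills_delivered: dispenser counter for this transaction, net of retractions.
    """
    if tps_state != "RECONCILE_PENDING":
        raise ValueError("only RECONCILE_PENDING transactions are reconciled")

    # Journal and counter agree that all bills reached the customer.
    if journal_outcome == "PRESENTED" and bills_delivered == bills_requested:
        return "COMMITTED"

    # Journal and counter agree that no cash left the machine.
    if journal_outcome in ("RETRACTED", "NOT_DISPENSED") and bills_delivered == 0:
        return "REVERSED"

    # Missing journal, partial dispense, or conflicting sources: a human decides.
    return "MANUAL_REVIEW"


print(reconcile("RECONCILE_PENDING", "PRESENTED", 10, 10))  # COMMITTED
print(reconcile("RECONCILE_PENDING", "RETRACTED", 10, 0))   # REVERSED
print(reconcile("RECONCILE_PENDING", None, 10, 10))         # MANUAL_REVIEW
```

The function only commits or reverses when two independent sources agree; anything else is escalated, which matches the "evidence, not guessing" principle.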
Pro tip: In your interview, mention that the reconciliation job runs on a schedule (e.g., every few minutes for recent transactions) and also at end-of-day when the ATM’s physical cash count is reconciled against the expected count based on all committed transactions. This demonstrates operational maturity.
ATM Session State Transitions (device-side view)
| Current State | Event | Next State | Action Taken | Timeout Behavior |
| --- | --- | --- | --- | --- |
| Idle | Card Inserted | Card Inserted | Read card data, prompt for PIN | N/A |
| Card Inserted | PIN Entered | PIN Verification | Send encrypted PIN block for verification | Return to Idle after 30s of inactivity |
| PIN Verification | PIN Accepted | Amount Selection | Display account balance, prompt for amount | Return to Idle after 30s of inactivity |
| PIN Verification | PIN Rejected | Card Inserted | Display error, re-prompt for PIN | Return to Idle after 30s of inactivity |
| Amount Selection | Amount Confirmed | Processing | Send withdrawal request; TPS places a hold (no debit yet) | Return to Idle after 60s of inactivity |
| Processing | Transaction Approved | Dispensing | Prepare cash for dispensing | No response within 30s: do not dispense, request reversal of the hold, return to Idle |
| Dispensing | Cash Collected | Completed | Send dispense confirmation, eject card, print receipt | Cash not taken within 30s: retract cash, report the retraction, return to Idle |
| Completed | Session Ended | Idle | Clear session data, reset machine | Return to Idle after 15s |
Transaction correctness within a single bank is challenging enough. When foreign cards enter the picture, the complexity multiplies.
Interbank routing and settlement#
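When a customer uses Bank B’s card at Bank A’s ATM, the transaction crosses organizational boundaries. Bank A is the acquirer (it owns the ATM and dispenses the cash), and Bank B is the issuer (it issued the card and holds the customer’s account). A card network connects the two.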
Authorization: synchronous and blocking#
The authorization flow for a foreign card adds network hops but follows the same logical pattern. The ATM sends the withdrawal request to Bank A’s switch. The switch recognizes the card’s BIN (Bank Identification Number) as belonging to an external network and forwards the authorization request through the card network to Bank B. Bank B’s authorization system checks the customer’s balance, applies fraud rules, and returns an approve or decline decision. This entire round trip must complete while the customer stands at the ATM, typically within 5 to 15 seconds.
The TPS at Bank A cannot place a hold on Bank B’s ledger. Instead, it relies on Bank B’s authorization response as a promise that the funds are reserved. Bank A’s TPS then proceeds with the dispense workflow locally. If the dispense succeeds, Bank A has disbursed physical cash and holds an authorization approval from Bank B as its claim for reimbursement.
Settlement: asynchronous and batched#
Actual money movement between Bank A and Bank B does not happen in real time. Settlement occurs in batch cycles, often daily, through the card network’s clearing and settlement process. Bank A submits a settlement file listing all foreign-card transactions. The card network nets the amounts across all participating banks and facilitates the fund transfers. Interchange fees (the fee Bank A earns for providing the ATM service) are calculated and included in the net amounts during this process.
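Netting can be illustrated with a few lines of arithmetic: each bank ends the cycle with a single net position instead of one transfer per transaction. The tuple layout and bank labels are assumptions for this sketch, and interchange fees are deliberately left out to keep the arithmetic visible.

```python
from collections import defaultdict
from decimal import Decimal


def net_positions(transactions):
    """Compute each bank's net settlement position for one batch cycle.

    transactions: iterable of (acquirer, issuer, amount) tuples.
    A positive result means the bank is owed money; negative means it owes.
    """
    net = defaultdict(Decimal)  # Decimal() defaults to 0
    for acquirer, issuer, amount in transactions:
        net[acquirer] += amount  # acquirer dispensed cash and is owed reimbursement
        net[issuer] -= amount    # issuer must reimburse on behalf of its cardholder
    return dict(net)


batch = [
    ("BankA", "BankB", Decimal("200")),  # Bank B card at Bank A ATM
    ("BankB", "BankA", Decimal("50")),   # Bank A card at Bank B ATM
    ("BankA", "BankC", Decimal("100")),  # Bank C card at Bank A ATM
]
print(net_positions(batch))
# BankA: +250, BankB: -150, BankC: -100
```

The positions always sum to zero, so three transactions here settle with two transfers into Bank A rather than three separate payments.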
Real-world context: The global ATM interbank network processes billions of transactions annually. Standards like ISO 8583 define the message format for financial transaction requests and responses, ensuring interoperability across thousands of banks and dozens of card networks.
Failure handling across network boundaries#
Network dependencies change failure handling significantly. If the card network is unreachable, Bank A cannot obtain authorization for foreign cards. The correct response is to decline the transaction. Some systems support fallback routing (trying a secondary network), but dispensing cash without issuer authorization is almost never acceptable because Bank A would have no guarantee of reimbursement.
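BIN-based routing and the decline-when-unreachable rule fit in one small function. The BIN values, network names, and the six-digit prefix length below are all made up for illustration; real routing tables come from the card networks.

```python
# Hypothetical routing data: BINs issued by this bank, and for foreign BINs an
# ordered list of networks to try (primary first, then fallback).
ON_US_BINS = {"400000"}
NETWORKS_BY_BIN = {"510000": ["network_x", "network_y"]}


def route_authorization(card_number, reachable_networks):
    """Decide where an authorization request goes, or decline it."""
    bin_prefix = card_number[:6]
    if bin_prefix in ON_US_BINS:
        return "LOCAL_TPS"  # own card: authorize against the bank's own ledger
    for network in NETWORKS_BY_BIN.get(bin_prefix, []):
        if network in reachable_networks:
            return network  # first reachable network wins
    # Unknown BIN or no reachable network: never dispense without issuer approval.
    return "DECLINE"


print(route_authorization("5100001234567890", {"network_x", "network_y"}))  # network_x
print(route_authorization("5100001234567890", {"network_y"}))               # network_y
print(route_authorization("5100001234567890", set()))                       # DECLINE
```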
Authorization and settlement handle the money flow. But protecting that flow against abuse requires an entirely separate layer of defense.
Fraud and abuse prevention#
ATM fraud operates on two fronts. Digital attacks attempt to exploit stolen credentials, brute-force PINs, or replay intercepted authorization messages. Physical attacks involve card skimmers, ATM tampering, and coordinated “cash-out” operations where criminals use hundreds of cloned cards simultaneously. A robust design layers multiple defenses so that the failure of any one control does not expose the whole system.
Velocity and limit controls#
The first layer is simple but effective. Enforce hard limits at multiple granularities:
- Per-transaction limit: No single withdrawal exceeds $500 (or the bank’s configured maximum).
- Per-card daily limit: Cumulative withdrawals across all ATMs cannot exceed $1,000 per day.
- Per-ATM anomaly threshold: If a single ATM processes an unusual volume of transactions in a short window, flag it for review.
These limits prevent the most common cash-out attacks, where criminals attempt to drain accounts as fast as possible before the fraud is detected.
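The per-ATM anomaly threshold is essentially a sliding-window counter. The sketch below is one simple way to implement it; the class name, the threshold, and the window size are assumptions, and timestamps are passed in explicitly so the behavior is easy to test.

```python
from collections import deque


class AtmVolumeMonitor:
    def __init__(self, max_txns, window_seconds):
        self.max_txns = max_txns
        self.window_seconds = window_seconds
        self._events = {}  # atm_id -> deque of recent transaction timestamps

    def record(self, atm_id, now):
        """Record one transaction; return True if the ATM should be flagged."""
        window = self._events.setdefault(atm_id, deque())
        window.append(now)
        # Drop timestamps that have aged out of the window.
        while window and now - window[0] > self.window_seconds:
            window.popleft()
        return len(window) > self.max_txns


monitor = AtmVolumeMonitor(max_txns=3, window_seconds=60)
flags = [monitor.record("atm-17", t) for t in (0, 10, 20, 30)]
print(flags)  # [False, False, False, True]
```

The same structure works for per-card velocity checks by keying on the card instead of the ATM.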
Risk scoring and behavioral signals#
Beyond static limits, a centralized risk engine evaluates each transaction against behavioral patterns:
- Geo-velocity: A withdrawal in New York followed by one in London 30 minutes later is physically impossible and should trigger a decline or step-up verification.
- Pattern anomalies: A customer who normally withdraws $100 weekly suddenly requesting $500 at 3 AM from an unfamiliar ATM.
- Repeated declines: Multiple failed PIN attempts across different ATMs in a short window suggest a stolen card being tested.
The risk engine returns one of three decisions: allow, decline, or step-up (request additional verification). ATMs have limited step-up options compared to online channels, but some support SMS verification or temporary blocks that require a phone call to the bank.
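The geo-velocity signal reduces to a distance-over-time calculation. The sketch below uses the haversine formula for great-circle distance; the 900 km/h ceiling (roughly airliner cruising speed) and the tuple layout are assumptions, and a real risk engine would combine this with other signals before choosing between decline and step-up.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometers."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def geo_velocity_decision(prev, curr, max_kmh=900.0):
    """prev and curr are (lat, lon, unix_seconds) for consecutive withdrawals."""
    distance = haversine_km(prev[0], prev[1], curr[0], curr[1])
    hours = (curr[2] - prev[2]) / 3600
    if hours <= 0:
        # Same instant (or clock skew): only plausible if the location is the same.
        return "ALLOW" if distance < 1 else "DECLINE"
    return "DECLINE" if distance / hours > max_kmh else "ALLOW"


new_york = (40.7128, -74.0060, 0)
london = (51.5074, -0.1278, 30 * 60)  # 30 minutes later
print(geo_velocity_decision(new_york, london))  # DECLINE
```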
Circuit breakers for systemic attacks#
When an attack is detected at scale (for example, a batch of cloned cards hitting ATMs across a region), the system needs circuit breakers. These can:
- Disable withdrawals at compromised ATMs.
- Reduce withdrawal limits across a region.
- Require online-only authorization (disabling any offline fallback).
- Alert operations teams for manual intervention.
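As one possible shape for such a breaker, the sketch below lowers the withdrawal limit for a region once enough fraud signals accumulate and restores it on a manual reset. The class name, thresholds, and limits are hypothetical; a production breaker would also decay old signals over time and notify the operations team when it trips.

```python
class RegionalCircuitBreaker:
    def __init__(self, normal_limit, reduced_limit, trip_threshold):
        self.normal_limit = normal_limit
        self.reduced_limit = reduced_limit
        self.trip_threshold = trip_threshold
        self._signals = {}     # region -> count of fraud signals seen
        self._tripped = set()  # regions currently under reduced limits

    def report_fraud_signal(self, region):
        """Count a fraud signal; trip the breaker once the threshold is reached."""
        self._signals[region] = self._signals.get(region, 0) + 1
        if self._signals[region] >= self.trip_threshold:
            self._tripped.add(region)

    def withdrawal_limit(self, region):
        """Limit the TPS should enforce for ATMs in this region right now."""
        return self.reduced_limit if region in self._tripped else self.normal_limit

    def reset(self, region):
        """Manual reset by operations once the incident is resolved."""
        self._signals.pop(region, None)
        self._tripped.discard(region)


breaker = RegionalCircuitBreaker(normal_limit=500, reduced_limit=100, trip_threshold=3)
for _ in range(3):
    breaker.report_fraud_signal("region-east")
print(breaker.withdrawal_limit("region-east"))  # 100
print(breaker.withdrawal_limit("region-west"))  # 500
```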
Attention: Fraud controls must balance security with customer experience. An overly aggressive risk engine that declines legitimate transactions (high false positive rate) drives customers away. Track false positive rates as carefully as you track fraud losses.
Fraud Prevention Layers Comparison
| Layer | Detection Speed | False Positive Risk | Protection Scope | Implementation Complexity |
| --- | --- | --- | --- | --- |
| Velocity Limits | High (~3 ms) | Moderate | Limited: best for repetitive, automated fraud | Low |
| Risk Scoring | Moderate (~34 ms) | Low | Broad: handles complex, evolving fraud patterns | High |
| Circuit Breakers | High (immediate) | Variable | Moderate: effective against large-scale attacks | Moderate |
Fraud controls protect against malicious actors. But the system must also survive its own infrastructure failures, which brings us to the fundamental distributed systems trade-off.
Consistency vs. availability: realistic trade-offs#
For ledger writes, ATM systems sit firmly on the consistency side of the CAP trade-off: when a partition forces a choice, they give up availability rather than correctness. Dispensing cash based on a stale or partitioned view of the balance is unacceptable. But availability still matters because an ATM network that is constantly declining legitimate transactions loses customer trust and revenue.
Single-writer ledger architecture#
The most common and safest approach is a single-writer (or strongly consistent) core ledger. Each account has one authoritative location where writes occur. This eliminates split-brain scenarios where two partitions simultaneously authorize a withdrawal against the same balance, potentially overdrawing the account.
For a single-region bank, this is straightforward: one primary database (or a strongly consistent cluster using synchronous replication) handles all ledger writes. Reads for balance inquiries can be served from synchronous replicas with minimal staleness.
Multi-region considerations#
For global banks, the challenge intensifies. True active-active multi-region writes with strong consistency require distributed consensus protocols (like Paxos or Raft) with cross-region latency, which can push authorization response times beyond acceptable thresholds.
The practical compromise is account-level partitioning: each account is assigned to a “home region” that is authoritative for that account’s ledger. Transactions on that account, regardless of where the ATM is located, are routed to the home region. This ensures strong consistency per account while distributing load across regions.
If the home region is unreachable, the bank faces a binary choice:
- Deny the withdrawal and accept the availability hit.
- Allow a limited withdrawal under conservative constraints (low amount, flagged for immediate reconciliation) and accept the consistency risk.
Most banks choose to deny. The risk of double-dispensing or overdrawing far outweighs the inconvenience of a temporary decline. In your interview, state this trade-off explicitly and justify your choice.
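Home-region routing with a deny-on-partition policy can be sketched as a lookup plus a reachability check. The account IDs, region names, and return format are assumptions for illustration; the point is that the function never falls back to a non-authoritative region.

```python
# Hypothetical assignment of accounts to their authoritative home region.
HOME_REGION = {"acct-1001": "eu-west", "acct-2002": "us-east"}


def route_withdrawal(account_id, reachable_regions):
    """Route to the account's home region, or decline if it cannot be reached."""
    home = HOME_REGION.get(account_id)
    if home is None:
        return ("DECLINE", "unknown account")
    if home not in reachable_regions:
        # Choosing consistency over availability: no other region may
        # authorize against this account's ledger.
        return ("DECLINE", "home region unreachable")
    return ("ROUTE", home)


# An ATM anywhere in the world still routes to the account's home region.
print(route_withdrawal("acct-1001", {"eu-west", "us-east"}))  # ('ROUTE', 'eu-west')
print(route_withdrawal("acct-1001", {"us-east"}))             # ('DECLINE', 'home region unreachable')
```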
Historical note: Before reliable wide-area networks, some ATMs operated in “offline mode,” allowing small withdrawals based on locally cached authorization rules. This practice has largely been phased out due to fraud risk, but some markets still use it for low-value transactions in areas with unreliable connectivity.
The system’s correctness guarantees are only as trustworthy as your ability to verify them, which is why observability and auditability are not afterthoughts.
Reliability, observability, and auditability#
Disputes are inevitable in any ATM network. A customer will claim they did not receive cash. A reconciliation job will find a discrepancy. A regulator will request a transaction trace. The system must produce enough evidence to resolve every one of these cases definitively.
Immutable audit logging#
Every step in the transaction life cycle must generate an immutable log entry: authentication attempts, authorization decisions, hold creations, dispense commands, sensor readings, confirmation messages, reversals, and reconciliation outcomes. These logs must be:
- Append-only: No log entry can be modified or deleted.
- Timestamped: With synchronized clocks (NTP at minimum) across all components.
- Correlated: Every log entry includes the txn_id so that a complete transaction trace can be reconstructed.
This is not just good engineering. It is also a compliance requirement: PCI DSS mandates audit trails for access to cardholder data, and banking audit standards require traceable transaction records.
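One common way to make an append-only log tamper-evident is hash chaining, where each entry stores the hash of the previous one. The text above does not prescribe this technique; it is offered here as an illustrative option, and the `AuditLog` class and its field names are assumptions.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # placeholder hash that precedes the first entry
FIELDS = ("txn_id", "event", "timestamp", "prev_hash")


def _digest(body):
    """Stable SHA-256 over the entry's fields."""
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()


class AuditLog:
    def __init__(self):
        self._entries = []  # append-only

    def append(self, txn_id, event, timestamp):
        prev_hash = self._entries[-1]["hash"] if self._entries else GENESIS_HASH
        body = {"txn_id": txn_id, "event": event, "timestamp": timestamp, "prev_hash": prev_hash}
        self._entries.append({**body, "hash": _digest(body)})

    def trace(self, txn_id):
        """Reconstruct the ordered event trace for one transaction."""
        return [e["event"] for e in self._entries if e["txn_id"] == txn_id]

    def verify(self):
        """Return False if any entry was altered or the chain was broken."""
        prev_hash = GENESIS_HASH
        for entry in self._entries:
            body = {k: entry[k] for k in FIELDS}
            if entry["prev_hash"] != prev_hash or _digest(body) != entry["hash"]:
                return False
            prev_hash = entry["hash"]
        return True


log = AuditLog()
log.append("txn-42", "HOLD_CREATED", 1700000000)
log.append("txn-42", "DISPENSE_COMMANDED", 1700000003)
log.append("txn-42", "DISPENSE_CONFIRMED", 1700000009)
print(log.trace("txn-42"))
print(log.verify())  # True
```

Editing any stored entry changes its hash and breaks every later link, so `verify()` detects the modification.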
Operational monitoring#
Beyond audit logs, the operations team needs real-time visibility into system health across both software and hardware dimensions:
- ATM hardware health: Cash levels in each cassette, dispenser error rates, card reader jams, printer paper status.
- Backend performance: Authorization latency (p50, p95, p99), TPS throughput, ledger write latency, lock contention.
- Reconciliation health: Queue depth of RECONCILE_PENDING transactions, average resolution time, escalation rate to manual review.
- Fraud metrics: Block rate, false positive rate, velocity limit triggers per region.
- Network health: Card network response times, routing failures, timeout rates for interbank requests.
Service level objectives#
Define SLOs that reflect both user experience and safety:
- Authorization latency: p95 under 3 seconds for domestic, p95 under 8 seconds for foreign cards.
- Dispense-to-confirm latency: p95 under 10 seconds (this is hardware-dependent).
- Reconciliation resolution time: 95% of ambiguous transactions resolved within 4 hours.
- Audit completeness: 100% of committed debits must have a corresponding dispense confirmation or a reconciliation decision record.
Pro tip: The “audit completeness” metric is your strongest signal of system correctness. If the count of committed debits ever exceeds the count of confirmed dispenses plus reconciliation-resolved transactions, something is fundamentally broken. Alert on this immediately.
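The audit completeness check amounts to a set difference: every committed debit must be explained by either a dispense confirmation or a reconciliation decision. The function and parameter names below are hypothetical; in practice the three inputs would come from queries against the ledger, the confirmation log, and the reconciliation records.

```python
def unexplained_debits(committed_debits, confirmed_dispenses, reconciliation_commits):
    """Return txn_ids of committed debits that have no supporting evidence."""
    explained = set(confirmed_dispenses) | set(reconciliation_commits)
    return set(committed_debits) - explained


missing = unexplained_debits(
    committed_debits=["t1", "t2", "t3", "t4"],
    confirmed_dispenses=["t1", "t2"],
    reconciliation_commits=["t3"],
)
print(missing)  # {'t4'}: this should page someone immediately
```

Comparing IDs rather than counts is a deliberate choice in this sketch: equal counts can hide one missing confirmation offset by one stray record.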
With observability in place, you have the tools to verify your system’s behavior in production. Let’s step back and consider how interviewers evaluate the complete picture.
How interviewers evaluate your ATM design#
ATM system design interviews are not about drawing the most boxes on a whiteboard. They are about demonstrating that you understand why this system is hard and how to make it safe. Interviewers are evaluating several specific signals.
Correctness under ambiguity#
The single strongest signal is your withdrawal model. If you describe holds, commit-after-dispense, and a state machine with reconciliation, you have demonstrated that you understand hardware-driven transaction semantics. If you treat withdrawal as a single database update, you have missed the core challenge.
Security depth#
Mention HSMs for PIN verification. Explain that PINs are encrypted at the edge and never decrypted outside tamper-resistant hardware. Reference PCI DSS as a constraint on how you handle card data. These details show that you treat security as a structural requirement, not a checkbox.
Interbank awareness#
Describe the difference between authorization (synchronous, real-time) and settlement (asynchronous, batched). Explain the acquirer-issuer-network relationship. Mention ISO 8583 message standards if you know them. This demonstrates awareness of how real payment rails work.
Honest trade-offs#
If you claim active-active multi-region with strong consistency and no downside, the interviewer will challenge you on split-brain and double-dispense risk. A strong answer acknowledges the trade-off, chooses safety (deny during partition), and explains why.
The following table summarizes what separates different levels of interview performance:
Interview Evaluation Rubric
| Signal Area | Weak Answer | Strong Answer |
| --- | --- | --- |
| Withdrawal Model | Treats withdrawal as a single database update; debits before or without confirming the dispense | Holds funds first, commits only after dispense confirmation, and models the flow as an explicit state machine with reconciliation |
| Data Model | Account row with a mutable balance field; no history, holds, or idempotency | Ledger-first design with immutable entries, holds that define available balance, and a stable transaction ID for idempotent retries |
| Failure Handling | Treats a timeout as a definitive failure, or offers generic retry strategies | Distinguishes definite failures from ambiguous outcomes and resolves ambiguity through reconciliation against device journals and dispenser counters |
| Security | Mentions encryption only in passing; PIN handling is unspecified | PINs encrypted at the PIN pad and verified only inside HSMs; PCI DSS treated as a structural constraint |
| Interbank | Assumes money moves between banks in real time; unclear on who authorizes | Separates synchronous issuer authorization from asynchronous batched settlement; explains acquirer, issuer, and network roles and ISO 8583 messaging |
| Trade-offs | Claims active-active multi-region with strong consistency and no downside | Acknowledges the consistency vs. availability trade-off, chooses to deny during a partition, and explains why |
Real-world context: Staff and principal engineers at major banks spend months designing and reviewing exactly the kind of state machine and reconciliation logic described here. Demonstrating this thinking in a 45-minute interview is a strong signal of senior-level judgment.
Conclusion#
ATM system design is a masterclass in building distributed systems where the cost of a bug is measured in real money, not just error rates. The three ideas that matter most are ledger-first data modeling that ensures every transaction is auditable and immutable, the commit-after-dispense protocol that prevents phantom debits when hardware behaves unpredictably, and reconciliation as a core subsystem that resolves ambiguity through evidence rather than guesswork.
Looking ahead, ATM systems are evolving. Cardless withdrawals using mobile pre-staging, biometric authentication replacing PINs, and real-time payment networks reducing settlement latency are all reshaping the architecture. But the core principles remain unchanged: when physical cash is involved, you design for safety first and optimize for speed second.
If you can explain why a dispense confirmation gates a ledger commit, how a state machine makes failure modes explicit, and why reconciliation is not an afterthought but a core service, you are not just answering an interview question. You are demonstrating the kind of judgment that builds systems people trust with their money.