Imagine you’re shopping online and you’ve just found the perfect item.
You click “Place Order.”
The page freezes.
Nothing seems to happen.
Out of frustration, you click again. Finally, five minutes later, your inbox pings with confirmation emails—not one, but two identical orders.
It’s a nightmare for you as a customer, but an even bigger nightmare for the developers running that system. Why? Because what just happened isn’t a UI glitch; it’s a classic distributed systems problem that exposes a fundamental weakness in the system’s design. The developers now have to clean up a complex and difficult-to-trace mess.
Here’s the chain of events that created this chaos for the developers:
The initial request from the client successfully reached the server, but the network was slow, so the client timed out and assumed the request failed.
The client, either automatically or because the customer retried, sent a second request.
Crucially, the server could not know this was a duplicate. It processed the second request as if it were brand new, leading to duplicate orders and an inconsistent database state.
This may look like a rare edge case, but in modern distributed systems, especially those handling payments, bookings, or inventory, it’s a common and inevitable problem that developers must actively design a solution for.
Duplicate requests create a ripple effect that impacts both businesses and developers.
For businesses, duplicates can quickly snowball:
Double-billing: Customers may be charged twice for an order they placed once.
Inventory mismatches: Limited stock gets allocated twice.
Operational overhead: Customer support must manually clean up mistakes.
Damage to brand reputation: A single bad experience erodes user trust.
From a developer’s perspective, duplicates are just as painful:
Debugging chaos: Knowing which request is legitimate isn’t easy.
Fire drills in production: Duplicate entries often cascade, breaking other system parts.
Loss of confidence: Users—and sometimes other internal teams—see your system as unreliable.
These problems aren’t rare or one-time events. They happen frequently because of common issues like a slow internet connection, systems automatically retrying a request, or a user clicking a button multiple times.
In this newsletter, we’ll discuss:
What idempotency is and why it’s essential in modern distributed systems.
How it eliminates duplicate actions in critical workflows like payments, bookings, and order processing.
Proven strategies for implementing idempotency, including keys, database constraints, conditional writes, and state machines.
Best practices for designing reliable APIs and workflows that stay consistent despite failures and retries.
A well-established engineering solution to this entire class of problems is idempotency.
Idempotency guarantees that repeated requests have the same outcome as a single request. In other words, no matter how often a user clicks “submit” or a client retries the same order, the system will apply the operation’s effect once and only once. This principle transforms fragile, noisy systems into predictable and stable ones.
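The property is easy to see with two tiny functions (the names are illustrative, not from any real API). An absolute update is idempotent because repeating it changes nothing; a relative update is not, because each repeat compounds the effect:

```python
# Non-idempotent: each call changes the result again.
def add_credit(balance: int, amount: int) -> int:
    return balance + amount

# Idempotent: repeating the call leaves the result unchanged.
def set_credit(balance: int, amount: int) -> int:
    return amount

assert add_credit(add_credit(100, 10), 10) == 120    # a retry doubled the effect
assert set_credit(set_credit(100, 110), 110) == 110  # a retry is harmless
```

This is why “set quantity to 3” is safe to retry while “increment quantity by 1” is not.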
Picture yourself waiting for an elevator. You press the button once. When the elevator doesn’t immediately arrive, you press it repeatedly.
The first press registers the request.
Additional presses don’t make the elevator come faster. They are acknowledged, but they don’t change the outcome.
This is precisely how an idempotent system behaves.
When a network connection is slow or a server’s response is delayed, it can create a tricky situation. The client, whether a browser, a mobile app, or a service like an API gateway, has a time-out setting. If it doesn’t receive a response within that time, it assumes the request failed. The client often automatically retries the same request to ensure the operation goes through. This behavior, while helpful for fault tolerance, can lead to serious issues if not handled correctly by the server.
In a system not designed with idempotency in mind, these retries can cause unintended side effects. The server cannot know that the second request duplicates a previously successful one. Below are the steps that cause duplication:
The first request successfully creates the order: The server receives and processes the initial request, updates its database, processes a payment, and creates a new order record. However, due to the network delay, the success response doesn’t reach the client before the timeout.
The second retry is treated as a new request: Unaware of the first success, the client sends an identical request. The server, lacking an idempotency mechanism, receives this request and processes it as if it were the first time. It sees a valid request to create an order and proceeds with its business logic.
Resulting in two separate orders, payments, or bookings: The system now has two distinct order records, two separate charges on the customer’s credit card, and two bookings for the same limited item. This creates an inconsistent state and leads to inventory mismatches, double billing, and a poor user experience, requiring expensive and time-consuming manual intervention.
This sequence diagram shows what happens: the second request slips through because the server does not recognize it as a duplicate.
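The failure mode can be reduced to a toy simulation (the function and field names are hypothetical, and a list stands in for the database). Because the server has no way to tell a retry from a new request, every call runs the full business logic:

```python
# Naive order endpoint with no idempotency: every call creates a new order.
orders = []

def place_order(item: str) -> dict:
    order = {"order_id": len(orders) + 1, "item": item}
    orders.append(order)          # the side effect runs on every call
    return order

first = place_order("headphones")   # response lost to a client timeout...
retry = place_order("headphones")   # ...so the client sends the same request again

assert first["order_id"] != retry["order_id"]   # two distinct orders now exist
assert len(orders) == 2                         # duplicate state in the "database"
```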
When idempotency is applied, the server enforces strict rules that prevent duplicate processing.
Here’s how it works in this diagram:
The first request successfully writes the order with id: 1.
When the second retry arrives with the same id: 1, the database checks for uniqueness (using a conditional write).
Since the order already exists, the system safely rejects the duplicate and returns a consistent response to the client.
This makes the system safe to retry even during slow networks or outages.
Now, how do you make your system idempotent? Let’s explore four core approaches.
One of the most popular methods is to use idempotency keys.
The client generates a unique key for each request: This unique key (often called an idempotency key) is the foundation of the entire process. It’s the client’s responsibility to ensure a unique key is used for each distinct request, but the same key is reused for any retries of that request.
The server stores the key along with the response: This is the server’s state management. The server executes the operation for the first successful request, then stores the idempotency key and the resulting response in a persistent store. This record confirms that the operation was completed.
If the server receives the same key again, it knows it’s a retry and simply returns the stored result without reprocessing: This is the key to the idempotency guarantee. When a subsequent request arrives with a key already in the server’s store, it immediately fetches the stored response and sends it back to the client. It completely bypasses the business logic, thus preventing any side effects like double-charging a credit card or allocating an item twice.
Payment APIs like Stripe widely use this approach.
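A minimal server-side sketch of this pattern, assuming an in-memory dictionary where production code would use a persistent store (the function and field names are illustrative):

```python
import uuid

# key -> stored response; in production this lives in a database or cache
processed: dict[str, dict] = {}
charges: list[int] = []

def charge(idempotency_key: str, amount: int) -> dict:
    # A known key means this is a retry: return the stored response as-is,
    # bypassing the business logic entirely.
    if idempotency_key in processed:
        return processed[idempotency_key]
    charges.append(amount)                        # the real side effect
    response = {"status": "charged", "amount": amount}
    processed[idempotency_key] = response         # remember key -> response
    return response

key = str(uuid.uuid4())      # client generates one key per logical request
first = charge(key, 500)
retry = charge(key, 500)     # a timeout-driven retry reuses the same key

assert first == retry        # the client sees the same response both times
assert len(charges) == 1     # but the card was charged only once
```

Note that a real implementation also has to handle a retry arriving while the first request is still in flight, which is why the key check and the write should be atomic.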
Databases are great allies in enforcing idempotency:
Use primary keys or unique indexes to prevent duplicates: Databases provide a powerful, built-in mechanism to enforce the principle of idempotency. By defining a primary key or a unique index on a column, you create a rule that a specific value cannot exist more than once in that column. This is a foundational aspect of relational database integrity.
Automatic rejection of duplicate requests: When a duplicate request arrives, your application will attempt to perform the same operation again, such as inserting a record with a previously used unique key. The database’s integrity constraint immediately recognizes the violation and atomically rejects the INSERT operation, returning a specific error (e.g., a unique constraint violation).
Your application’s code is responsible for catching this database error. Instead of treating it as a fatal failure, the application should recognize that this specific error means the record already exists from the first successful request. It can then safely return a success response to the client, without re-running the core business logic.
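The same flow can be sketched with SQLite standing in for any relational database (the table and column names are illustrative). The constraint does the duplicate detection; the application just translates the violation into a friendly response:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The PRIMARY KEY is the uniqueness rule the database will enforce for us.
conn.execute("CREATE TABLE orders (order_id TEXT PRIMARY KEY, item TEXT)")

def create_order(order_id: str, item: str) -> str:
    try:
        conn.execute("INSERT INTO orders VALUES (?, ?)", (order_id, item))
        conn.commit()
        return "created"
    except sqlite3.IntegrityError:
        # The row already exists from an earlier successful attempt; treat
        # the constraint violation as "already done", not as a failure.
        return "already exists"

assert create_order("ord-1", "headphones") == "created"
assert create_order("ord-1", "headphones") == "already exists"   # safe retry
assert conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0] == 1
```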
Unlike the unique constraints of a relational database, some NoSQL and distributed databases, such as Amazon DynamoDB and Azure Cosmos DB, offer native support for conditional writes. This feature allows you to perform an atomic write operation (e.g., creating or updating a record) only if a specified condition is met. This single-operation approach prevents race conditions that could occur with separate “check-then-write” application logic.
The core of this method is the condition itself. To enforce idempotency, the condition is typically expressed as: “Perform this write only if a record with this unique key does not already exist.”
This ensures that retries do not overwrite or create duplicates.
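The semantics can be sketched in memory as a single atomic “put if absent” (DynamoDB expresses the same idea with a `put_item` call carrying `ConditionExpression="attribute_not_exists(order_id)"`, which raises a conditional check failure on a duplicate):

```python
# In-memory stand-in for a table supporting conditional writes.
store: dict[str, dict] = {}

def conditional_put(key: str, item: dict) -> bool:
    # setdefault performs the check and the write as one operation,
    # mirroring the single atomic check-and-write the database runs
    # server-side; it returns the stored value, so identity tells us
    # whether our write won.
    return store.setdefault(key, item) is item

assert conditional_put("ord-1", {"item": "headphones"}) is True    # first write wins
assert conditional_put("ord-1", {"item": "headphones"}) is False   # retry rejected
assert len(store) == 1
```

The crucial point is that there is no window between the existence check and the write in which a concurrent retry could slip through.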
This approach is highly effective for workflows that are more complex than a single API call and that cannot rely on a simple database constraint. It ensures that a retry doesn’t duplicate the work of the entire process, only the parts that truly need to be rerun.
Instead of treating an entire operation as a single unit, a state machine breaks it down into distinct, sequential states or checkpoints. Think of this like a checklist for a complex task. For example, an order fulfillment process might have states like:
Check inventory.
Process the payment.
Send the order to the warehouse.
Notify the customer.
Each of these states is an atomic operation within the larger workflow. This structured approach allows the system to clearly understand its current position.
The key to achieving idempotency here is the state machine’s ability to persist its state. After each successful state completes, the state machine records its progress.
If a system crashes or a retry is initiated, the state machine doesn’t start over from the beginning. It first checks its saved state and resumes the workflow from the last successful checkpoint. For example, a retry will not re-check the inventory if the system crashed after completing the “Process the payment” state. It will simply start at “Send the order to the warehouse.” This ensures that all previously successful steps are not duplicated, making the entire workflow idempotent and resilient to failures.
AWS Step Functions is a common tool for this pattern.
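The checkpoint-and-resume behavior described above can be sketched with an in-memory “database” (step names, the checkpoint store, and the simulated crash are all illustrative):

```python
STEPS = ["check_inventory", "process_payment", "ship_order", "notify_customer"]
checkpoints: dict[str, int] = {}   # order_id -> index of the next step to run
executed: list[str] = []           # record of side effects, for inspection

def run_workflow(order_id: str) -> None:
    start = checkpoints.get(order_id, 0)       # resume from the last checkpoint
    for i in range(start, len(STEPS)):
        executed.append(STEPS[i])              # the step's side effect
        checkpoints[order_id] = i + 1          # persist progress after each step
        if STEPS[i] == "process_payment" and start == 0:
            raise RuntimeError("crash right after the payment step")

try:
    run_workflow("ord-1")          # crashes after "process_payment"
except RuntimeError:
    pass

run_workflow("ord-1")              # the retry resumes at "ship_order"

assert executed.count("process_payment") == 1   # the payment was never repeated
assert executed == STEPS                        # every step ran exactly once
```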
Different scenarios call for different approaches. Here’s a quick comparison of common strategies, how they work, and where they’re most effective:
| Technique | How It Works | Best For |
| --- | --- | --- |
| Idempotency keys | Key + cached response on server | Payment APIs |
| Unique constraints | DB rejects duplicate inserts based on unique keys | Orders, transactions |
| Conditional writes | Write only if the condition is true | NoSQL operations |
| State machines | Save progress and resume on retry | Long, multi-step workflows |
To ensure idempotency works well, keep these guidelines in mind:
Expire old keys: Avoid indefinite growth of stored keys.
Return consistent responses: A duplicate request should receive the same response every time.
Monitor retries: High retry rates might indicate a deeper system problem.
Focus on critical operations first: Payments, bookings, and inventory updates should be prioritized.
Let’s see how idempotency (or the lack of it) plays out in everyday systems:
Multiple form submissions:
Imagine you’re paying for an online course. You click “Pay,” but the page freezes. Thinking it didn’t work, you click again.
Result: The payment gateway processes both clicks as separate transactions. You’re charged twice.
Flight booking during a network glitch:
You book a flight, but the confirmation screen takes too long to appear due to a slow network. You refresh or retry.
Result: The airline system creates two reservations for the same passenger, potentially blocking another customer from booking that seat.
Refreshing checkout at an e-commerce store:
You place an order, but the browser refreshes while the system is still processing.
Result: Two separate orders are placed, confusing you and the seller.
Safe payment processing:
A payment API assigns a unique idempotency key to your first click.
Even if you retry five times, the API recognizes the key and processes the payment only once.
Consistent flight bookings:
The booking system stores your request with an idempotency key.
Any retries simply return the same confirmed reservation instead of creating duplicates.
Accurate inventory handling:
The system checks whether the same request has already been processed for warehouse restocks or order placements.
Duplicates are ignored, ensuring no double-counting or inventory mismatches.
Idempotency is essential for modern distributed systems. You should implement it for critical, user-facing operations like payments, orders, and bookings. By combining automatic retries with a robust idempotency mechanism, you can guarantee predictable outcomes for your users, even in the face of failures. Failures are inevitable in a distributed world, and idempotency is the seatbelt that prevents those failures from turning into catastrophic crashes.
Get hands-on experience implementing idempotency in the cloud—no setup or AWS account required. This browser-based AWS Cloud Lab will help you apply the concepts you’ve just learned.
Cloud Lab: Build Idempotent Order Processor with Lambda, DynamoDB, and SQS