Retries

Learn how stateless and stateful systems recover from failures.

Both stateless and stateful systems use retries to recover from failures.

Using retries in stateless systems

In a stateless system, applying retries is fairly simple: all the application nodes are identical from the client’s perspective, so the client can retry a request on any node.

Note: In some cases, retries are done in a fully transparent way to the client.

For example, suppose the application is fronted by a load balancer that receives all the requests under a single domain and is responsible for forwarding them to the various nodes of the application. In this case, the client only has to retry the request against the same endpoint, and the load balancer takes care of balancing the requests across all the available nodes.
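The stateless case can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a production client: the names send_with_retries, NodeUnavailable, down_node, and healthy_node are hypothetical, nodes are modeled as plain callables, and the round-robin rotation stands in for whatever policy a real load balancer would use.

```python
import itertools


class NodeUnavailable(Exception):
    """Hypothetical error raised when a node cannot serve a request."""


def send_with_retries(nodes, request, max_attempts=3):
    """Send a request, retrying on the next node after each failure.

    Assumes `nodes` is a non-empty list of interchangeable callables.
    Because the nodes are stateless, any of them can serve the retry.
    """
    rotation = itertools.cycle(nodes)  # round-robin over the nodes
    last_error = None
    for _ in range(max_attempts):
        node = next(rotation)
        try:
            return node(request)
        except NodeUnavailable as err:
            last_error = err  # remember the failure and try the next node
    raise NodeUnavailable(
        f"request failed after {max_attempts} attempts"
    ) from last_error


def down_node(request):
    """Stand-in for a node that is currently failing."""
    raise NodeUnavailable("node is down")


def healthy_node(request):
    """Stand-in for a node that serves the request."""
    return f"handled {request}"
```

Calling send_with_retries([down_node, healthy_node], "GET /items") fails on the first node and succeeds on the second. When a load balancer sits in front of the nodes, the rotation happens on its side and the client simply repeats the call to the same endpoint.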

Using retries in stateful systems

In stateful systems, retries get slightly more complicated because the nodes are not identical, so each retry needs to be directed to the right node.

For example, in a system with leader-follower replication, a failure of the leader node must be followed by a failover to a follower node, which becomes the new leader, and new requests must be sent to it. There are different mechanisms to achieve this, depending on the technology used. The same applies to consensus-based replication, where a new leader election might need to happen, and write operations must be directed to the current leader.
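One possible mechanism for directing retries to the right node is sketched below in Python. It assumes that a node that is not the leader rejects writes with an error carrying a hint about who the current leader is, and that the client follows that hint on retry. This is an assumption made for illustration, not the mechanism of any particular technology, and the names Node, NotLeader, and write_with_retries are hypothetical.

```python
class NotLeader(Exception):
    """Hypothetical error: the contacted node is not the current leader."""

    def __init__(self, leader_hint):
        super().__init__(f"not the leader; try {leader_hint}")
        self.leader_hint = leader_hint


class Node:
    """Toy replica that accepts writes only while it is the leader."""

    def __init__(self, name):
        self.name = name
        self.is_leader = False
        self.current_leader = None  # name of the node believed to be leader
        self.log = []

    def write(self, value):
        if not self.is_leader:
            raise NotLeader(self.current_leader)
        self.log.append(value)
        return self.name


def write_with_retries(cluster, target, value, max_attempts=3):
    """Retry a write, following leader hints until the leader accepts it.

    `cluster` maps node names to Node objects; `target` is the name of
    the node the client contacts first.
    """
    for _ in range(max_attempts):
        try:
            return cluster[target].write(value)
        except NotLeader as err:
            if err.leader_hint is None:
                break  # nobody knows the leader yet, e.g. mid-election
            target = err.leader_hint  # direct the retry to the new leader
    raise RuntimeError("could not reach the current leader")
```

Unlike the stateless case, the retry cannot go to an arbitrary node: a write sent to the old leader after a failover is rejected, and only the retry aimed at the new leader succeeds.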
