Trusted answers to developer questions

What is Transient Fault Handling?

Mohe Ud Din Sheikh

Overview

Cloud computing has changed how software is built and run in ways that were hard to imagine a decade ago. However, every advancement brings a new set of problems, and one of the biggest problems cloud computing faced early on was the transient failure.

Transient failures include the momentary loss of network access to components and services, the brief unavailability of a service, and timeouts that occur when a service is busy.

Because these faults are so common, practitioners have settled on a standard pattern for dealing with them.

Figure: Cloud computing

Transient Fault Handling Pattern

The Transient Fault Handling Pattern, also called the Retry Pattern, gives us a tried and tested way to handle a transient fault: retry the operation until it succeeds. This might seem odd, because it simply tells us to repeat the operation and hope that the fault has resolved itself, but it works because transient faults are short-lived by nature.

Figure: Retrying the failing request

Explanation

In this example, the request from the user to the cloud service fails on the first and second attempts and succeeds on the third.
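The sequence in this example, two failures followed by a success, can be sketched in Python. This is a minimal illustration rather than a production implementation: `TransientError`, `retry`, and `make_flaky_service` are hypothetical names introduced here, and the cloud service is simulated in-process instead of calling a real API.

```python
class TransientError(Exception):
    """Hypothetical marker for a fault that is expected to clear on its own."""


def retry(operation, max_attempts=3):
    """Call operation() until it succeeds or max_attempts is used up."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except TransientError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last failure


def make_flaky_service(failures_before_success):
    """Build a stand-in for a cloud call that fails a fixed number of times."""
    state = {"calls": 0}

    def call():
        state["calls"] += 1
        if state["calls"] <= failures_before_success:
            raise TransientError(f"attempt {state['calls']} timed out")
        return f"ok after {state['calls']} attempts"

    return call


# Fails twice, then succeeds on the third try, as in the example above.
print(retry(make_flaky_service(failures_before_success=2)))
# prints: ok after 3 attempts
```

Capping the number of attempts is a design choice in this sketch: if the fault turns out not to be transient, the caller gets the original error back instead of looping forever.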

Where can we apply the Transient Fault Handling Pattern?

We’ve seen how effective such a simple solution can be. However, it isn’t applicable everywhere, and we have to consider a couple of factors before retrying repeatedly.

Knowing which failures to retry

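What if a relational database rejects a connection because of incorrect credentials? In this case, retrying won’t do us any good, because the same request will fail the same way every time. Before retrying, it’s important to identify the cause of the failure where possible and to retry only the faults that are likely to be transient.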
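One way to express this distinction in code is to classify each error before deciding whether to retry. The sketch below is an assumption-laden illustration: `AuthenticationError`, `is_transient`, and `call_with_retry` are hypothetical names, and treating Python's built-in `TimeoutError` and `ConnectionError` as the transient set is a choice made for this example, not a rule from any particular database driver.

```python
class AuthenticationError(Exception):
    """Hypothetical error for rejected credentials; retrying cannot fix it."""


# Built-in exception types treated as transient in this sketch (an assumption).
TRANSIENT_ERRORS = (TimeoutError, ConnectionError)


def is_transient(exc):
    """Return True if the error is one we expect to clear on its own."""
    return isinstance(exc, TRANSIENT_ERRORS)


def call_with_retry(operation, max_attempts=3):
    """Retry only transient errors; re-raise everything else immediately."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except Exception as exc:
            if not is_transient(exc) or attempt == max_attempts:
                raise


calls = []  # records each connection attempt


def connect_with_bad_password():
    """Stand-in for a database connection that always rejects the login."""
    calls.append("attempt")
    raise AuthenticationError("incorrect credentials")


try:
    call_with_retry(connect_with_bad_password)
except AuthenticationError:
    print(f"gave up after {len(calls)} attempt(s)")
# prints: gave up after 1 attempt(s)
```

The credential failure is surfaced after a single attempt, while a timeout or dropped connection would still be retried up to `max_attempts` times.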

The period between retries

An overly aggressive retry strategy can lead to further throttling or blacklisting of the client, or it can overwhelm an already busy service and prevent it from recovering at all.
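A common way to space out retries is exponential backoff with random jitter, which the original text does not name but which fits the concern described here. The sketch below is one possible version under stated assumptions: `backoff_ceiling`, `backoff_delay`, and `retry_with_backoff` are hypothetical names, and the base of 0.5 seconds and cap of 30 seconds are arbitrary illustrative values.

```python
import random
import time


def backoff_ceiling(attempt, base=0.5, cap=30.0):
    """Upper bound on the wait after a failed attempt (0-based), in seconds."""
    return min(cap, base * 2 ** attempt)


def backoff_delay(attempt, base=0.5, cap=30.0):
    """Pick a random wait up to the ceiling so clients do not retry in lockstep."""
    return random.uniform(0, backoff_ceiling(attempt, base, cap))


def retry_with_backoff(operation, max_attempts=5, base=0.5, cap=30.0,
                       sleep=time.sleep):
    """Retry on TimeoutError, waiting longer after each failed attempt."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except TimeoutError:
            if attempt == max_attempts - 1:
                raise  # no attempts left: surface the failure
            sleep(backoff_delay(attempt, base, cap))


# The ceiling doubles after each failure until it reaches the cap.
print([backoff_ceiling(n) for n in range(8)])
# prints: [0.5, 1.0, 2.0, 4.0, 8.0, 16.0, 30.0, 30.0]
```

Growing the wait gives a busy service room to recover, and the jitter spreads out clients that failed at the same moment. The `sleep` parameter exists only so the waiting can be replaced in tests.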

Copyright ©2022 Educative, Inc. All rights reserved
