Failure Handling Techniques

Let’s look at some commonly used techniques for handling hardware failures in distributed systems.

Failure is the norm in a distributed system, so building a system that can cope with failures is crucial. This chapter covers principles for dealing with failures and basic patterns for building systems that are resilient to them.

In distributed systems, dealing with a failure consists of three main parts:

  • identifying the failure
  • recovering from the failure
  • in some cases, containing the failure to reduce its impact
