...

Node failures in distributed systems

When a Node fails, it is removed from the Network. Today, services are expected to remain available even when Nodes fail. This property is called Fault Tolerance, a topic that we will cover in detail.
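A common way for a system to decide that a Node has failed, and should be removed from the membership list, is a heartbeat timeout. The sketch below is a minimal illustration under stated assumptions: the `Membership` class, its `heartbeat`/`remove_failed` methods, and the 3-second timeout are hypothetical names and values chosen for this example, not the API of any particular system.

```python
from __future__ import annotations

import time


class Membership:
    """Hypothetical membership list that removes Nodes whose heartbeats stop.

    Assumption: every Node periodically calls heartbeat(); a Node that stays
    silent for longer than `timeout` seconds is treated as failed.
    """

    def __init__(self, timeout: float) -> None:
        self.timeout = timeout
        self.last_seen: dict[str, float] = {}

    def heartbeat(self, node: str, now: float | None = None) -> None:
        # Record when we last heard from this Node (monotonic clock by default).
        self.last_seen[node] = time.monotonic() if now is None else now

    def remove_failed(self, now: float | None = None) -> list[str]:
        # Remove every Node whose last heartbeat is older than the timeout.
        now = time.monotonic() if now is None else now
        failed = [n for n, t in self.last_seen.items() if now - t > self.timeout]
        for n in failed:
            del self.last_seen[n]
        return failed

    def alive(self) -> list[str]:
        # Nodes still considered part of the Network, in sorted order.
        return sorted(self.last_seen)


if __name__ == "__main__":
    m = Membership(timeout=3.0)
    for node in ("node-a", "node-b", "node-c"):
        m.heartbeat(node, now=0.0)
    # node-b never sends another heartbeat.
    m.heartbeat("node-a", now=4.0)
    m.heartbeat("node-c", now=4.0)
    print(m.remove_failed(now=5.0))  # ['node-b']
    print(m.alive())  # ['node-a', 'node-c']
```

A timeout cannot tell a crashed Node apart from one that is merely slow or cut off by the network, so in practice the timeout value is a trade-off between detecting failures quickly and wrongly removing healthy Nodes.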

The following are some of the common reasons for Node failures:

1. Hardware failures

Hardware failures are a common occurrence in distributed systems. Common examples are:

  • Disk failures, e.g., head crashes and bad sectors
  • RAM
...