Node failures in distributed systems
When a node fails, it is removed from the network. Services today are expected to remain available even when nodes fail. This property is called fault tolerance, a topic that we will cover in detail.
The following are some of the common reasons for node failures:
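As a rough illustration of the idea above, the following minimal sketch shows a service that keeps answering requests after one node is removed. The `Cluster` class, the round-robin routing, and the node names are all hypothetical, chosen only for this example; a real system would also need a way to detect the failure in the first place.

```python
class Cluster:
    """Toy cluster that tracks which nodes are currently part of the network."""

    def __init__(self, nodes):
        self.nodes = list(nodes)  # nodes currently in the network
        self._next = 0            # round-robin cursor

    def remove(self, node):
        """Drop a failed node so no further requests are routed to it."""
        if node in self.nodes:
            self.nodes.remove(node)

    def route(self, request):
        """Send the request to the next remaining node in round-robin order."""
        if not self.nodes:
            raise RuntimeError("no nodes available")
        node = self.nodes[self._next % len(self.nodes)]
        self._next += 1
        return f"{node} handled {request}"


cluster = Cluster(["node-a", "node-b", "node-c"])
cluster.remove("node-b")  # assume node-b has failed
replies = [cluster.route(f"req-{i}") for i in range(4)]
for reply in replies:
    print(reply)
```

Because requests are only ever routed to nodes still in the list, the service stays available as long as at least one node remains.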
1. Hardware failures
Hardware failures occur frequently in distributed systems. Typical examples are:
- Disk failures, e.g., head crashes or bad sectors
- RAM