Introduction
Explore the complexities of distributed systems by understanding common challenges such as network misconfigurations, partial failures, and power issues. Learn the design trade-offs between using supercomputers and commodity hardware clusters, and how systems are built to operate reliably despite hardware unreliability.
Introduction
Writing code for a single node is fairly straightforward. The moment we switch to writing code that runs on multiple computers connected by a network (a distributed system), the ways in which faults and failures can occur become numerous, nondeterministic, and unpredictable. For example:
Misconfiguration of network switches
Accidental power cycles
Power distribution unit (PDU) failures
Backbone failures for the entire datacenter
Power failure for the entire datacenter
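To make the contrast with single-node code concrete, here is a minimal Python sketch, not a real networking stack. It assumes a hypothetical `remote_add` function whose simulated network drops a request with some probability; the names `NetworkFault`, `remote_add`, `count_failures`, and the `fault_rate` parameter are invented for illustration only.

```python
import random


class NetworkFault(Exception):
    """Hypothetical error standing in for any of the faults listed above."""


def local_add(a, b):
    # Single node: the call returns a result every time it runs.
    return a + b


def remote_add(a, b, rng, fault_rate=0.3):
    # Simulated remote call (an assumption, not a real RPC): with
    # probability fault_rate the request or its reply is lost.
    if rng.random() < fault_rate:
        raise NetworkFault("no response from remote node")
    return a + b


def count_failures(calls, rng, fault_rate=0.3):
    # Issue the same remote call repeatedly and count how often it fails.
    failures = 0
    for _ in range(calls):
        try:
            remote_add(2, 3, rng, fault_rate)
        except NetworkFault:
            failures += 1
    return failures


if __name__ == "__main__":
    print("local result:", local_add(2, 3))
    for seed in (1, 2, 3):
        rng = random.Random(seed)
        print("seed", seed, "failed calls out of 10:", count_failures(10, rng))
```

The identical call can succeed on one invocation and fail on the next, and the number of failures changes from run to run. The caller has to plan for that variability, whereas `local_add` never needs such handling.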
Distributed systems also suffer ...