Introduction
Explore the complexities of writing code for distributed systems that run across multiple networked computers. Understand partial failures, fault tolerance, and the trade-offs between supercomputers and commodity hardware clusters. Gain insights into designing reliable software atop unreliable infrastructure.
Writing code that runs on a single node is fairly straightforward. The moment we switch to writing code that runs on multiple computers connected by a network (a distributed system), however, faults and failures can occur in numerous ways, and they are nondeterministic and unpredictable. For example:
Misconfiguration of network switches
Accidental power cycles
Power distribution unit (PDU) failures
Network backbone failure affecting the entire datacenter
Power failure affecting the entire datacenter
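To make "nondeterministic and unpredictable" concrete, here is a minimal Python sketch that simulates a single remote call over an unreliable network. The `call_remote` function, the `drop_rate` parameter, and the two-outcome model are hypothetical, chosen only for illustration and not taken from any real library. It shows two things: identical calls can produce different results, and a timeout alone does not tell the caller whether the remote node crashed, the request was lost, or the response was lost.

```python
import random
from enum import Enum


class Outcome(Enum):
    OK = "ok"
    TIMEOUT = "timeout"


def call_remote(node_up: bool, rng: random.Random, drop_rate: float = 0.2) -> Outcome:
    """Simulate one request/response round trip (hypothetical model).

    drop_rate is an assumed probability that any single message is lost.
    The caller only ever observes OK or TIMEOUT.
    """
    # The request may be lost on the way to the remote node.
    request_delivered = rng.random() >= drop_rate
    if not (node_up and request_delivered):
        # Crashed node or lost request: both look like a timeout to the caller.
        return Outcome.TIMEOUT
    # The node did the work, but the response may still be lost on the way back.
    response_delivered = rng.random() >= drop_rate
    return Outcome.OK if response_delivered else Outcome.TIMEOUT


# Usage: repeat the exact same call; the exact sequence depends on the seed.
rng = random.Random(42)
results = [call_remote(node_up=True, rng=rng).value for _ in range(10)]
print(results)
```

Note the last case: when the response is lost, the remote node has already performed the work, yet the caller sees the same timeout as when the node is down. This ambiguity is why retrying a failed request is not automatically safe in a distributed system.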
Distributed systems also suffer ...