Distributed Systems: Building Software for the Real World

Governor

Learn about the limits of humans and automation in controlling errors, what governors are and how they are used, and finish with a quick wrap-up.

We'll cover the following...

Human vs. Automation
Governor

Role of governors

Tips to remember

Slow things down to allow intervention
Apply resistance in the unsafe direction
Consider a response curve

Wrapping up

Human vs. Automation

In the Force Multiplier lesson, we looked at an outage that Reddit suffered. As a quick reminder, Reddit’s configuration management system restarted the autoscaler, the part of its infrastructure management that scales server instances up and down. This happened in the middle of a ZooKeeper migration, so the autoscaler read a partial configuration and decided to shut down nearly every machine instance at Reddit.

The flip side of that coin is a job scheduler that spins up too many computational instances in order to process a queue before a deadline. The work still can’t get done fast enough, and, to add insult to injury, the cloud provider’s invoice that month is written in scientific notation.

Automation has no judgment. When it goes wrong, it tends to go wrong really quickly. By the time a human perceives the problem, it’s a question of recovery rather than intervention.

How can we allow human intervention without putting a human in the loop for everything? We should use automation for things humans are bad at: repetitive ...
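One way to picture "intervention without a human in the loop for everything" is a small rate limiter that sits between the automation's decision and the action it takes. The following is a minimal sketch, not the lesson's implementation: the class name `Governor`, its parameters, and the choice to treat shutting instances down as the unsafe direction are all assumptions made for illustration. Scaling up passes through untouched here; for the runaway job scheduler case, the same budget could be applied to instance launches instead.

```python
from dataclasses import dataclass, field


@dataclass
class Governor:
    """Hypothetical governor: caps how many instances automation may remove
    within a sliding time window, so a human has time to notice and step in."""

    max_removals_per_window: int = 2   # assumed budget, tune per system
    window_seconds: float = 300.0      # assumed window length
    _removal_times: list = field(default_factory=list)

    def allowed_count(self, current: int, desired: int, now: float) -> int:
        """Return the instance count the automation may move to right now."""
        if desired >= current:
            # Scaling up is treated as the safe direction in this sketch.
            return desired

        # Forget removals that have aged out of the window.
        self._removal_times = [
            t for t in self._removal_times if now - t < self.window_seconds
        ]

        # Apply resistance in the unsafe direction: only spend what is left
        # of the removal budget, however large the requested drop is.
        budget = max(self.max_removals_per_window - len(self._removal_times), 0)
        step = min(current - desired, budget)
        self._removal_times.extend([now] * step)
        return current - step
```

With this in place, an autoscaler that misreads its configuration and asks to go from 100 instances to 3 would only be allowed to drop to 98 in the first five minutes. The outage still starts, but slowly enough that monitoring and people can react before it becomes a recovery exercise.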
