Adopting Your Own Monkey

Learn about the vulnerabilities that Chaos Monkey uncovers, the prerequisites for adopting it, how to limit the exposure of chaos tests, and how to define a healthy system state.

We'll cover the following...

Vulnerabilities with Chaos Monkey
Prerequisites
Limiting chaos test exposure
Defining healthy systems
Designing the experiment

Vulnerabilities with Chaos Monkey

When Chaos Monkey launched, most developers were surprised by how many vulnerabilities it uncovered. Even services that had been in production for ages turned out to have subtle configuration problems. Some of them had cluster membership rosters that grew without bounds. Old IP addresses would stay on the list even though their owners would never be seen again (or worse, the IP address would come back as a different service).
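To make the roster problem concrete, here is a minimal sketch of a membership list that stays bounded by forgetting members whose heartbeats have stopped. It is an illustration only, not how any Netflix service works: the `MembershipRoster` class, its method names, and the TTL-based expiry are all assumptions introduced for this example.

```python
import time
from typing import Callable, Dict, List, Tuple


class MembershipRoster:
    """Hypothetical cluster roster that drops members after a heartbeat timeout."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so tests can control time
        # address -> (service name, time of last heartbeat)
        self._members: Dict[str, Tuple[str, float]] = {}

    def heartbeat(self, address: str, service: str) -> None:
        # Overwriting the entry also covers an IP address that comes back
        # as a different service: the old owner is replaced, not kept.
        self._members[address] = (service, self._clock())

    def prune(self) -> List[str]:
        """Remove members not heard from within the TTL; return their addresses."""
        now = self._clock()
        stale = [addr for addr, (_, seen) in self._members.items()
                 if now - seen > self._ttl]
        for addr in stale:
            del self._members[addr]
        return stale

    def live_members(self) -> Dict[str, str]:
        """Return address -> service for members that are still fresh."""
        self.prune()
        return {addr: service for addr, (service, _) in self._members.items()}
```

Without something like `prune`, an instance killed by Chaos Monkey would stay on the list forever, which is the kind of subtle problem that only shows up once instances start disappearing regularly.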

Prerequisites

First of all, chaos engineering efforts can’t kill companies or customers. In a sense, Netflix had it easy. Customers are familiar with pressing the play button again if it doesn’t work the first time. They’ll forgive just about anything except cutting off the end of Stranger Things. If every single request in the system is irreplaceably valuable, then chaos ...
