Adopting Your Own Monkey

Explore chaos engineering to understand how deliberately introducing failures helps reveal hidden vulnerabilities in distributed systems. Learn to design safe experiments with limited blast radius, define system health clearly, and interpret results to improve system resilience under real-world conditions.

Vulnerabilities with Chaos Monkey

When Chaos Monkey launched, most developers were surprised by how many vulnerabilities it uncovered. Even services that had been in production for ages turned out to have subtle configuration problems. Some of them had cluster membership rosters that grew without bounds. Old IP addresses stayed on the list even though their owners would never be seen again (or worse, when an IP did come back, it belonged to a different service).
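
To make the roster problem concrete, here is a minimal sketch of one possible way to keep a membership list bounded: each member must send a heartbeat, and addresses that stay silent longer than a time-to-live are evicted. This is an illustration only, not how any Netflix service handled it. The class `MembershipRoster`, its methods, and the `ttl_seconds` parameter are hypothetical names introduced for this example.

```python
import time
from typing import Callable, Dict, List


class MembershipRoster:
    """Hypothetical cluster roster that forgets members that stop heartbeating."""

    def __init__(
        self,
        ttl_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so tests can control time
        self._last_seen: Dict[str, float] = {}

    def heartbeat(self, address: str) -> None:
        # Record (or refresh) the time this address was last heard from.
        self._last_seen[address] = self._clock()

    def evict_stale(self) -> List[str]:
        # Remove every address whose last heartbeat is older than the TTL.
        now = self._clock()
        stale = [a for a, seen in self._last_seen.items() if now - seen > self._ttl]
        for address in stale:
            del self._last_seen[address]
        return stale

    def members(self) -> List[str]:
        # Evict first so callers never see an address that has gone silent.
        self.evict_stale()
        return sorted(self._last_seen)
```

With a roster like this, an instance killed in a chaos experiment simply stops sending heartbeats and ages out of the list. A roster without any eviction step would keep the dead address forever, which is the kind of latent problem that random instance termination tends to expose.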

Prerequisites

First of all, chaos engineering efforts can’t kill companies or customers. In a sense, Netflix had it easy. Customers are familiar with pressing the play button again if it doesn’t work the first time. They’ll forgive just about anything except cutting off the end of Stranger Things. If every single request in the system is irreplaceably valuable, then chaos ...