Injecting Chaos
Explore how chaos engineering enhances system resilience by deliberately injecting failures such as killing instances, adding latency, and causing service call failures. Learn practical methods to reveal hidden vulnerabilities in distributed systems before they cause major outages, and understand the importance of careful and controlled chaos deployment.
Killing instances
The next step is to apply knowledge of the system to inject chaos. We know the structure of the system well enough to guess where we can kill an instance, add some latency, or make a service call fail. These are all injections. Chaos Monkey does one kind of injection: it kills instances. Killing instances is the most basic and crude kind of injection. It will absolutely find weaknesses in the system, but it’s not the end of the story.
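As a rough illustration of this kind of injection, the sketch below kills one randomly chosen instance in a toy in-memory cluster and then checks whether enough healthy instances remain. The `Instance` and `Cluster` classes, the `min_healthy` threshold, and the `kill_random_instance` helper are all hypothetical names invented for this example. A real tool such as Chaos Monkey terminates actual instances through the platform's API rather than flipping a flag.

```python
import random
from dataclasses import dataclass


@dataclass
class Instance:
    """Hypothetical stand-in for a running service instance."""
    name: str
    alive: bool = True


@dataclass
class Cluster:
    """Toy in-memory cluster; a real tool would call a platform API instead."""
    instances: list
    min_healthy: int = 2  # assumed threshold for "still able to serve"

    def healthy(self):
        return [i for i in self.instances if i.alive]

    def is_serving(self):
        return len(self.healthy()) >= self.min_healthy


def kill_random_instance(cluster, rng):
    """Pick one healthy instance at random and mark it dead (the injection)."""
    candidates = cluster.healthy()
    if not candidates:
        return None
    victim = rng.choice(candidates)
    victim.alive = False
    return victim


rng = random.Random(42)  # seeded so a run can be repeated
cluster = Cluster([Instance(f"web-{n}") for n in range(3)])
victim = kill_random_instance(cluster, rng)
print(f"killed {victim.name}; still serving: {cluster.is_serving()}")
```

With three instances and an assumed minimum of two, the toy cluster survives one kill but not a second. That is the kind of margin this injection is meant to expose.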
Latency Monkey
Latency Monkey adds latency to calls. This strategy finds two additional kinds of weaknesses. First, some services just time out and report errors when they should have a useful fallback. Second, some services have undetected race conditions that only become apparent when responses arrive in a different order than usual.
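The first weakness can be sketched in a few lines. This is a minimal example under stated assumptions, not how Latency Monkey itself works: `with_latency`, `fetch_recommendations`, and `call_with_fallback` are hypothetical helpers, and the delay and timeout values are arbitrary. It wraps a call with an artificial delay and shows a caller that returns a fallback value instead of an error when the response is late.

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout


def with_latency(func, delay_seconds):
    """Wrap func so every call is delayed by delay_seconds (the injection)."""
    def delayed(*args, **kwargs):
        time.sleep(delay_seconds)
        return func(*args, **kwargs)
    return delayed


def fetch_recommendations(user_id):
    """Hypothetical downstream service call."""
    return [f"item-for-{user_id}"]


def call_with_fallback(func, arg, timeout_seconds, fallback):
    """Return func(arg), or fallback if no answer arrives within the timeout.

    Note: leaving the `with` block still waits for the slow worker to finish;
    a production client would abandon or cancel the call instead.
    """
    with ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(func, arg)
        try:
            return future.result(timeout=timeout_seconds)
        except FutureTimeout:
            return fallback


slow_fetch = with_latency(fetch_recommendations, delay_seconds=0.2)

# Without injected latency the real answer comes back.
fast = call_with_fallback(fetch_recommendations, "u1", 1.0, fallback=[])

# With injected latency the caller degrades to a useful default.
slow = call_with_fallback(slow_fetch, "u1", 0.05, fallback=["popular-item"])
```

A caller without the fallback path would surface the timeout as an error, which is the behavior the injection is meant to reveal. The second weakness, reordered responses, is not covered by this sketch.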
Failure injection testing
When we have deep trees of service calls, our ...