Search⌘ K

Production Environment Problems

Discover the core concepts of chaos engineering and how it enhances resilience by experimentally testing distributed systems in production. Learn why full-system testing is essential to reveal issues like timeouts and cascading failures that do not appear in isolated or staging environments.

What is chaos engineering?

Imagine a conversation that starts like this:

“Hey boss, I’m going to log into production and shut down a few instances of the system. Just a few here and there. Shouldn’t hurt anything,” you say.

How do you think the rest of that conversation will go? Normally, not well! Killing instances turns out to be a radical idea, but not a crazy one. It’s one technique in an emerging discipline called chaos engineering. 1^{1} ...