Are You Ready for Chaos?

Carry out experiments in production

Before we proceed, I must give you a warning. You might not be ready for chaos engineering. If so, you might not benefit from this course and might not want to continue reading. Hopefully, you’re reading this lesson in the course preview and have not bought the course yet, because I am about to discourage you from continuing.

When are you ready for chaos engineering? Chaos engineering requires teams to be very mature and advanced. Also, if you’re going to practice chaos engineering, be prepared to do it in production. We are not interested in how a staging cluster behaves when unexpected things happen. “Real” chaos experiments are executed in production because we want to see how the real system, used by real users, reacts when bad things happen.

Sufficient budget requirements

Additionally, your company must be prepared to set aside a sufficient budget for real reliability work. Chaos engineering does not come for free. You need to invest time in learning the tools. You need to learn the processes and practices. And you need to pay for the damage that you do to your system.

Now, you might say that you can get the budget and that you can do it in production, but there’s more. There is an even bigger obstacle you might face.

You must have enough observability in your system. You need to have relatively advanced monitoring and alerting processes and tools so that you can detect the harmful effects of chaos experiments. If your monitoring setup is non-existent or unreliable, you will be doing damage to production without identifying the consequences. Without knowing what went wrong, you won’t be able to (easily) restore the system to the desired state.
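To make the dependency on monitoring concrete, here is a minimal sketch of an experiment that is guarded by an observed error rate. Everything in it is hypothetical and for illustration only: the function name `run_guarded_experiment`, the `inject_fault` and `rollback` callbacks, and the 5% threshold are assumptions, and the list of error-rate samples stands in for whatever your real monitoring system would report.

```python
from typing import Callable, Iterable


def run_guarded_experiment(
    inject_fault: Callable[[], None],
    rollback: Callable[[], None],
    error_rates: Iterable[float],
    max_error_rate: float = 0.05,  # assumed threshold, pick your own
) -> bool:
    """Inject a fault, watch the error rate, and always roll back.

    Returns True if every observed sample stayed within the limit,
    False as soon as one sample exceeds it.
    """
    inject_fault()
    try:
        for rate in error_rates:
            if rate > max_error_rate:
                # Monitoring detected harm: stop observing and abort.
                return False
        return True
    finally:
        # Restore the desired state whether or not the limit was breached.
        rollback()


if __name__ == "__main__":
    # Hypothetical samples; in practice these would come from monitoring.
    samples = [0.01, 0.02, 0.20, 0.01]
    healthy = run_guarded_experiment(
        inject_fault=lambda: print("injecting fault"),
        rollback=lambda: print("rolling back"),
        error_rates=samples,
    )
    print("system stayed healthy:", healthy)
```

Notice that the guard is only as good as the samples it receives. If the monitoring feeding `error_rates` is missing or unreliable, the loop never sees the breach, and the experiment damages the system without anyone noticing.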

Observability in the system

On the other hand, you might want to jump into chaos engineering as a way to justify investment in reliability work and in observability tools. If that’s the case, you might want to employ the practices from this course to show your management that investing in reliability is important and that being able to observe the system is worthwhile.

So, you can approach this course from both directions. Either way, I’m warning you that this might not be the course for you. You might not be in the right position, or your organization might not be mature enough, to practice what we are about to show you.

In the next lesson, we will take a look at some use cases for chaos engineering.

