Which Tool Should We Choose?

In this lesson, we compare the available chaos engineering tools and choose the one that best fits our requirements.

We established that the tool we pick must meet two requirements:

  • To be open-source
  • To work both inside Kubernetes and outside Kubernetes

What are the options? What are the tools we can choose from?

Manual executions or automation by scripting

To begin, we could do things manually. We could modify and tweak Kubernetes resources ourselves, make changes to the service mesh, and so on. However, manual execution of experiments is not what we want. As I mentioned before, I believe that the execution of chaos experiments should be automated. Experiments should run periodically or through continuous delivery pipelines. So, manual chaos is not an option.

Of course, we could automate things by writing our own scripts. But why would we do that? There are tools that can help us get from nothing to something very fast. That does not exclude writing your own custom scripts. You’re almost certainly going to end up creating your own scripts sooner or later. However, picking a tool that already does at least some of the things we need will get us to a useful level much faster.

Now that we know that we’re not going to do manual chaos engineering or write all the scripts from scratch ourselves, let’s see which tools we have at our disposal.

Chaos Monkey or Simian Army

There are Chaos Monkey, Simian Army, and other Netflix tools aimed at chaos engineering. What Netflix did with Chaos Monkey and the other tools is excellent. They were pioneers, at least among those that made their tools public. However, Chaos Monkey does not work well in Kubernetes. On top of that, it requires Spinnaker and MySQL. Not working well in Kubernetes is a huge negative point, and the dependency on Spinnaker is another one. I’m not saying Spinnaker is a bad tool. Quite the contrary. It is very useful for certain things. However, the question is whether you should adopt it only because you want to do some chaos engineering. Most likely not. If you are already using Spinnaker, Chaos Monkey might be the right choice, but I cannot make that assumption.

Gremlin

Next, we have Gremlin. It might be one of the best tools we have on the market. While I encourage you to try it out and see whether it fits your needs, it’s a service (you cannot run it yourself), and it’s not open-source. Since open-source is one of the requirements, Gremlin is out as well.

PowerfulSeal

Further on, we have PowerfulSeal, which is immature and poorly documented. Besides that, it works only in Kubernetes, so it fails one of our requirements.

kube-monkey

We also have kube-monkey, which is inspired by Chaos Monkey but is designed for Kubernetes. Just like PowerfulSeal, it is immature and poorly documented, and I do not recommend it.

LitmusChaos

Then, we have LitmusChaos, which suffers from similar problems. It is better documented than the other tools I mentioned, but it is still immature, and it works only in Kubernetes.

Gloo Shot

We also have Gloo Shot. I love the tools coming from solo.io. I like Gloo, and I like their service mesh. However, at least at the time of this writing (March 2020), Gloo Shot is relatively new, and it works only at the service mesh level. So, it’s also not a good choice.

Chaos Toolkit

Finally, we have Chaos Toolkit. It is very well documented, and you should have no problems finding all the information you need. It has quite a few modules that significantly extend its capabilities. We can use it with or without Kubernetes. We can run experiments against GCP, AWS, Azure, service mesh (Istio in particular), etc. It has a very active community. If you go to their Slack workspace, you will see that there are quite a few folks who will be happy to help you out. Even though the project is not fully there, I think it’s the best choice we have today (March 2020), at least among those I mentioned.
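To give a feel for what working with Chaos Toolkit looks like outside Kubernetes, here is a minimal sketch that builds an experiment definition in Python and prints it as JSON. The field names follow the Chaos Toolkit experiment format as I understand it, so verify them against the official documentation. The URL, the container name `my-app`, and the helper `build_experiment` are hypothetical and exist only for illustration.

```python
import json

# Hypothetical target: an application assumed to listen on this URL.
APP_URL = "http://localhost:8080/"


def build_experiment(url: str) -> dict:
    """Return a minimal experiment definition as a plain dictionary."""
    # Probe: an HTTP request to the app; we tolerate only a 200 response.
    probe = {
        "type": "probe",
        "name": "app-responds-with-200",
        "tolerance": 200,
        "provider": {"type": "http", "url": url, "timeout": 3},
    }
    # Action: run a local process that stops the app.
    # "docker stop my-app" is an assumption, not part of the lesson.
    action = {
        "type": "action",
        "name": "stop-the-app",
        "provider": {
            "type": "process",
            "path": "docker",
            "arguments": "stop my-app",
        },
    }
    return {
        "version": "1.0.0",
        "title": "Does the app recover after being stopped?",
        "description": "Stop the app and check that it still responds.",
        "steady-state-hypothesis": {
            "title": "The app is healthy",
            "probes": [probe],
        },
        "method": [action],
    }


experiment = build_experiment(APP_URL)
experiment_json = json.dumps(experiment, indent=2)
print(experiment_json)
```

If you save the printed JSON to a file such as experiment.json, it could be executed with `chaos run experiment.json`, assuming Chaos Toolkit is installed. Nothing in this sketch depends on Kubernetes, which is the point of the second requirement.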

Our choice of tool

We are going to choose Chaos Toolkit because it’s open-source and works both inside and outside Kubernetes. It has decent documentation, and its community is always willing to help. That does not mean that other tools are bad. They’re not. However, we have to make a choice and that choice is Chaos Toolkit.

Remember, the goal of this course, and of almost everything I do, is to teach you how to think and to understand the principles behind something, rather than how to use a specific tool. Tools are a means to an end. The goal should rarely be to master a tool, but to understand the processes and the principles behind it.

By the end of this course, you will, hopefully, be a chaos engineering ninja, and you will know how to use Chaos Toolkit. If you do choose to use some other tool, you will be able to apply the knowledge from this course, because the principles behind chaos engineering are the same no matter which tool you choose. You should be able to adapt to any tool easily. Nevertheless, I have a suspicion that you will like Chaos Toolkit and will find it very useful.


In the next lesson, we will define the requirements of this course.
