Preparing for Termination of Nodes

In this lesson, we will set up a new ConfigMap, create a namespace, compare it with the previous one, and explore a CronJob that we will later use for the experiment.

What can we do?

Now, we know how to affect not only individual applications, but also random ones running in a Namespace, or even in the whole cluster. Next, we’ll explore how to randomize our experiments on the node level as well.

In the past, we terminated or disrupted nodes where a specific application was running. Next, we will try to figure out how to destroy a completely random node, without any particular criteria. We’ll just do random stuff and see how it affects our cluster. If we’re lucky, such actions will not have any adverse effects. Or, maybe they will. We’ll soon find out.
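To make "a completely random node" concrete, here is a minimal Python sketch of the selection step only. It is not the mechanism this lesson uses (that is the CronJob we explore later). It assumes kubectl is installed and pointed at the cluster, and the function names list_node_names and pick_random_node are hypothetical, chosen for illustration.

```python
import random
import subprocess


def list_node_names():
    """Return the names of all cluster nodes, as reported by kubectl.

    Assumption: kubectl is on the PATH and configured for the target cluster.
    """
    out = subprocess.run(
        ["kubectl", "get", "nodes", "-o", "name"],
        check=True, capture_output=True, text=True,
    ).stdout
    # Each output line looks like "node/<name>"; keep only the name part.
    return [line.split("/", 1)[1] for line in out.splitlines() if line.strip()]


def pick_random_node(node_names, rng=random):
    """Pick one node uniformly at random, with no selection criteria."""
    names = list(node_names)
    if not names:
        raise ValueError("no nodes to choose from")
    return rng.choice(names)


if __name__ == "__main__":
    # Only prints the candidate; it does not drain or delete anything.
    print(pick_random_node(list_node_names()))
```

The point of the sketch is that nothing about the node (labels, the applications it hosts, its zone) influences the choice, which is exactly why any part of the system can end up affected.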

We couldn’t do this before because the steady-state hypothesis of our experiments was not enough on its own, but we can do it now. If we destroy something (almost) completely random, any part of the system can be affected. That makes it impractical to use the Chaos Toolkit hypothesis to predict what the initial state should be, or what the state after some destructive cluster-wide actions should be. Strictly speaking, we could do that, but it would be too complicated, and we would be trying to solve the problem with the wrong tool.

Now, we know that we can use Prometheus to store metrics and that we can monitor our system through dashboards like Grafana and Kiali. We could, and should, go further. For example, we should create alerts that will notify us when any part of the system is misbehaving.

Now, we are ready to go full throttle and run our experiments on the cluster level.

Inspecting the ConfigMap defined in experiments-node.yaml

Let’s take a look at yet another YAML definition.
