Exploring High Availability and Fault Tolerance of a Cluster
Explore the high availability and fault tolerance of our cluster.
A cluster isn't reliable unless it's fault-tolerant. kOps is designed to provide fault tolerance, but we'll validate it anyway.
Terminating a worker node
Let’s retrieve the list of worker node instances.
aws ec2 describe-instances | jq -r ".Reservations[].Instances[] | select(.SecurityGroups[].GroupName==\"nodes.$NAME\").InstanceId"
We use aws ec2 describe-instances to retrieve all the instances (five in total). The output is sent to jq, which filters them by the security group dedicated to worker nodes.
The output is as follows:
i-063fabc7ad5935db5
i-04d32c91cfc084369
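To see what the jq filter does without touching AWS, here is a minimal sketch that runs the same expression against a trimmed, made-up sample of the describe-instances response. The cluster name, the instance IDs, and the sample JSON are assumptions for illustration only; a real response contains many more fields. It assumes jq is installed.

```shell
#!/usr/bin/env bash
# Hypothetical cluster name, used only for this sketch.
NAME="example.k8s.local"

# Made-up, trimmed sample of the response shape:
# one master instance and two worker instances.
SAMPLE=$(cat <<EOF
{
  "Reservations": [
    {"Instances": [
      {"InstanceId": "i-master00000000001",
       "SecurityGroups": [{"GroupName": "masters.$NAME"}]}
    ]},
    {"Instances": [
      {"InstanceId": "i-worker00000000001",
       "SecurityGroups": [{"GroupName": "nodes.$NAME"}]},
      {"InstanceId": "i-worker00000000002",
       "SecurityGroups": [{"GroupName": "nodes.$NAME"}]}
    ]}
  ]
}
EOF
)

# Same filter as in the lesson: keep only instances whose security
# group is nodes.$NAME, then print their IDs as raw strings (-r).
WORKER_IDS=$(printf '%s\n' "$SAMPLE" | jq -r \
  ".Reservations[].Instances[] | select(.SecurityGroups[].GroupName==\"nodes.$NAME\").InstanceId")

printf '%s\n' "$WORKER_IDS"
```

The master instance is dropped because its security group name starts with masters rather than nodes, so only the two worker IDs are printed, one per line.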
We’ll terminate one of the worker nodes. To do that, we’ll pick a random one and retrieve its ID. ...