Learn about updating machines and downtime, handling load on machines, end-to-end health check, rolling out in immutable infrastructure, and in-memory sessions.

Updating machines

The time has come to roll the new code onto the machines. The exact mechanics will vary widely depending on our environment and choice of configuration management tool. Let’s start by considering a convergence-style infrastructure with long-lived machines that get changes applied to them.

Right away, we have to decide how many machines to update at a time. The goal is zero downtime, so enough machines have to be up and accepting requests to handle demand throughout the process. Obviously that means we can’t update all machines simultaneously. On the flip side, if we do one machine at a time, the rollout may take an unacceptably long time.

Instead, we typically look to update machines in batches. We may choose to divide our machines into equal-sized groups. Suppose we have five groups named Alpha, Bravo, Charlie, Delta, and Foxtrot. Rollout would go like this:

  1. Instruct Alpha to stop accepting new requests.

  2. Wait for the load to drain from Alpha.

  3. Run the configuration management tool to update code and config.

  4. Wait for green health checks on all machines in Alpha.

  5. Instruct Alpha to start accepting requests.

  6. Repeat the process for Bravo, Charlie, Delta, and Foxtrot.
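The steps above can be sketched as a small loop. This is a minimal illustration, not a real deployment tool: the function `rolling_update` and all five callbacks are hypothetical stand-ins for whatever the load balancer and configuration management tool actually provide.

```python
from typing import Callable, Dict, List


def rolling_update(
    groups: Dict[str, List[str]],
    stop_traffic: Callable[[str], None],
    wait_for_drain: Callable[[str], None],
    apply_update: Callable[[str], None],
    is_healthy: Callable[[str], bool],
    start_traffic: Callable[[str], None],
) -> List[str]:
    """Update one group at a time and return the names of completed groups.

    All callbacks are hypothetical hooks supplied by the caller.
    """
    completed: List[str] = []
    for name, machines in groups.items():
        # Steps 1 and 2: take the group out of rotation and let load drain.
        for machine in machines:
            stop_traffic(machine)
        for machine in machines:
            wait_for_drain(machine)
        # Step 3: push new code and config.
        for machine in machines:
            apply_update(machine)
        # Step 4: require green health checks on every machine in the group.
        # A real rollout would poll with a timeout; this sketch checks once.
        if not all(is_healthy(machine) for machine in machines):
            break  # Halt the rollout and leave the group out of rotation.
        # Step 5: put the group back into rotation.
        for machine in machines:
            start_traffic(machine)
        completed.append(name)
    return completed
```

Halting on the first unhealthy group keeps the remaining groups on the old version, so capacity is reduced by at most one group while we investigate.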

Handling load

Our first group should be the canary group. Pause there to evaluate the build before moving on to the next group. Use traffic shaping at our load balancer to gradually ramp up traffic to the canary group while monitoring for anomalies in metrics. Is there a big spike in errors logged? What about a marked increase in latency? Or RAM utilization? Better shut traffic off to that group and investigate before continuing the rollout.
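One way to make the canary evaluation concrete is to compare the canary group’s metrics against a baseline taken from the machines still on the old version. The sketch below is only illustrative: the `Metrics` fields, the `canary_anomalies` function, and the 1.5x tolerance are assumptions, not values from any particular monitoring system.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Metrics:
    error_rate: float       # fraction of requests that logged an error
    p99_latency_ms: float   # 99th percentile latency in milliseconds
    ram_utilization: float  # fraction of RAM in use


def canary_anomalies(
    baseline: Metrics, canary: Metrics, tolerance: float = 1.5
) -> List[str]:
    """Return the names of metrics where the canary exceeds baseline * tolerance.

    The tolerance is a hypothetical threshold; tune it per metric in practice.
    """
    anomalies: List[str] = []
    for field in ("error_rate", "p99_latency_ms", "ram_utilization"):
        if getattr(canary, field) > getattr(baseline, field) * tolerance:
            anomalies.append(field)
    return anomalies
```

An empty result means the rollout may proceed to the next group; any entry in the list is a signal to shut off traffic to the canary and investigate.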

To stop traffic from going to a machine, we could simply remove it from the load balancer pool. That’s pretty abrupt, though, and may needlessly disrupt active requests. It’s better to have a robust health check on the machine.

End-to-end health check

Every application and service should include an end-to-end health check route. The load balancer can check that route to see if the instance is accepting work. It’s also a useful thing for monitoring and debugging. A good health check page reports the application version, the runtime’s version, the host’s IP address, and the status of connection pools, caches, and circuit breakers.

With this kind of health check, a simple status change in the application can inform the load balancer not to send any new work to the machine. Existing requests will be allowed to complete.

We can use the same flag when starting the service after pushing the code. Often considerable time elapses between when the service starts listening on a socket and when it’s really ready to do work. The service should start with the “available” flag set to false so the load balancer doesn’t send requests prematurely.

In our example, when the Charlie group is being updated, Alpha and Bravo will be done but Delta and Foxtrot will be waiting. This is the time when all our careful preparation pays off. Both the old and new versions are running at the same time.
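The “available” flag and the health check route can be sketched as follows. This is a minimal illustration under stated assumptions: the `HealthCheck` class, its field names, and the choice of HTTP 503 to signal “not accepting work” are hypothetical, and a real service would wire `respond()` into its web framework’s routing.

```python
import json
import platform
from typing import Dict, Tuple


class HealthCheck:
    """Hypothetical health check state for one service instance."""

    def __init__(self, app_version: str, host_ip: str) -> None:
        self.app_version = app_version
        self.host_ip = host_ip
        # Start unavailable so the load balancer sends no work until
        # the service has finished warming up.
        self.available = False
        # Status of connection pools, caches, and circuit breakers,
        # for example {"db_pool": "ok"}.
        self.dependencies: Dict[str, str] = {}

    def respond(self) -> Tuple[int, str]:
        """Return an HTTP status code and a JSON body for the health route."""
        status = 200 if self.available else 503
        body = {
            "available": self.available,
            "app_version": self.app_version,
            "runtime_version": platform.python_version(),
            "host_ip": self.host_ip,
            "dependencies": self.dependencies,
        }
        return status, json.dumps(body)
```

Setting `available` to `False` before an update drains the instance without dropping in-flight requests, and setting it to `True` after warm-up puts the instance back into rotation.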

Rolling out in immutable infrastructure

Let’s now consider immutable infrastructure. To roll code out here, we don’t change the old machines. Instead, we spin up new machines on the new version of the code. Our key decision is whether to spin them up in the existing cluster or to start a new cluster and switch over. If we start them up in the existing cluster, then we have the situation illustrated in the figure. As the new machines come up and get healthy, they will start taking load. This means that we need session stickiness, or else a single caller could bounce back and forth between the old and new versions on different requests.
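Session stickiness can be illustrated with a toy router that remembers which machine first served each session. This is only a sketch of the idea: the `StickyRouter` class and the backend names are hypothetical, and production load balancers typically implement stickiness with cookies or consistent hashing rather than an in-process dictionary.

```python
import itertools
from typing import Dict, Iterator, List


class StickyRouter:
    """Hypothetical router that pins each session to one backend."""

    def __init__(self, backends: List[str]) -> None:
        # Round-robin over backends for sessions we have not seen before.
        self._next_backend: Iterator[str] = itertools.cycle(backends)
        self._assignments: Dict[str, str] = {}

    def route(self, session_id: str) -> str:
        """Return the backend for this session, assigning one on first use."""
        if session_id not in self._assignments:
            self._assignments[session_id] = next(self._next_backend)
        return self._assignments[session_id]
```

With backends on both versions in the pool, a given caller keeps landing on the same machine, and therefore the same version, for the life of the session.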
