Mechanisms for Autoscaling
Learn what parameters can be used to autoscale an application.
There are four primary indicators to watch during the operation of an application that can signal trouble meeting uptime or responsiveness SLAs: CPU load, I/O load (often surfaced as disk pressure in Kubernetes), request and network load (often surfaced as network pressure in Kubernetes), and memory load. Understanding these indicators prepares us to adjust configuration settings so that the supporting infrastructure can scale. It's equally important to understand how our application's components affect each of these indicators.
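The four indicators above can be summarized as a simple health check. The following is a minimal Python sketch; the `NodeMetrics` type, field names, and the 80%/85% thresholds are illustrative assumptions, not values from any particular platform. In practice, these readings would come from a metrics source such as the Kubernetes metrics API.

```python
from dataclasses import dataclass

@dataclass
class NodeMetrics:
    # Hypothetical snapshot of the four primary indicators.
    cpu_percent: float      # CPU load
    disk_pressure: bool     # I/O load (disk pressure in Kubernetes)
    network_pressure: bool  # request/network load (network pressure)
    memory_percent: float   # memory load

def pressure_signals(m: NodeMetrics, cpu_limit=80.0, mem_limit=85.0):
    """Return the list of indicators currently under pressure."""
    signals = []
    if m.cpu_percent >= cpu_limit:
        signals.append("cpu")
    if m.disk_pressure:
        signals.append("disk")
    if m.network_pressure:
        signals.append("network")
    if m.memory_percent >= mem_limit:
        signals.append("memory")
    return signals

# A node with high CPU and network pressure but healthy disk and memory:
print(pressure_signals(NodeMetrics(91.0, False, True, 40.0)))
# → ['cpu', 'network']
```

Any non-empty result here is a candidate trigger for the scaling rules discussed in the sections that follow.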
Compute and CPU load
The amount of CPU that's utilized varies heavily across physical machines, virtual machines, and hosted cloud services. Cloud services, in particular, allow very fine-grained adjustments. For example, with Azure App Service, the compute initially available to the service is tied directly to the App Service plan that hosts it. Thresholds can be set to watch for sustained CPU activity lasting longer than a set duration (for example, five minutes), anomalies in CPU usage (spikes), or a point-in-time benchmark, such as a set percentage.
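The three trigger styles just described can be sketched in a few lines of Python. This is an illustrative model only: the window length, the 70%/90% thresholds, and the 30-point spike delta are assumptions for the example, not App Service defaults.

```python
from collections import deque

class CpuScalingRule:
    """Track recent CPU samples and flag the three trigger styles:
    sustained load, a spike anomaly, and a point-in-time threshold.
    All thresholds here are illustrative, not platform defaults."""

    def __init__(self, sustain_samples=5, sustain_pct=70.0,
                 spike_delta=30.0, instant_pct=90.0):
        self.samples = deque(maxlen=sustain_samples)
        self.sustain_pct = sustain_pct
        self.spike_delta = spike_delta
        self.instant_pct = instant_pct

    def observe(self, cpu_pct):
        """Record one CPU sample and return any triggers it fires."""
        prev = self.samples[-1] if self.samples else cpu_pct
        self.samples.append(cpu_pct)
        triggers = []
        # Sustained: the entire window stays above the sustain threshold.
        if (len(self.samples) == self.samples.maxlen
                and all(s >= self.sustain_pct for s in self.samples)):
            triggers.append("sustained")
        # Spike: a large jump relative to the previous sample.
        if cpu_pct - prev >= self.spike_delta:
            triggers.append("spike")
        # Instant: a point-in-time benchmark is exceeded.
        if cpu_pct >= self.instant_pct:
            triggers.append("instant")
        return triggers

rule = CpuScalingRule()
print(rule.observe(40.0))   # → []
print(rule.observe(95.0))   # → ['spike', 'instant']
```

Feeding one sample per minute makes the five-sample window correspond to roughly five minutes of sustained load before the "sustained" trigger fires.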
Similar rules ...