Auto-Scale Pods Based on Resource Utilization
Learn how to set up a HorizontalPodAutoscaler (HPA) to automatically scale Kubernetes Deployments and StatefulSets based on CPU and memory usage. This lesson guides you through creating HPA definitions, troubleshooting resource request specifications, and observing how pods scale up and down in response to resource demands. By the end, you’ll know how to adjust the number of pod replicas dynamically to maintain performance and efficiency.
Auto-scale based on resource usage
So far, the HPA has not performed any auto-scaling based on resource usage.
Let’s change that. We’ll try to create another HorizontalPodAutoscaler, this time targeting the StatefulSet that runs our MongoDB. Let’s take a look at yet another YAML definition.
Create HPA
cat scaling/go-demo-5-db-hpa.yml
The output is as follows.
apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
  name: db
  namespace: go-demo-5
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: StatefulSet
    name: db
  minReplicas: 3
  maxReplicas: 5
  metrics:
  - type: Resource
    resource:
      name: cpu
      targetAverageUtilization: 80
  - type: Resource
    resource:
      name: memory
      targetAverageUtilization: 80
That definition is almost the same as the one we used before. The only differences are that this time we’re targeting the StatefulSet called db and that the minimum number of replicas is 3.
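A note for readers on newer clusters: the autoscaling/v2beta1 API used in this definition was removed in Kubernetes 1.25, so a recent cluster is likely to reject the manifest as written. As a minimal sketch, assuming a cluster that serves the stable autoscaling/v2 API (available since Kubernetes 1.23), the same intent could be expressed as shown here. This variant is an illustration and is not a file in the lesson's repository. The only intended change is that targetAverageUtilization becomes a nested target block.

```yaml
# Hypothetical autoscaling/v2 equivalent of scaling/go-demo-5-db-hpa.yml.
# Assumption: the cluster serves autoscaling/v2 (Kubernetes 1.23 or later).
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: db
  namespace: go-demo-5
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: StatefulSet
    name: db
  minReplicas: 3
  maxReplicas: 5
  metrics:
  - type: Resource
    resource:
      name: cpu
      # v2 replaces targetAverageUtilization with a nested target block.
      target:
        type: Utilization
        averageUtilization: 80
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 80
```

The rest of this lesson continues with the original scaling/go-demo-5-db-hpa.yml file.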
Let’s apply the definition.
kubectl apply \
    -f scaling/go-demo-5-db-hpa.yml \
    --record
Let’s take another look at the HorizontalPodAutoscaler resources.
kubectl -n go-demo-5 get hpa
The output is as follows.
NAME REFERENCE      TARGETS         MINPODS MAXPODS REPLICAS AGE
api  Deployment/api 41%/80%, 0%/80% 2       5       2        ...