Auto-Scale Pods Based on Resource Utilization

Understand how to set up a HorizontalPodAutoscaler (HPA) to automatically scale Kubernetes Deployments and StatefulSets based on CPU and memory usage. This lesson guides you through creating HPA definitions, troubleshooting resource request specifications, and observing how Pods scale up and down in response to resource demands. By the end, you'll know how to adjust the number of Pod replicas dynamically to maintain performance and efficiency.

We'll cover the following...

Auto-scale based on resource usage
- Create HPA
  - Resource utilization not getting shown
- Create HPA with new definition
  - Resource utilization getting shown
- Actual memory usage above the target value
  - HPA continues to scale up the Deployment
Auto descale based on resource usage

Auto-scale based on resource usage

So far, the HPA has not performed any auto-scaling based on resource usage. Let’s do that now. First, we’ll create another HorizontalPodAutoscaler but, this time, we’ll target the StatefulSet that runs our MongoDB. Let’s take a look at yet another YAML definition.

Create `HPA`

cat scaling/go-demo-5-db-hpa.yml

The output is as follows.

apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
  name: db
  namespace: go-demo-5
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: StatefulSet
    name: db
  minReplicas: 3
  maxReplicas: 5
  metrics:
  - type: Resource
    resource:
      name: cpu
      targetAverageUtilization: 80
  - type: Resource
    resource:
      name: memory
      targetAverageUtilization: 80

That definition is almost the same as the one we used before. The only differences are that, this time, we’re targeting the StatefulSet called db, and that the minimum number of replicas is 3.
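The definition above uses the `autoscaling/v2beta1` API, which Kubernetes stopped serving in v1.25. If your cluster rejects it, the following is a sketch of how the same HPA could be written against `autoscaling/v2`. It assumes everything else stays unchanged and is not the file shipped with the course; the only structural change is that each `targetAverageUtilization` field becomes a nested `target` block.

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: db
  namespace: go-demo-5
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: StatefulSet
    name: db
  minReplicas: 3
  maxReplicas: 5
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization        # percentage of the requested CPU
        averageUtilization: 80
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization        # percentage of the requested memory
        averageUtilization: 80
```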

Let’s apply it.

kubectl apply \
    -f scaling/go-demo-5-db-hpa.yml \
    --record

Let’s take another look at the HorizontalPodAutoscaler resources.

kubectl -n go-demo-5 get hpa

The output is as follows.

NAME REFERENCE      TARGETS                      MINPODS MAXPODS REPLICAS AGE
api  Deployment/api 41%/80%, 0%/80%              2       5       2
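Each entry in the TARGETS column pairs the current average utilization with the target. As a rough illustration of how such a pair turns into a replica count, the sketch below applies the basic formula from the Kubernetes documentation, ceil(currentReplicas * currentUtilization / targetUtilization), and then clamps the result to the minimum and maximum. The `desired_replicas` function is a hypothetical helper written for this example, not a kubectl feature, and it ignores the controller's tolerance band, its stabilization behavior, and the fact that, with several metrics, the HPA uses the largest result.

```shell
# desired_replicas CURRENT UTILIZATION TARGET MIN MAX
# Hypothetical helper: simplified version of the HPA replica calculation.
desired_replicas() {
  current=$1
  utilization=$2
  target=$3
  min=$4
  max=$5

  # Integer ceiling of current * utilization / target
  desired=$(( (current * utilization + target - 1) / target ))

  # Keep the result within minReplicas and maxReplicas
  if [ "$desired" -lt "$min" ]; then
    desired=$min
  fi
  if [ "$desired" -gt "$max" ]; then
    desired=$max
  fi

  echo "$desired"
}

# The api row above: 2 replicas, 41% against an 80% target, min 2, max 5
desired_replicas 2 41 80 2 5

# The same Deployment if utilization rose to 120%
desired_replicas 2 120 80 2 5
```

With the values from the `api` row, the calculation stays at 2 replicas, which matches the REPLICAS column. At 120% utilization, it would ask for 3.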

...
