
Autoscaling Services

Explore how Knative enables autoscaling of serverless applications on Kubernetes. Learn to manage dynamic scaling of pods based on incoming HTTP traffic, including scaling down to zero after inactivity and scaling up during traffic peaks to optimize resource use.

How does autoscaling work?

One of the benefits of being serverless is the ability to scale up and down to meet demand. When there’s no traffic coming in, an application should scale down, and when traffic peaks, it should scale up. Knative scales the pods for a Knative service in or out based on inbound HTTP traffic. After a period of idleness (by default, 60 seconds), Knative terminates all of the pods for that service. In other words, it scales down to zero. By default, this autoscaling is handled by the Knative Pod Autoscaler (KPA). Knative can also be configured to use the Horizontal Pod Autoscaler (HPA) built into Kubernetes, although the HPA doesn’t scale a service down to zero.
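To make the scaling behavior described above concrete, here is a minimal Python sketch of the decision an autoscaler makes. It is a simplified illustration, not Knative's actual algorithm: the function name `desired_pods`, its parameters, and the per-pod target of 100 concurrent requests are assumptions chosen for the example. Only the 60-second idle period comes from the text.

```python
import math


def desired_pods(in_flight_requests: int, idle_seconds: float,
                 target_per_pod: int = 100, idle_window: float = 60.0) -> int:
    """Toy model of traffic-based scaling with scale-to-zero.

    target_per_pod is a hypothetical concurrency target per pod.
    idle_window is the idle period after which all pods are removed.
    """
    if in_flight_requests > 0:
        # Enough pods so that each one handles at most target_per_pod requests.
        return math.ceil(in_flight_requests / target_per_pod)
    # No traffic: keep a single pod until the idle window has elapsed.
    return 0 if idle_seconds >= idle_window else 1


# Traffic peak: 250 concurrent requests need three pods at 100 requests per pod.
print(desired_pods(in_flight_requests=250, idle_seconds=0))
# Idle for longer than the window: scale down to zero.
print(desired_pods(in_flight_requests=0, idle_seconds=61))
```

The real autoscaler works on metrics averaged over a time window rather than a single reading, so treat this only as a mental model of the scale-up and scale-to-zero rules.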

If we haven’t accessed the hello service for more than one minute, the pods should already be terminated. ...