Alerting on Traffic-related Issues

Explore how to measure the number of requests per second per replica in a Kubernetes cluster and convert these metrics into actionable alerts. Learn to combine metrics with label_join and configure Prometheus alerts to notify on traffic spikes or potential DoS attacks, improving cluster resilience.

We'll cover the following:

- Measuring traffic
  - Retrieve the number of requests
  - Retrieve requests per second per replica
  - Join the two metrics
    - Transform kube_deployment_status_replicas by adding the ingress label
  - Convert the expression into an alert

Measuring traffic

So far, we have measured the latency of our applications and created alerts that fire when thresholds based on request duration are reached. Those alerts are not based on the number of incoming requests (traffic), but on the percentage of slow requests. The AppTooSlow alert would fire even if only a single request entered an application, as long as its duration was above the threshold. For completeness, we need to start measuring traffic or, to be more precise, the number of requests sent to each application and to the system as a whole. With that information, we can tell whether the system is under heavy load and decide whether to scale our applications, add more workers, or apply some other solution to mitigate the problem. We might even choose to block part of the incoming traffic if the number of requests reaches abnormal levels that clearly indicate we are under a Denial of Service (DoS) attack.

We’ll start by creating a bit of traffic that we can use to visualize requests.

for i in {1..100}; do
    curl "http://$GD5_ADDR/demo/hello"
done

open "http://$PROM_ADDR/graph"

We sent a hundred requests to the go-demo-5 application and opened Prometheus's graph screen.

Retrieve the number of requests

We can retrieve the number of requests coming into the Ingress controller through the nginx_ingress_controller_requests metric. Since it is a counter, we can continue using the rate function combined with sum ...
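As a rough illustration of combining rate with sum on that counter, the sketch below builds one possible expression and, if PROM_ADDR is set, sends it to Prometheus' HTTP query API instead of typing it into the graph screen. The requests_per_second_query helper is hypothetical, and the 5m window and the grouping by the ingress label are assumptions, not necessarily the exact expression this lesson uses.

```shell
# Hypothetical helper: prints a PromQL expression for requests per second,
# summed per Ingress. The default 5m window and the `ingress` grouping label
# are assumptions made for this sketch.
requests_per_second_query() {
    printf 'sum(rate(nginx_ingress_controller_requests[%s])) by (ingress)' "${1:-5m}"
}

# Query Prometheus' HTTP API only when PROM_ADDR is defined, so the snippet
# is harmless to run without a cluster.
if [ -n "${PROM_ADDR:-}" ]; then
    curl -sG "http://$PROM_ADDR/api/v1/query" \
        --data-urlencode "query=$(requests_per_second_query)"
fi
```

Passing a different window, for example requests_per_second_query 1m, makes the rate react faster to short bursts at the cost of a noisier graph.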
