Measure latency #

We’ll use the go-demo-5 application to measure latency, so our first step is to install it.

GD5_ADDR=go-demo-5.$LB_IP.nip.io

kubectl create namespace go-demo-5

helm install go-demo-5 \
    https://github.com/vfarcic/go-demo-5/releases/download/0.0.1/go-demo-5-0.0.1.tgz \
    --namespace go-demo-5 \
    --set ingress.host=$GD5_ADDR

We generated an address that we’ll use as an Ingress entry-point, and we deployed the application using Helm. Now we should wait until it rolls out.

kubectl -n go-demo-5 \
    rollout status \
    deployment go-demo-5

Before we proceed, we’ll check whether the application is indeed working correctly by sending an HTTP request.

curl "http://$GD5_ADDR/demo/hello"

The output should be the familiar hello, world! message.

Get the duration of requests entering the system #

Now, let’s see whether we can, for example, get the duration of requests entering the system through Ingress.

open "http://$PROM_ADDR/graph"

If you click on the - insert metric at cursor - drop-down list, you’ll be able to browse through all the available metrics. The one we’re looking for is nginx_ingress_controller_request_duration_seconds_bucket. As its name implies, the metric comes from the NGINX Ingress Controller and provides request durations in seconds, grouped into buckets.

Please type the expression that follows and click the Execute button.

nginx_ingress_controller_request_duration_seconds_bucket

In this case, seeing the raw values might not be very useful, so please click the Graph tab.

You should see one graph for each Ingress. Each line is increasing because the metric in question is a counter; its value grows with each request.

🔍 A Prometheus counter is a cumulative metric whose value can only increase, or be reset to zero on restart.

Calculate the rate of requests #

What we need is to calculate the rate of requests over a period of time. We’ll accomplish that by combining the sum and rate functions. The former should be self-explanatory.

🔍 Prometheus's rate function calculates the per-second average rate of increase of the time series in the range vector.
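As a simplified illustration of what rate does, consider a handful of counter samples inside a five-minute range vector. The sketch below uses made-up sample values, and it ignores the extrapolation to window boundaries and the counter-reset handling that real Prometheus performs; it only shows the core idea of increase divided by elapsed time.

```python
# Simplified sketch of what Prometheus's rate() computes for a counter.
# Real Prometheus also extrapolates to the window boundaries and handles
# counter resets; this only demonstrates increase / elapsed time.

# Hypothetical (timestamp_seconds, counter_value) samples in a 5m window.
samples = [(0, 100), (60, 160), (120, 220), (180, 280), (240, 340), (300, 400)]

increase = samples[-1][1] - samples[0][1]  # 400 - 100 = 300 requests
elapsed = samples[-1][0] - samples[0][0]   # 300 seconds
per_second_rate = increase / elapsed       # requests per second
print(per_second_rate)
```

With those sample values, the counter grew by 300 over 300 seconds, giving a rate of one request per second.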

Please type the expression that follows, and press the Execute button.

sum(rate(
  nginx_ingress_controller_request_duration_seconds_count[5m]
)) 
by (ingress)

The resulting graph shows us the per-second rate of all the requests entering the system through Ingress. The rate is calculated over five-minute intervals. If you hover over one of the lines, you’ll see additional information, such as the value and the Ingress. The by clause allows us to group the results by ingress.

Still, the result by itself is not very useful, so let’s redefine our requirement. We should be able to find out how many of the requests are slower than 0.25 seconds. We cannot do that directly. Instead, we can retrieve all those that took 0.25 seconds or less.

Please type the expression that follows, and press the Execute button.

sum(rate(
  nginx_ingress_controller_request_duration_seconds_bucket{
    le="0.25"
  }[5m]
)) 
by (ingress)

Percentage of requests #

What we really want is to find the percentage of requests that fall into the 0.25-second bucket. To accomplish that, we’ll get the rate of the requests faster than or equal to 0.25 seconds and divide the result by the rate of all the requests.

Please type the expression that follows, and press the Execute button.

sum(rate(
  nginx_ingress_controller_request_duration_seconds_bucket{
    le="0.25"
  }[5m]
)) 
by (ingress) / 
sum(rate(
  nginx_ingress_controller_request_duration_seconds_count[5m]
)) 
by (ingress)

Since we have not yet generated much traffic, you probably won’t see much in the graph beyond occasional interactions with Prometheus and Alertmanager and the single request we sent to go-demo-5. Nevertheless, the few lines you can see display the percentage of the requests that responded within 0.25 seconds.
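Numerically, the division above works because Prometheus histogram buckets are cumulative: each le="X" bucket counts all observations that took X seconds or less, and the _count series equals the +Inf bucket. A minimal sketch with made-up bucket counts (not real metric values) shows the arithmetic:

```python
# Prometheus histogram buckets are cumulative: le="0.25" already includes
# everything in le="0.1", so no subtraction is needed for "0.25s or faster".
# Hypothetical cumulative bucket counters after some traffic:
buckets = {"0.1": 12, "0.25": 30, "0.5": 70, "1": 110, "+Inf": 120}

# The _count series always equals the +Inf bucket.
total = buckets["+Inf"]

fraction_fast = buckets["0.25"] / total
print(f"{fraction_fast:.0%} of requests finished within 0.25 seconds")
```

With these hypothetical counts, 30 of 120 requests, or 25 percent, finished within 0.25 seconds.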

Limit the results to go-demo-5 Ingress #

For now, we are interested only in go-demo-5 requests, so we’ll refine the expression further to limit the results only to go-demo-5 Ingress.

Please type the expression that follows, and press the Execute button.

sum(rate(
  nginx_ingress_controller_request_duration_seconds_bucket{
    le="0.25", 
    ingress="go-demo-5"
  }[5m]
)) 
by (ingress) / 
sum(rate(
  nginx_ingress_controller_request_duration_seconds_count{
    ingress="go-demo-5"
  }[5m]
)) 
by (ingress)

The graph should be almost empty since we sent only one request. Or, maybe you received the no datapoints found message. It’s time to generate some traffic.

for i in {1..30}; do
  DELAY=$((RANDOM % 1000))
  curl "http://$GD5_ADDR/demo/hello?delay=$DELAY"
done

We sent thirty requests to go-demo-5. The application has a “hidden” feature that delays its response to a request. Given that we want to generate traffic with random response times, we used the DELAY variable with a random value of up to a thousand milliseconds. Now we can re-run the same query and see whether we get more meaningful data.
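Since the delays are drawn uniformly from zero up to a thousand milliseconds, we’d expect roughly a quarter of the responses to finish within 250 ms (ignoring the application’s own processing time). A quick simulation, not part of the lesson, makes that expectation concrete:

```python
import random

random.seed(42)  # deterministic, for the sake of the example

# Simulate many requests with delays uniform in [0, 1000) milliseconds,
# mirroring the DELAY variable in the loop above.
delays = [random.randrange(1000) for _ in range(100_000)]

fraction_fast = sum(d <= 250 for d in delays) / len(delays)
print(round(fraction_fast, 2))  # close to 0.25
```

With only thirty real requests, as in the loop above, the observed fraction will scatter more widely around that 25 percent expectation.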

Please wait for a while until data from new requests are gathered, then type the expression that follows (in Prometheus), and press the Execute button.

sum(rate(
  nginx_ingress_controller_request_duration_seconds_bucket{
    le="0.25", 
    ingress="go-demo-5"
  }[5m]
)) 
by (ingress) / 
sum(rate(
  nginx_ingress_controller_request_duration_seconds_count{
    ingress="go-demo-5"
  }[5m]
)) 
by (ingress)

This time, we can see the emergence of a new line. In my case (screenshot below), around twenty-five percent of requests have durations within 0.25 seconds. In other words, around three-quarters of the requests are slower than expected.
