Alerting on Error-related Issues

Understand how to monitor and alert on error rates for Kubernetes applications by querying Prometheus metrics, calculating error percentages, and configuring alert rules to avoid unnecessary notifications.

We'll cover the following...

- Monitor the rate of errors compared to the total number of requests
  - Retrieve and separate requests from their statuses
  - Use the /demo/random-error endpoint to generate random error responses
  - Write an expression to retrieve error rate

Monitor the rate of errors compared to the total number of requests

We should always know whether our applications or the system are producing errors. However, we cannot start panicking at the first occurrence of an error, since that would generate so many notifications that we’d likely end up ignoring them. Errors happen often, and many are caused by issues that are fixed automatically or are due to circumstances outside our control. If we were to act on every error, we’d need an army of people working 24/7 only on fixing issues that often do not need to be fixed. As an example, entering “panic” mode because of a single response with a code in the 500 range would almost certainly produce a permanent crisis. Instead, we should monitor the rate of errors compared to the ...
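To make the idea of alerting on a rate rather than on individual errors concrete, here is a minimal Python sketch. It is not the lesson's Prometheus expression. The function names, the sample request counts, and the 2.5% threshold are all assumptions chosen for illustration.

```python
def error_rate(counts_by_status):
    """Return the fraction of responses whose status code is in the 500 range."""
    total = sum(counts_by_status.values())
    if total == 0:
        # No traffic means there is nothing to alert on.
        return 0.0
    errors = sum(n for code, n in counts_by_status.items() if 500 <= code <= 599)
    return errors / total


def should_alert(counts_by_status, threshold=0.025):
    """Alert only when the error rate exceeds the threshold (hypothetical 2.5%)."""
    return error_rate(counts_by_status) > threshold


# Hypothetical request counts per HTTP status code over some time window.
single_error = {200: 999, 500: 1}           # one error in a thousand requests
many_errors = {200: 900, 500: 80, 503: 20}  # one request in ten fails

print(should_alert(single_error))  # False
print(should_alert(many_errors))   # True
```

A single response in the 500 range among a thousand requests stays well below the threshold, so nobody is notified, while a sustained share of failing requests does trigger the alert.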
