Designing a Monitoring System

A service has no visibility into errors that occur outside its own infrastructure. Still, such failures are equally frustrating for customers, who might have to ask their friends, “Is service X down for you as well?” or head to sites like Downdetector to see whether anyone else is reporting the same issue. They might report the problem via a tweet or some other communication channel. However, all such cases have a slow feedback loop. As a service provider, we want to detect such problems as quickly as possible so that we can take remedial measures. Let’s dive into designing such a system.

Initial design

To ensure that clients’ requests reach the server, we will act as clients ourselves and perform reachability and health checks. We will need various vantage points across the globe. We can run a service, let’s call it a prober, that periodically sends requests to our service from each vantage point to check availability. This way, we can monitor the reachability of our service from many different places.
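The prober idea can be sketched in a few lines of Python. This is only an illustrative sketch, not a production design: the names `ProbeResult`, `probe`, and `run_prober`, the vantage-point label, the 60-second interval, and the rule that any 2xx or 3xx status counts as healthy are all assumptions made for this example. One copy of the loop would run at each vantage point and report its results to a central collector (here, simply `print`).

```python
import time
import urllib.error
import urllib.request
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class ProbeResult:
    vantage_point: str        # hypothetical label for where this prober runs
    url: str
    ok: bool                  # True if the service answered with a 2xx/3xx status
    status: Optional[int]     # HTTP status code, or None if no response arrived
    latency_ms: float
    error: Optional[str] = None


def http_get_status(url: str, timeout: float) -> int:
    """Send a GET request and return the HTTP status code."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as exc:
        # The server was reachable but answered with a 4xx/5xx status.
        return exc.code


def probe(url: str, vantage_point: str, timeout: float = 5.0,
          fetch: Callable[[str, float], int] = http_get_status) -> ProbeResult:
    """Perform one reachability check and time it."""
    start = time.monotonic()
    status: Optional[int] = None
    error: Optional[str] = None
    try:
        status = fetch(url, timeout)
    except Exception as exc:  # DNS failure, timeout, connection refused, ...
        error = f"{type(exc).__name__}: {exc}"
    latency_ms = (time.monotonic() - start) * 1000.0
    ok = status is not None and 200 <= status < 400
    return ProbeResult(vantage_point, url, ok, status, latency_ms, error)


def run_prober(url: str, vantage_point: str, interval_s: float = 60.0,
               rounds: int = 3,
               report: Callable[[ProbeResult], None] = print,
               sleep: Callable[[float], None] = time.sleep,
               fetch: Callable[[str, float], int] = http_get_status
               ) -> List[ProbeResult]:
    """Probe the service `rounds` times, waiting `interval_s` between checks."""
    results = []
    for i in range(rounds):
        result = probe(url, vantage_point, fetch=fetch)
        report(result)  # in a real deployment, send this to a collector
        results.append(result)
        if i < rounds - 1:
            sleep(interval_s)
    return results
```

For example, `run_prober("https://example.com/health", "eu-west", rounds=2)` would check a placeholder health URL twice, one minute apart. The `fetch`, `report`, and `sleep` parameters are injectable so the loop can be exercised without real network calls or delays.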
