Designing a Monitoring System

Learn about the initial design of a generic monitoring system.

Requirements

Let’s sum up what we want our monitoring system to do for us:

  • Monitoring critical local processes on a server for crashes.

  • Monitoring any anomalies in the use of CPU/Memory/Disk/Network bandwidth by a process on a server.

  • Monitoring overall server health (CPU, Memory, Disk, Network bandwidth, Average load, etc.).

  • Monitoring hardware component faults on a server (such as memory failures and failing or slow disks).

  • Monitoring the server’s ability to reach critical services outside the server (such as network file systems).

  • Monitoring all network switches, load-balancers, and any other specialized hardware inside a datacenter.

  • Monitoring power consumption at the server, rack, and datacenter level.

  • Monitoring any power events on the servers/racks/datacenter.

  • Monitoring routing information and DNS for external clients.

  • Monitoring the latency of network links and paths within and across datacenters.

  • Monitoring network status at the peering points.

  • Monitoring overall service health that might span multiple datacenters (for example, a CDN and its performance).
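To make the server-health requirement more concrete, here is a minimal sketch of a local health check. It is an illustration under stated assumptions, not a prescribed design: the names `HealthSample`, `collect_sample`, and `find_problems`, as well as both threshold values, are hypothetical. It covers only load average and disk usage, and `collect_sample` relies on `os.getloadavg`, which is available on Unix-like systems only.

```python
import os
import shutil
from dataclasses import dataclass
from typing import List


@dataclass
class HealthSample:
    load_avg_1m: float      # 1-minute load average
    disk_used_ratio: float  # fraction of the disk in use, 0.0 to 1.0


def collect_sample(path: str = "/") -> HealthSample:
    """Read load average and disk usage from the local server (Unix only)."""
    load_1m, _, _ = os.getloadavg()
    usage = shutil.disk_usage(path)
    return HealthSample(
        load_avg_1m=load_1m,
        disk_used_ratio=usage.used / usage.total,
    )


# Hypothetical limits; real values depend on the workload and the hardware.
MAX_LOAD_PER_CPU = 1.5
MAX_DISK_USED_RATIO = 0.9


def find_problems(sample: HealthSample, cpu_count: int) -> List[str]:
    """Return a description of every threshold the sample exceeds."""
    problems = []
    if sample.load_avg_1m / cpu_count > MAX_LOAD_PER_CPU:
        problems.append("high load")
    if sample.disk_used_ratio > MAX_DISK_USED_RATIO:
        problems.append("disk nearly full")
    return problems
```

On a server, an agent might call `find_problems(collect_sample(), os.cpu_count() or 1)` periodically and report any non-empty result. Keeping the threshold logic separate from data collection lets the same check run against samples gathered from other machines.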

We want automated monitoring that identifies anomalies in the system and either informs the alert manager or shows the current status on a dashboard. Cloud service providers, for example, publish the health status of their services.
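The step from "identify an anomaly" to "inform the alert manager" could be sketched as follows. This is one simple approach chosen for illustration, a deviation check against a rolling window of recent values, and not a method the lesson specifies. The class name `AnomalyDetector`, the `notify` callback standing in for the alert manager, and the default window, sample count, and threshold are all assumptions.

```python
import statistics
from collections import deque
from typing import Callable


class AnomalyDetector:
    """Flags values that deviate strongly from a rolling window of history."""

    def __init__(
        self,
        notify: Callable[[str], None],  # hypothetical hook into an alert manager
        window: int = 30,               # number of recent values to remember
        min_samples: int = 5,           # do not judge until this much history exists
        threshold: float = 3.0,         # allowed deviation, in standard deviations
    ) -> None:
        self._notify = notify
        self._history = deque(maxlen=window)
        self._min_samples = min_samples
        self._threshold = threshold

    def observe(self, value: float) -> bool:
        """Record a metric value; return True if it was reported as an anomaly."""
        is_anomaly = False
        if len(self._history) >= self._min_samples:
            mean = statistics.fmean(self._history)
            stdev = statistics.stdev(self._history)
            if stdev > 0 and abs(value - mean) > self._threshold * stdev:
                is_anomaly = True
                self._notify(f"value {value} deviates from recent mean {mean:.1f}")
        # The value is stored either way, so a sustained shift becomes the new normal.
        self._history.append(value)
        return is_anomaly
```

A caller would feed each new measurement, such as CPU utilization, into `observe` and pass a function that forwards messages to the alerting component. A fixed threshold on standard deviations is easy to reason about but can miss slow drifts, so a production system would likely combine it with other rules.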
