Designing a Monitoring System

Learn about the initial design of a generic monitoring system.

We'll cover the following

Requirements
High-level design

Requirements

Let’s sum up what we want our monitoring system to do for us:

Monitoring critical local processes on a server for crashes.
Monitoring anomalies in a process’s use of CPU, memory, disk, or network bandwidth on a server.
Monitoring overall server health (CPU, memory, disk, network bandwidth, average load, etc.).
Monitoring hardware component faults on a server (such as memory failures or failing or slowing disks).
Monitoring the server’s ability to reach critical services outside the server (such as network file systems).
Monitoring all network switches, load balancers, and any other specialized hardware inside a datacenter.
Monitoring power consumption at the server, rack, and datacenter level.
Monitoring any power events on the servers/racks/datacenter.
Monitoring routing information and DNS for external clients.
Monitoring the latency of network links and paths within and across datacenters.
Monitoring network status at the peering points.
Monitoring overall service health that might span multiple datacenters (for example, a CDN and its performance).
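
To make the server-health requirement above more concrete, here is a minimal Python sketch that compares one reading of CPU, memory, disk, and average load against fixed limits. The `HealthSample` type, the `breached_metrics` function, and every limit value are assumptions made for illustration; they are not part of the lesson's design, and real limits depend on the workload and hardware.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HealthSample:
    """One point-in-time reading from a server (field names are illustrative)."""
    cpu_percent: float
    memory_percent: float
    disk_percent: float
    load_average: float


# Hypothetical limits, chosen only for this example.
DEFAULT_LIMITS = {
    "cpu_percent": 90.0,
    "memory_percent": 85.0,
    "disk_percent": 80.0,
    "load_average": 8.0,
}


def breached_metrics(sample, limits=DEFAULT_LIMITS):
    """Return the names of metrics whose value exceeds its limit, sorted by name."""
    return sorted(
        name for name, limit in limits.items() if getattr(sample, name) > limit
    )


sample = HealthSample(cpu_percent=97.2, memory_percent=40.0,
                      disk_percent=81.5, load_average=2.3)
print(breached_metrics(sample))  # ['cpu_percent', 'disk_percent']
```

Static limits like these are the simplest possible check. They are shown here only to illustrate what "monitoring overall server health" could mean at the level of a single server.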

We want automated monitoring that detects anomalies in the system and either notifies the alert manager or shows the status on a dashboard. Cloud service providers publish health status pages for their services:

AWS: https://health.aws.amazon.com/health/status
Azure: https://status.azure.com/en-us/status
Google: https://status.cloud.google.com/
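
As a rough illustration of automated monitoring that identifies an anomaly and informs an alert manager, the sketch below flags a metric value that lies far from its recent history and passes a message to a callback. The z-score rule, the threshold of 3.0, and the names `is_anomalous`, `check_and_alert`, and `notify` are assumptions made for this example, not the method the lesson prescribes; the callback simply stands in for whatever interface an alert manager exposes.

```python
from statistics import mean, pstdev


def is_anomalous(history, latest, z_threshold=3.0):
    """Flag `latest` if it is more than `z_threshold` standard deviations
    away from the mean of `history` (a simple z-score rule)."""
    if len(history) < 2:
        return False  # not enough data to judge
    mu = mean(history)
    sigma = pstdev(history)
    if sigma == 0:
        return latest != mu  # flat history: any change counts as anomalous
    return abs(latest - mu) / sigma > z_threshold


def check_and_alert(metric, history, latest, notify):
    """Call `notify` (a stand-in for the alert manager) when an anomaly is seen."""
    if is_anomalous(history, latest):
        notify(f"anomaly in {metric}: latest={latest}")
        return True
    return False


alerts = []  # a plain list plays the role of the alert manager here
check_and_alert("cpu_percent", [20, 22, 21, 19, 20], 95, alerts.append)
check_and_alert("cpu_percent", [20, 22, 21, 19, 20], 21, alerts.append)
print(alerts)  # ['anomaly in cpu_percent: latest=95']
```

The same check could feed a dashboard instead of an alert by swapping the callback.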
