System Design: Distributed Logging

Let's understand the basics of designing a distributed logging system.

We'll cover the following...

Logging
Need for logging
How will we design a distributed logging system?

Logging is essential for understanding the flow of an event through a distributed system. It can seem like a tedious task, but when a failure or a security breach occurs, logs help pinpoint when and how the system failed or was compromised. They also aid in finding the root cause of the failure or breach. This reduces the mean time to repair (MTTR), a basic measure of maintainability that represents the average time required to repair a failed component or device (source: Wikipedia).

Why don’t we simply use print statements to understand the application flow? It’s possible but not ideal. Simple print statements have no way of tracking the severity of a message. The output of print functions usually goes to the terminal, whereas we often need to persist such data in a local or remote store. Moreover, a large system can emit millions of such messages, so it’s better to structure and store them properly.
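To make the contrast with print statements concrete, here is a minimal sketch using Python's standard `logging` module. It attaches a severity level to every message and persists records to a file instead of the terminal. The service name `checkout-service`, the file name `app.log`, and the messages are hypothetical and chosen only for illustration.

```python
import logging
import os
import tempfile


def build_logger(log_path: str) -> logging.Logger:
    """Return a logger that persists INFO-and-above records to log_path."""
    logger = logging.getLogger("checkout-service")  # hypothetical service name
    logger.setLevel(logging.DEBUG)   # the logger itself accepts every level
    logger.handlers.clear()          # avoid duplicate handlers on repeated calls
    logger.propagate = False         # keep records out of the root logger

    handler = logging.FileHandler(log_path, encoding="utf-8")
    handler.setLevel(logging.INFO)   # only INFO and above reach the file
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)s %(name)s %(message)s")
    )
    logger.addHandler(handler)
    return logger


# Assumption: a temporary directory stands in for a real local or remote store.
log_path = os.path.join(tempfile.mkdtemp(), "app.log")
logger = build_logger(log_path)

logger.debug("cart contents loaded")       # below the handler threshold, not stored
logger.info("order placed")                # stored with severity INFO
logger.error("payment gateway timed out")  # stored with severity ERROR

with open(log_path, encoding="utf-8") as f:
    lines = f.read().splitlines()
```

Each stored line carries a timestamp, a severity, and the logger name, so records can later be filtered by level or by service, which plain print output does not support.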

When a service runs concurrently on many nodes, we might need causality information to stitch its log entries together into a correct flow of events. We must be careful when dealing with causality in a distributed system, because clocks on different nodes are not perfectly synchronized. We use a logging service to appropriately manage the diagnostic and exploratory data of our distributed software.
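The text does not say how causality information is captured. One well-known option is a Lamport logical clock, sketched below under that assumption. Each node keeps a counter, attaches it to outgoing messages, and advances past any counter it receives, so a "receive" entry is always stamped later than the matching "send" entry. The class name, node names, and log messages are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class LamportClock:
    """Logical clock: counts events instead of reading wall-clock time."""
    time: int = 0

    def tick(self) -> int:
        """Advance for a local event or a send; return the new timestamp."""
        self.time += 1
        return self.time

    def receive(self, sender_time: int) -> int:
        """Merge the timestamp carried by an incoming message."""
        self.time = max(self.time, sender_time) + 1
        return self.time


node_a, node_b = LamportClock(), LamportClock()
entries = []  # (logical timestamp, node, message)

sent_at = node_a.tick()  # node A sends a request
entries.append((sent_at, "A", "request sent"))

entries.append((node_b.tick(), "B", "unrelated local work"))

recv_at = node_b.receive(sent_at)  # node B receives A's request
entries.append((recv_at, "B", "request received"))
entries.append((node_b.tick(), "B", "request handled"))

# Sorting by (timestamp, node) gives one order consistent with causality.
ordered = sorted(entries)
```

Lamport timestamps guarantee only that a cause is stamped before its effect. Entries with equal or unrelated timestamps may still be concurrent, so the node name is used here purely as a tie-breaker.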

Logging allows us to understand our code, locate unforeseen errors, fix the identified errors, and visualize the application’s performance. This gives us visibility into how the system behaves in production and how its processes are running.

Log analysis helps us with the following scenarios:

To troubleshoot applications, nodes, or network issues.
To adhere to internal security policies, external regulations, and compliance.
To recognize and respond to data breaches and other security problems.
To understand users’ actions, which can serve as input to a recommender system.
