Logging at Pinterest

Learn how scalable and reliable logging is done at Pinterest.

Pinterest

Pinterest is a discovery engine. It stores data about a user’s interests and recommends posts based on them. For example, if a person wants ideas for how to dress for an event, Pinterest helps them find items to buy that match their personality.

Requirements

The requirements for the Pinterest logging system are:

  • Availability: The infrastructure should always be up and running.

  • Scalability: To maximize efficiency, we want the best utilization of our computation resources. The system should keep working even if traffic increases tenfold.

  • Low latency: Logs generated by an event should be added to the system within a few seconds.

  • Minimal human intervention: The system should stay up and running with minimal human help.

  • Robustness: Distributed systems are not perfect, so we want our data to stay intact when components fail.

High-level design

The logging infrastructure collects logs and puts them into storage for later consumption. Its high-level design has three main modules:

  1. App servers: where the services run and where all data and events from the client apps are collected.

  2. Kafka: a horizontally scalable and reliable system for message transportation.

  3. Storage (S3): a blob store that holds the logs.
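
To make the flow between the three modules concrete, here is a minimal sketch in Python. It is not Pinterest's implementation: InMemoryTopic and InMemoryBlobStore are hypothetical in-memory stand-ins for Kafka and S3, and the class names, the object key format, and the batch size are all assumptions made for illustration.

```python
import json


class InMemoryTopic:
    """Hypothetical stand-in for a Kafka topic: an append-only message log."""

    def __init__(self):
        self.messages = []

    def publish(self, message: bytes) -> int:
        """Append a message and return its offset in the log."""
        self.messages.append(message)
        return len(self.messages) - 1


class InMemoryBlobStore:
    """Hypothetical stand-in for S3: maps object keys to bytes."""

    def __init__(self):
        self.objects = {}

    def put(self, key: str, body: bytes) -> None:
        self.objects[key] = body


class AppServer:
    """Module 1: serializes each event as a log record and publishes it."""

    def __init__(self, host: str, topic: InMemoryTopic):
        self.host = host
        self.topic = topic

    def log_event(self, event: str, payload: dict) -> int:
        record = {"host": self.host, "event": event, "payload": payload}
        return self.topic.publish(json.dumps(record).encode("utf-8"))


class BatchUploader:
    """Module 3 writer: drains the topic in fixed-size batches into storage."""

    def __init__(self, topic: InMemoryTopic, store: InMemoryBlobStore, batch_size: int):
        self.topic = topic
        self.store = store
        self.batch_size = batch_size
        self.next_offset = 0  # first offset that has not been uploaded yet

    def run_once(self) -> list:
        """Upload every complete batch and return the keys that were written."""
        keys = []
        while len(self.topic.messages) - self.next_offset >= self.batch_size:
            end = self.next_offset + self.batch_size
            batch = self.topic.messages[self.next_offset:end]
            # Assumed key scheme: one object per batch, named by start offset.
            key = f"logs/offset-{self.next_offset:08d}.jsonl"
            self.store.put(key, b"\n".join(batch))
            self.next_offset = end
            keys.append(key)
        return keys


# Usage: five events flow from an app server through the topic into storage.
topic = InMemoryTopic()
store = InMemoryBlobStore()
server = AppServer("app-1", topic)
for pin_id in range(5):
    server.log_event("pin_saved", {"pin_id": pin_id})

uploader = BatchUploader(topic, store, batch_size=2)
uploaded_keys = uploader.run_once()
```

With a batch size of 2, the uploader writes two objects and leaves the fifth message in the topic until another message arrives. Tracking the next offset to upload is what lets the uploader resume without losing or duplicating records in this sketch.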

Pinterest uses more than 500 Kafka brokers to handle more than 120 billion messages. The data volume is tens of terabytes per day.
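
A quick back-of-the-envelope calculation helps put these figures in perspective. The sketch below uses the two lower bounds quoted above (500 brokers and 120 billion messages). The text does not say over what period the messages are counted, so the per-second rates rest on the assumption that the count is a daily figure. The function names are hypothetical, and the results are rough averages, not measured numbers.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def per_broker_share(total_messages: int, brokers: int) -> float:
    """Average number of messages each broker handles, assuming an even spread."""
    return total_messages / brokers


def average_rate_per_second(total_messages: int, window_seconds: int) -> float:
    """Average message rate over the given time window."""
    return total_messages / window_seconds


TOTAL_MESSAGES = 120_000_000_000  # lower bound quoted in the text
BROKERS = 500                     # lower bound quoted in the text

share = per_broker_share(TOTAL_MESSAGES, BROKERS)

# ASSUMPTION: the message count accumulates over one day.
cluster_rate = average_rate_per_second(TOTAL_MESSAGES, SECONDS_PER_DAY)
broker_rate = cluster_rate / BROKERS

print(f"messages per broker: {share:,.0f}")
print(f"cluster-wide messages per second (assumed daily): {cluster_rate:,.0f}")
print(f"messages per second per broker (assumed daily): {broker_rate:,.0f}")
```

Under the daily assumption, this works out to roughly 1.4 million messages per second across the cluster and about 2,800 per second per broker on average. Real traffic is not evenly spread, so peak rates would be higher.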
