System Design Trade-Offs
Explore how to reason through critical system design trade-offs among consistency, availability, latency, durability, scalability, and complexity. Understand how these choices affect distributed systems, and how to align technical decisions with user and business requirements to build an effective architecture.
Designing a distributed system is never about finding one perfect solution; it’s about striking a balance between competing goals. In this lesson, we’ll learn how to reason through the key trade-offs that shape system behavior and performance, and how to align technical decisions with user and business requirements.
Why trade-offs matter
No system can be infinitely fast, completely reliable, and perfectly consistent all at once.
Engineering is a discipline of compromise; each improvement in one area often comes at a cost in another. A good system design, therefore, is one that finds the right balance among consistency, availability, latency, and other system qualities to achieve product goals.
The art of trade-offs lies in making these decisions deliberately, not accidentally. Let’s begin by examining the most fundamental tensions that define distributed systems.
The three core trade-offs
At the heart of distributed systems are three interdependent properties: consistency, availability, and latency. Improving one often reduces another, and understanding how they interact helps us reason about real-world architectures. The following explains what each of these terms means in practice:
Consistency: All clients see the same data at the same time, regardless of which node they connect to. In a strongly consistent system, a read is guaranteed to return the most recently written value.
Latency: The time between a client sending a request and receiving the response, including network travel in both directions and processing on the server. A low-latency system delivers fast, responsive interactions, which are essential for a good user experience.
Availability: The system’s ability to remain operational and serve requests even when one or more nodes fail. It is often expressed as a percentage of uptime, such as the well-known “five nines” (99.999% availability).
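To make availability percentages more concrete, it can help to convert them into a downtime budget. The following is a minimal sketch, not part of the original lesson: the function name downtime_minutes_per_year is hypothetical, and the calculation assumes a 365-day year and treats any period of unavailability as downtime.

```python
# Illustrative sketch: convert an availability target into a yearly downtime budget.
# Assumption: a 365-day year (leap years ignored).
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes


def downtime_minutes_per_year(availability_percent: float) -> float:
    """Return the maximum downtime, in minutes per year, that the target allows."""
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)


if __name__ == "__main__":
    # Each extra "nine" cuts the allowed downtime by a factor of ten.
    for target in (99.0, 99.9, 99.99, 99.999):
        budget = downtime_minutes_per_year(target)
        print(f"{target}% availability -> {budget:.2f} minutes of downtime per year")
```

Under these assumptions, “five nines” leaves a budget of roughly 5.26 minutes of downtime per year, while 99.9% allows about 525.6 minutes (close to nine hours). Each additional nine is ten times stricter, which is one reason higher availability targets tend to add cost and complexity.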
A banking ...