Scalability
Discover how to achieve system scalability to handle increasing workloads without performance degradation. Compare vertical and horizontal scaling strategies, and apply techniques such as load balancing, caching, and sharding. Learn to make informed System Design decisions that balance cost, complexity, and growth.
What is scalability?
Scalability is a system’s ability to handle increasing workload without degrading latency, throughput, or reliability. For example, a search engine must support more concurrent users and larger indexes while keeping query latency low. A scalable system can increase capacity to meet demand without a noticeable drop in responsiveness or availability.
Early Twitter offers another example: the site often crashed and displayed the “fail whale” error page. This was primarily a scalability issue, because the system could not handle rapid user growth and write-heavy traffic. A scalable system absorbs traffic spikes without excessive latency or downtime. Without scalability, response times increase and outages become more likely as load grows.
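To make the link between load and response time concrete, the sketch below models a single server as an M/M/1 queue. This is a simplifying assumption and not a description of any real system. The function name `mean_response_time` and the service rate of 100 requests per second are hypothetical, chosen only for illustration. Under this model, the mean response time is 1 / (service rate - arrival rate), so latency grows sharply as traffic approaches the server's capacity.

```python
import math


def mean_response_time(arrival_rate: float, service_rate: float) -> float:
    """Mean time a request spends in an M/M/1 queue, in seconds.

    Both rates are in requests per second. Returns infinity when the
    arrival rate meets or exceeds the service rate, because the queue
    then grows without bound.
    """
    if arrival_rate >= service_rate:
        return math.inf
    return 1.0 / (service_rate - arrival_rate)


# Hypothetical capacity: one server that completes 100 requests per second.
SERVICE_RATE = 100.0

for arrival_rate in (50.0, 90.0, 99.0, 100.0):
    latency_ms = mean_response_time(arrival_rate, SERVICE_RATE) * 1000
    print(f"{arrival_rate:5.0f} req/s -> {latency_ms:.0f} ms")
```

In this model, going from 50 to 99 requests per second raises the mean response time from about 20 ms to about 1,000 ms, even though the server is never formally over capacity. Adding capacity (a higher service rate, or more servers) restores the headroom.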
System workloads come in two main types:
Request workload: The number of requests served by the system.
Data workload: The amount of data stored, processed, or retrieved.
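The two workload types can be estimated with simple back-of-the-envelope arithmetic. The sketch below is illustrative only: the class `WorkloadEstimate`, its field names, and every number in the example are hypothetical assumptions, and it simplifies by treating every request as storing the same amount of data.

```python
from dataclasses import dataclass

SECONDS_PER_DAY = 86_400


@dataclass
class WorkloadEstimate:
    """Hypothetical inputs for a rough workload estimate."""

    daily_active_users: int
    requests_per_user_per_day: int
    bytes_stored_per_request: int  # simplification: every request stores data

    def total_requests_per_day(self) -> int:
        return self.daily_active_users * self.requests_per_user_per_day

    def average_requests_per_second(self) -> float:
        """Request workload, averaged over a full day."""
        return self.total_requests_per_day() / SECONDS_PER_DAY

    def data_growth_bytes_per_day(self) -> int:
        """Data workload: new bytes stored each day."""
        return self.total_requests_per_day() * self.bytes_stored_per_request


# Assumed figures, for illustration only.
estimate = WorkloadEstimate(
    daily_active_users=1_000_000,
    requests_per_user_per_day=20,
    bytes_stored_per_request=500,
)

print(f"Average request workload: {estimate.average_requests_per_second():.2f} req/s")
print(f"Data workload growth: {estimate.data_growth_bytes_per_day() / 1e9:.1f} GB/day")
```

With these assumed inputs, the average request workload is roughly 231 requests per second and the stored data grows by about 10 GB per day. Peak traffic is usually well above the daily average, so a real estimate would also apply a peak factor.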
Dimensions of scalability
Scalability operates along three dimensions:
Size scalability: Adding users or resources without redesigning the system.
Administrative scalability: Managing a growing number of organizations or users sharing a distributed system.
Geographical scalability: Maintaining performance as the system expands ...