Final Design of Quora
Optimize the Quora System Design by addressing critical scaling limitations, such as network latency and database load. Implement vertical sharding for MySQL, replace high-latency HBase with MyRocks, and use Kafka for efficient asynchronous task processing. These architectural decisions ensure the platform meets high-performance and reliability standards.
Limitations of the proposed design
While the proposed design meets functional requirements, it has significant scalability drawbacks that violate non-functional requirements:
Limitations of web and application servers: To process a user request, payloads are exchanged between web and application servers, introducing additional network I/O latency. Although separating web and application tiers enables parallel processing through manager and worker processes, the extra network hop increases end-to-end response time. In addition to data transfer, coordinating traffic between the routing layer and the manager and worker processes adds further overhead.
In-memory queue failure: The application server logs tasks and enqueues them into in-memory queues, from which worker processes consume them. These priority-based in-memory queues represent a single point of failure. If a queue crashes or its process terminates unexpectedly, all queued tasks may be dropped and require manual reprocessing. This impacts system reliability and can degrade availability. Replicating in-memory queues increases memory overhead. As the number of supported features grows, the volume of background tasks increases, putting additional pressure on memory. Application servers should not be saturated with low-priority background tasks. For example, tasks such as persisting view counters or recording analytics events should be offloaded from the application tier.
Increasing QPS on MySQL: As the system supports more features, a small subset of MySQL tables receives a disproportionate share of user queries. This drives high query-per-second (QPS) load on specific MySQL instances, increasing contention and latency. In addition, the current design does not define a disaster recovery (DR) strategy.
Latency of HBase: Although HBase supports high write throughput, its tail latency (P99) can be relatively high compared to lower-latency datastores. Several Quora features depend on an ML inference pipeline, which introduces additional processing latency. When combined with HBase tail latency, this increases end-to-end response time and impacts user-perceived performance.
We will refine the design to address these issues: