Final Design of Quora

Understand the limitations of Quora's design and improve on it.

Limitations of proposed design

The proposed design meets all the functional requirements. However, it has several serious drawbacks that will emerge as we scale, which means that we cannot fulfill the non-functional requirements. Below, we explore the main shortcomings:

  • Limitations of web and application servers: To serve a user’s request, payloads are transferred between web and application servers, which increases latency because of the network I/O between these two types of servers. Even though separating the web servers from the application servers (i.e., the Master and Worker processes) gives us parallel computation, the added latency of an extra network link erodes the user’s experience. Apart from data transfer, control communication between the router library and the Master and Worker processes imposes additional performance penalties.
  • In-memory queue failure: Internally, the application servers log tasks and forward them to in-memory queues, which serve them to the Workers. These in-memory queues of different priorities can fail. If a queue is lost, all the tasks in that queue are lost as well, and manual engineering effort is required to recover them. This greatly reduces the performance of the system. On the other hand, replicating these queues requires more RAM. Also, given the number of features (functional requirements) that our system offers, many tasks can accumulate and leave memory insufficient. At the same time, it is not desirable to choke application servers with not-so-urgent tasks. For example, application servers should not be burdened with tasks like storing view counts for answers or adding statistics to the database for later analysis.
  • Increasing QPS on MySQL: Because of the large number of features offered by our system, a few MySQL tables will receive a lot of user queries. This results in a high number of queries per second (QPS) on certain MySQL servers, which can lead to higher latency. Furthermore, no scheme is defined for disaster recovery management.
  • Latency of HBase: Even though HBase allows high real-time throughput, its p99 latency is not among the best. (p99 stands for the 99th percentile: 99 percent of the queries are served within a specific time. For example, a p99 of 20ms means that a server replies to 99 percent of the queries within 20ms.) A number of Quora features require the ML engine, which has a latency of its own; combined with the higher latency of HBase, the overall performance of the system will degrade over time.
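To make the p99 definition in the last point concrete, here is a minimal sketch that computes percentiles with the nearest-rank method. The `percentile` helper and the latency samples are hypothetical and only for illustration; they are not measurements from HBase or from Quora.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample such that at least
    p percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < p <= 100:
        raise ValueError("p must be in the range (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


# Hypothetical latencies in milliseconds for 100 queries:
# 98 fast replies, one slower reply, and one very slow outlier.
latencies_ms = [5] * 98 + [20, 250]

p50 = percentile(latencies_ms, 50)
p99 = percentile(latencies_ms, 99)
worst = percentile(latencies_ms, 100)

print(f"p50={p50}ms p99={p99}ms max={worst}ms")
```

With these made-up samples, the median says little about the tail, and even p99 leaves out the slowest 1 percent of queries. This is why tail latency is tracked separately from average latency when comparing storage systems.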

The issues highlighted above require changes to the earlier proposed design. Therefore, we will make the following adjustments:
