
Non-Functional Requirements for System Design Interviews

Learn why non-functional requirements matter in System Design Interviews, and discover a few often overlooked best practices for designing systems that scale well.

Designing a system to meet its nonfunctional requirements (NFRs) presents a significant challenge. This involves managing critical trade-offs between competing NFRs such as scalability, availability, performance, and security.

Nonfunctional requirements discussed in this article

Take this key System Design interview question, for example:

  • How can you design a scalable and performant e-commerce website that can handle millions of requests per second?

Most engineers can design a system that meets all the functional requirements; the real challenge is making that design scale while still achieving low latency on requests.

Today, through this lesson, we’ll discuss a few essential strategies for meeting NFRs in our designs. These strategies will prepare us to confidently navigate System Design interviews at top tech companies.

Let’s get started!

Disclaimer: In the previous lessons, we defined certain NFRs. Please note that identifying and achieving an NFR are two different things. In this lesson, our focus will be on achieving non-functional requirements.

Common non-functional requirements

Let’s discuss common non-functional requirements that interviewers focus on and learn how to meet them effectively when tackling System Design interview questions. The common non-functional requirements that we will address in this lesson are:

  1. Performance

  2. Availability

  3. Scalability

1) Performance

Performance is a well-known NFR that reflects a system's ability to respond to user requests and process data efficiently.

For example, when designing a messaging service, the interviewer might ask: how do you deliver messages with low latency (i.e., with minimal time delay)? To achieve low latency, candidates must select an efficient two-way communication protocol; WebSocket is a good option here. This is just one example of achieving performance; we will see more examples in the approaches below.
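To make this concrete, below is a minimal sketch of a WebSocket echo server in Python. It assumes the third-party websockets package (any WebSocket library would do); the point is that a persistent two-way connection avoids a fresh handshake for every message.

```python
# Minimal WebSocket echo server -- a sketch of the persistent, two-way
# channel that keeps per-message latency low.
# Assumes the third-party `websockets` package (pip install websockets,
# recent versions where the handler takes a single argument).
import asyncio

import websockets


async def handler(ws):
    # Messages arrive over the already-open connection, so there is no
    # per-message HTTP handshake or polling overhead.
    async for message in ws:
        await ws.send(f"delivered: {message}")


async def main():
    async with websockets.serve(handler, "localhost", 8765):
        await asyncio.Future()  # run until cancelled


if __name__ == "__main__":
    asyncio.run(main())
```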

Let’s examine various approaches to achieving optimal performance.

Approaches to achieve performance

Caching: Implementing an effective cache mechanism is one of the methods to achieve optimal performance. It stores frequently accessed data, reducing the need for repeated computations and minimizing user-perceived latency.

Web service uses service cache to access frequently accessed data to ensure low latency

Let’s assume an X (formerly Twitter)-like system with a service dedicated to generating the timeline: a stream of posts and recommendations based on the user's interests and the activity (posts, reposts, likes, etc.) of the accounts they follow.

Let’s call it the timeline service. A question interviewers commonly ask about the timeline is: does the timeline service regenerate the timeline of every follower when a celebrity posts something? Because celebrities have millions of followers, generating timelines for all of them at once would hurt the system's performance.

To address this question, we need to analyze the followers first.

Not every follower uses X all the time. As a first step, we can divide followers into active users (those who use their X accounts frequently) and inactive users (those who last used their accounts a long time ago, say, more than three months ago). For inactive users, the timeline service will not generate the timeline instantly. For active users, we will introduce a cache.

Let’s call it the feed cache: a distributed cache, such as Redis, that stores the timelines of active users.

The feed cache is prepopulated with timelines for active users. When an active user requests their timeline, the timeline service retrieves it from the feed cache, appends the celebrity's post, and returns it to the client with minimal latency.

Fetching a timeline for active users from the feed cache
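To illustrate this read path, here is a minimal cache-aside sketch in Python. The in-memory dict stands in for a distributed cache such as Redis, and the function names are illustrative, not a real API.

```python
# Sketch of the timeline read path backed by a feed cache.
# A plain dict stands in for a distributed cache like Redis.

feed_cache = {}  # user_id -> precomputed timeline (list of post ids)


def generate_timeline(user_id):
    # Expensive fan-in: gather posts from everyone the user follows.
    return [f"regular-post-for-{user_id}"]  # placeholder


def get_timeline(user_id, new_celebrity_posts):
    timeline = feed_cache.get(user_id)
    if timeline is None:
        # Cache miss (e.g., an inactive user): build on demand.
        timeline = generate_timeline(user_id)
        feed_cache[user_id] = timeline
    # Celebrity posts are appended at read time rather than being fanned
    # out to millions of follower timelines at write time.
    return new_celebrity_posts + timeline
```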

Additionally, a cache mechanism is usually implemented in each system layer to ensure decoupling and low latency.

Algorithm/Data Structure selection: Choosing efficient algorithms and data structures is another approach to enhancing the system's performance.

An efficient algorithm or data structure minimizes processing time and improves overall system performance. For example, in a ride-hailing system, a candidate may be asked which data structure is suitable for storing each driver's position, which is updated frequently (every four seconds).

One possible solution is the Quadtree data structure, which is a suitable option for ensuring optimal performance in this case.

The Quadtree receives a location update from each driver every four seconds, and the driver is relocated within the Quadtree based on the new position. However, updating a Quadtree at such a high frequency introduces computational overhead, which can increase latency.

Candidates should consider whether a Quadtree is optimal in this scenario and may explore combining different approaches to achieve better performance and scalability.
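For illustration, here is a minimal point Quadtree sketch in Python. It is a teaching sketch under simplifying assumptions (points stay inside the root bounds, no merging of underfull nodes), not a production spatial index; note how every location update pays a remove-and-reinsert cost.

```python
# Minimal point Quadtree for driver positions -- a sketch only.

class Quadtree:
    def __init__(self, x0, y0, x1, y1, capacity=4):
        self.bounds = (x0, y0, x1, y1)
        self.capacity = capacity
        self.points = {}      # driver_id -> (x, y)
        self.children = None  # four sub-quadrants once this node splits

    def _contains(self, x, y):
        x0, y0, x1, y1 = self.bounds
        return x0 <= x < x1 and y0 <= y < y1

    def _child_for(self, x, y):
        # Assumes (x, y) lies within this node's bounds.
        return next(c for c in self.children if c._contains(x, y))

    def _split(self):
        x0, y0, x1, y1 = self.bounds
        mx, my = (x0 + x1) / 2, (y0 + y1) / 2
        self.children = [Quadtree(x0, y0, mx, my, self.capacity),
                         Quadtree(mx, y0, x1, my, self.capacity),
                         Quadtree(x0, my, mx, y1, self.capacity),
                         Quadtree(mx, my, x1, y1, self.capacity)]
        for driver_id, (x, y) in self.points.items():
            self._child_for(x, y).insert(driver_id, x, y)
        self.points = {}

    def insert(self, driver_id, x, y):
        if self.children is not None:
            self._child_for(x, y).insert(driver_id, x, y)
        elif len(self.points) < self.capacity:
            self.points[driver_id] = (x, y)
        else:
            self._split()
            self._child_for(x, y).insert(driver_id, x, y)

    def remove(self, driver_id):
        if self.children is None:
            self.points.pop(driver_id, None)
        else:
            for child in self.children:
                child.remove(driver_id)

    def update(self, driver_id, x, y):
        # Every 4-second location ping pays this remove + reinsert cost,
        # which is the overhead the discussion above warns about.
        self.remove(driver_id)
        self.insert(driver_id, x, y)
```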

Load balancing: Distributing incoming traffic evenly among different servers (load balancing) is another strategy for achieving high performance. For example, with millions of users on an e-commerce website, many requests can arrive within seconds. Load balancers distribute user requests across multiple servers so that no single server receives more load than it can handle, preventing bottlenecks and ensuring optimal server performance.

Distributing user requests across multiple web servers
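As a toy illustration, the sketch below applies round-robin, the simplest balancing policy. Server names are placeholders; real load balancers (e.g., NGINX, HAProxy, or cloud load balancers) add health checks, weights, and connection draining on top of this idea.

```python
# Round-robin load balancing -- a sketch of the simplest policy.
import itertools

servers = ["web-1", "web-2", "web-3"]  # placeholder server pool
rotation = itertools.cycle(servers)


def route(request):
    # Each request goes to the next server in the rotation, spreading
    # load evenly across the pool.
    return f"{request} -> {next(rotation)}"


for req in ["GET /home", "GET /search", "POST /checkout", "GET /cart"]:
    print(route(req))
```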

2) Availability

Availability is another non-functional requirement; it describes how consistently a system remains accessible to users, i.e., its uptime. Generally, a system with 99.999% uptime ("five nines") is considered highly available; this corresponds to less than six minutes of downtime per year. Achieving this level of availability is very challenging, but it helps retain a large number of users.
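The five-nines figure is easy to verify with back-of-envelope arithmetic:

```python
# Downtime budget implied by an availability target.
minutes_per_year = 365 * 24 * 60  # 525,600 minutes

for label, availability in [("three nines", 0.999),
                            ("four nines", 0.9999),
                            ("five nines", 0.99999)]:
    downtime = minutes_per_year * (1 - availability)
    print(f"{label}: ~{downtime:.1f} minutes of downtime per year")

# five nines: ~5.3 minutes of downtime per year -- under six minutes.
```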

For example, when designing an online shopping website, availability is crucial since customers often use the site to browse products, make purchases, and track their orders. Any downtime might lead to lower sales and disappointed customers.

Let’s discuss some general approaches to achieve availability.

Approaches to achieve availability

Redundancy: One way to improve availability is to replicate key components and data across multiple servers and data centers.

By doing this, we ensure that if one server fails or traffic is high, the load balancer can automatically reroute requests to an alternate backup server. Additionally, implementing redundant components across multiple layers (servers, databases, and networks) can prevent a single point of failure.

Replicating key components to eliminate single point of failure

Fault tolerance: Suppose that during a discount sale on a shopping website, a key database node in a specific region suffers a hardware failure. This node handles a considerable share of user activity in that region. In such scenarios, our system must be fault-tolerant, meaning it continues to function even if one or more components fail.

We can achieve this tolerance by using redundant components and failover mechanisms that automatically switch traffic from a failed component to its backup.
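Here is a minimal sketch of that failover logic. The is_healthy check stands in for a real probe (such as an HTTP health endpoint), and the node names are illustrative assumptions.

```python
# Failover via health checks -- an illustrative sketch.

replicas = ["db-primary", "db-replica-1", "db-replica-2"]
failed = {"db-primary"}  # pretend the primary just suffered a fault


def is_healthy(node):
    # Stand-in for a real health probe (e.g., an HTTP health endpoint).
    return node not in failed


def pick_node():
    # Route to the first healthy node; because the data is replicated,
    # a single failure degrades capacity instead of causing an outage.
    for node in replicas:
        if is_healthy(node):
            return node
    raise RuntimeError("no healthy replicas: total outage")


print(pick_node())  # -> db-replica-1
```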

Rate limiting: Another approach to achieve availability is rate limiting.

A rate limiter restricts the number of requests a service will handle, capping how many requests a user can make in a given window and thereby preventing system overload. For example, on a social media platform, overload can occur when users like posts, play videos, and follow others at a much higher rate than usual.

Without rate limiting, this sudden increase in activity can overwhelm the system, leading to system failure.

Rate limiting to prevent web server overload
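One common way to implement this is a token bucket, sketched below with illustrative parameters (an average of 5 requests per second, with bursts of up to 10):

```python
# Token-bucket rate limiter -- a sketch with illustrative parameters.
import time


class TokenBucket:
    def __init__(self, rate, capacity):
        self.rate = rate          # tokens refilled per second
        self.capacity = capacity  # maximum burst size
        self.tokens = capacity
        self.last = time.monotonic()

    def allow(self):
        now = time.monotonic()
        # Refill in proportion to elapsed time, capped at capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # the caller would typically respond with HTTP 429


limiter = TokenBucket(rate=5, capacity=10)
accepted = sum(limiter.allow() for _ in range(100))
print(f"{accepted} of 100 burst requests accepted")  # roughly 10
```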

CDNs: CDNs are cache servers distributed across different regions. They improve performance and increase availability by reducing the load on origin servers, and because they are deployed in multiple geographical locations, a regional outage does not affect overall system availability. They also reduce latency for users in different locations.

Stress testing and monitoring: Another way to ensure availability is stress testing.

Stress testing determines how the system behaves under peak load conditions, allowing us to identify breaking points and confirm that the system can handle sudden traffic spikes before they occur in production.

Additionally, implementing monitoring can enable us to track system performance and detect anomalies in real-time.

3) Scalability

Scalability refers to a system's ability to expand and handle an increasing number of users while maintaining optimal performance.

For example, an interviewer might ask questions about designing a service like YouTube that can accommodate millions of users uploading and watching videos simultaneously, or designing a URL shortening service capable of handling billions of queries every day.

To address these questions about scalability in interviews, let's look at different approaches.

Approaches to achieve scalability

Manual scaling is one approach to scaling applications. It involves either upgrading hardware on existing machines (vertical expansion) or adding more machines (horizontal expansion):

  • Vertical scaling (hardware upgrades): Add more resources (RAM, CPU, storage) to existing machines for smaller demands. It’s easier to manage since we aren’t adding to the total number of machines.

  • Horizontal scaling (adding machines): Increase the number of machines to distribute the workload for larger demands. In general, horizontal scaling is considered the preferred option for large-scale applications because it does not have a single point of failure and supports load balancing, unlike vertical scaling.

Vertical vs. horizontal scaling

Automatic scaling: Dynamically adjust resources (storage, processing power) based on demand to handle traffic spikes. This can be achieved using a cloud computing technique called Auto Scaling.
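A minimal sketch of the policy loop such services run might look like the following; the thresholds and instance bounds are illustrative assumptions, not any provider's defaults.

```python
# Threshold-based auto scaling policy -- an illustrative sketch.

def desired_instances(current, cpu_utilization,
                      scale_up_at=0.75, scale_down_at=0.25,
                      min_instances=2, max_instances=20):
    if cpu_utilization > scale_up_at:
        current += 1  # traffic spike: add a machine
    elif cpu_utilization < scale_down_at:
        current -= 1  # idle capacity: remove a machine
    # Clamp to the configured bounds.
    return max(min_instances, min(max_instances, current))


print(desired_instances(current=4, cpu_utilization=0.90))  # -> 5
print(desired_instances(current=4, cpu_utilization=0.10))  # -> 3
```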

Sharding: Another approach to achieving scalability is dividing the database into shards to distribute the data load across multiple servers. Key-range sharding (distributing data based on specific ranges of keys) and hash-based sharding (applying a hash function to the keys to spread data evenly across shards) are common techniques.
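Here is a minimal hash-based sharding sketch; MD5 is used purely to spread keys (any stable hash works). Note that changing the shard count remaps most keys, which is why production systems often prefer consistent hashing.

```python
# Hash-based sharding -- a sketch.
import hashlib


def shard_for(key, num_shards=4):
    # Hash the key and take it modulo the shard count for an even spread.
    digest = hashlib.md5(key.encode()).hexdigest()
    return int(digest, 16) % num_shards


for user_id in ["alice", "bob", "carol", "dave"]:
    print(user_id, "-> shard", shard_for(user_id))
```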

Modular design: Break down the system into smaller, independent components, allowing each service to scale independently according to demand without affecting the performance of other services.

Monolithic vs. modular designs

Caches and CDNs: Caches store frequently accessed data in memory to reduce response times and database loads.

CDNs, on the other hand, are used to distribute static content to users rather than retrieving it from the origin, thereby further reducing the server's load. By utilizing caching and a CDN, the system can efficiently manage a high volume of user requests without experiencing slower performance.

Now that we’ve understood different approaches to achieving NFRs, including performance, availability, and scalability, let's practice them by taking a deeper look at the system design of Google Maps and YouTube.

Acing NFRs: Google Maps and YouTube

Let’s explore non-functional requirements for Google Maps and YouTube System Design problems.

Design Google Maps

Designing a navigation system like Google Maps involves enabling users to identify their current location, find the optimal route based on specified destinations, and provide detailed turn-by-turn directions for seamless navigation.

Considering the following non-functional requirements of Google Maps, let’s describe the strategies to achieve them:

  • High availability: The design of Google Maps includes a large road network graph. If we hosted this entire graph on a single server, it would fail under the graph's size and the volume of user requests. To ensure availability, we divide the graph into smaller segments and host them on separate servers. By replicating these segment servers, we eliminate single points of failure, and we use a load balancer to distribute incoming user requests across them.

  • Scalability: To scale our Google Maps system, we use a distributed architecture in which each segment is hosted on a separate server, allowing user requests for different routes to be served by different segment servers. This lets us serve millions of user requests, and because the design is modular, we can easily add more segments to handle additional data (see the sketch below).
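To make the segment idea concrete, here is a hypothetical sketch that maps a coordinate onto a segment server using a fixed grid. The cell size and server naming are illustrative assumptions, not Google's actual design.

```python
# Mapping a coordinate to a graph-segment server -- a hypothetical sketch.

SEGMENT_SERVERS = {}  # (row, col) -> server name, assigned lazily


def segment_for(lat, lng, cell_degrees=1.0):
    # Partition the world into fixed-size cells; each cell's subgraph is
    # hosted (and replicated) on its own segment server.
    row = int((lat + 90) // cell_degrees)
    col = int((lng + 180) // cell_degrees)
    return SEGMENT_SERVERS.setdefault((row, col), f"segment-{row}-{col}")


print(segment_for(37.7749, -122.4194))  # San Francisco's segment server
```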

Let’s summarize the strategies we used to achieve Google Maps’ NFRs:

Availability:

  • Divide the road network graph into smaller graphs (segments) to process user queries.
  • Replicate the segment servers.
  • Load-balance requests across the different segment servers.

Scalability:

  • Partition the large graph into smaller graphs to make adding segments easy.
  • Host the graphs on different servers to handle an increased number of queries per second.

Design YouTube

Designing a video streaming platform like YouTube involves enabling users to stream videos, upload videos, search for videos by their titles, and like/dislike videos.

Considering the following non-functional requirements of YouTube, let’s describe the strategies to achieve them:

  • Minimal response times: To ensure the optimal performance of YouTube’s design, we utilize multiple caching servers at both the ISP and CDN levels to deliver the most viewed content with the fastest response times. At the same time, choosing an appropriate storage system for different types of data, such as using Bigtable to store thumbnails and Blob storage to store videos, can reduce latency. We prefer to use a Lighttpd-based web server for users to upload their videos, as it processes such content more efficiently and provides a smoother user experience.

  • Reliability: To ensure the system's high reliability, we use data sharding, which prevents any single type of data from affecting the others if it becomes unavailable. We replicate critical components to achieve fault tolerance and take faulty servers out of rotation by monitoring their health with heartbeat messages (regularly spaced messages a node sends to indicate it is healthy and active; if the heartbeats stop, other nodes can assume the node has failed). See the sketch below.
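Below is a minimal failure-detection sketch based on heartbeats; the timeout and node names are illustrative assumptions.

```python
# Heartbeat-based failure detection -- an illustrative sketch.
import time

HEARTBEAT_TIMEOUT = 10.0  # seconds of silence before suspecting a node

last_heartbeat = {
    "storage-1": time.monotonic(),
    "storage-2": time.monotonic() - 30,  # deliberately stale
}


def record_heartbeat(node):
    # Called whenever a node's periodic "I'm alive" message arrives.
    last_heartbeat[node] = time.monotonic()


def suspected_failures():
    now = time.monotonic()
    return [node for node, seen in last_heartbeat.items()
            if now - seen > HEARTBEAT_TIMEOUT]


print(suspected_failures())  # -> ['storage-2']
```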

Minimal response times:

  • Cache at different layers.
  • Use CDNs.
  • Choose appropriate storage systems (e.g., Blob storage for videos, Bigtable for thumbnails).
  • Serve videos and static content with Lighttpd.

Reliability:

  • Shard data.
  • Replicate critical components.
  • Use a heartbeat protocol.

Quick tips for NFR interview questions

  • Proactively ask questions to clarify non-functional requirements during the interview. For example:

    • Expected user traffic

    • Expected data load

    • Expected downtime tolerance

  • Evaluate the trade-offs between different techniques, considering factors such as system complexity, cost, and maintainability.

  • Prepare a list of commonly asked questions with their solutions. For example:

    • For reliable transaction processing, choose ACID-compliant relational databases.

    • For large-scale data applications, consider NoSQL databases, such as MongoDB or Cassandra, to achieve scalability.

    • For real-time data processing and analytics, choose platforms like Apache Kafka or Amazon Kinesis.

Remember, there is no one-size-fits-all solution. Every design decision involves trade-offs, and as designers of scalable systems, our ability to weigh those trade-offs is critical. Ask the interviewer clarifying questions, consider the NFRs carefully, and make informed choices to create a robust System Design.

Conclusion

In this lesson, we demonstrated the importance of non-functional requirements and how to address them in System Design interviews. By understanding common NFRs and practical strategies for achieving them, we will be better prepared for NFR-related questions during the interview.