
Non-Functional Requirements for System Design Interviews

Learn why non-functional requirements matter in System Design interviews, and discover a few often-overlooked best practices for designing systems that scale well.

Designing systems to meet non-functional requirements (NFRs) is challenging. You must manage trade-offs among competing goals such as scalability, availability, performance, and security.

Non-functional requirements

Consider this common System Design interview question:

  • How can you design a scalable and performant e-commerce website that can handle millions of requests per second?

Engineers often meet functional requirements easily but struggle to achieve scalability and low latency simultaneously. This lesson covers essential strategies for meeting NFRs in your designs.

Note: This lesson focuses on how to achieve NFRs in a design, not on how to gather or define them.

Common non-functional requirements

Interviewers focus on specific NFRs. We will address:

  1. Performance

  2. Availability

  3. Scalability

1) Performance

Performance measures a system’s ability to respond to requests and process data efficiently. For example, in a messaging service, an interviewer might ask: How do you deliver messages with low latency (i.e., minimal delay)? To achieve this, you might select an efficient two-way protocol like WebSocket.

Approaches to achieve performance

Caching: Caching stores frequently accessed data, reducing repeated computations and user-perceived latency.

Web service uses service cache to access frequently accessed data to ensure low latency

Consider an X (formerly Twitter)-like system with a service dedicated to generating the timeline: a stream of posts and recommendations based on the user's interests and followers' activity (such as their posts, reposts, and likes).

Does the service generate a timeline for every follower when a celebrity posts? With millions of followers, this would severely degrade performance.

To address this, first divide followers into active users (those who use their X accounts frequently) and inactive users (those who last used their accounts a long time ago, say, more than three months ago). Generate timelines for inactive users on demand. For active users, introduce a feed cache: a distributed cache, such as Redis, that stores their timelines. This cache prepopulates the timeline. When active users request their feed, the service retrieves it immediately from the cache, ensuring minimal latency.

Fetching a timeline for active users from the feed cache

Caching at multiple system layers also ensures decoupling and low latency.
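The active/inactive split above amounts to a cache-aside lookup. The sketch below illustrates the idea; `TimelineService`, `feed_cache`, and `generate_timeline` are illustrative names (a real deployment would use a distributed cache such as Redis rather than an in-process dictionary):

```python
class TimelineService:
    """Cache-aside timeline lookup: serve active users from a feed cache,
    generate for inactive users on demand (illustrative sketch only)."""

    def __init__(self):
        self.feed_cache = {}  # stands in for a distributed cache like Redis

    def generate_timeline(self, user_id):
        # Placeholder for the expensive fan-in/ranking work.
        return [f"post-for-{user_id}"]

    def get_timeline(self, user_id, is_active):
        if is_active and user_id in self.feed_cache:
            return self.feed_cache[user_id]          # cache hit: minimal latency
        timeline = self.generate_timeline(user_id)   # on-demand generation
        if is_active:
            self.feed_cache[user_id] = timeline      # prepopulate for next request
        return timeline
```

Only active users occupy cache space, so the celebrity-post fan-out work is bounded by the active-follower count rather than the total follower count.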

Algorithm/Data structure selection: Efficient algorithms minimize processing time. For example, consider a ride-hailing system that updates driver positions every four seconds. You must choose a data structure that handles frequent spatial updates efficiently.

A Quadtree is a strong candidate for spatial indexing. However, updating a Quadtree every four seconds introduces computational overhead that increases latency. You must evaluate whether a Quadtree is optimal here or whether a hybrid approach better balances performance and scalability.
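One simpler alternative worth weighing against a Quadtree is a coarse uniform grid: updating a driver's cell is O(1), and most four-second updates don't even change cells, while nearby-driver queries scan a small neighborhood of cells. This is a hedged sketch of that trade-off, not a production spatial index:

```python
class GridIndex:
    """Uniform grid index: cheap writes for frequent position updates,
    coarse reads by scanning neighboring cells (illustrative sketch)."""

    def __init__(self, cell_deg=0.01):  # roughly 1 km cells near the equator
        self.cell_deg = cell_deg
        self.cells = {}      # (cx, cy) -> set of driver ids
        self.position = {}   # driver id -> current cell

    def _cell(self, lat, lon):
        return (int(lat / self.cell_deg), int(lon / self.cell_deg))

    def update(self, driver_id, lat, lon):
        new_cell = self._cell(lat, lon)
        old_cell = self.position.get(driver_id)
        if old_cell == new_cell:
            return                           # most updates stay in the same cell
        if old_cell is not None:
            self.cells[old_cell].discard(driver_id)
        self.cells.setdefault(new_cell, set()).add(driver_id)
        self.position[driver_id] = new_cell

    def nearby(self, lat, lon):
        cx, cy = self._cell(lat, lon)
        found = set()
        for dx in (-1, 0, 1):                # scan the 3x3 cell neighborhood
            for dy in (-1, 0, 1):
                found |= self.cells.get((cx + dx, cy + dy), set())
        return found
```

A Quadtree gives finer-grained queries in dense areas; the grid wins on write throughput. A hybrid (grid for hot writes, Quadtree rebuilt periodically for reads) is one way to balance the two.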

Load balancing: Distributing traffic evenly across servers prevents bottlenecks. For an e-commerce site handling millions of concurrent requests, load balancers ensure no single server is overwhelmed.

Distributing user requests across multiple web servers
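The simplest distribution policy a load balancer can apply is round-robin, rotating through the server pool so each server gets an equal share. A minimal sketch (server names are hypothetical):

```python
import itertools

class RoundRobinBalancer:
    """Minimal round-robin request distribution across a fixed server pool."""

    def __init__(self, servers):
        self._cycle = itertools.cycle(servers)

    def route(self, request):
        server = next(self._cycle)   # pick the next server in rotation
        return server, request

balancer = RoundRobinBalancer(["web-1", "web-2", "web-3"])
assigned = [balancer.route(f"req-{i}")[0] for i in range(6)]
# Six requests spread evenly: two per server.
```

Real balancers add health checks and weighted or least-connections policies on top of this basic rotation.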

2) Availability

Availability measures system uptime and accessibility. 99.999% uptime (less than 6 minutes of downtime per year) is the gold standard but difficult to achieve. High availability is critical for retention; for example, downtime on an e-commerce site directly loses sales.
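The downtime budget behind each "nines" figure is simple arithmetic over the minutes in a year; the quick calculation below shows that five nines leaves roughly 5.3 minutes of downtime per year:

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes in a non-leap year

for availability in (0.99, 0.999, 0.9999, 0.99999):
    downtime = MINUTES_PER_YEAR * (1 - availability)
    print(f"{availability:.5%} uptime -> about {downtime:.1f} min/year of downtime")
# Five nines (99.999%) allows roughly 5.3 minutes of downtime per year.
```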

Approaches to achieve availability

Redundancy: Replicate key components and data across multiple servers and data centers. If one server fails, a load balancer reroutes requests to a backup, eliminating single points of failure.

Replicating key components to eliminate single point of failure

Fault tolerance: Systems must function even when components fail. For instance, if a database node fails during a sale, the system should automatically switch to a backup node using failover mechanisms.

Rate limiting: Rate limiters restrict the number of requests a service handles to prevent overload. On social media, this prevents sudden spikes in activity (e.g., likes, follows) from crashing the system.

Rate limiting to prevent web server overload
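One standard way to implement such a limiter is a token bucket: tokens refill at a steady rate up to a cap, and each request spends one token. This is a minimal single-node sketch (a distributed limiter would keep the bucket state in a shared store):

```python
class TokenBucket:
    """Token-bucket rate limiter: refills at `rate` tokens/sec up to `capacity`;
    a request is allowed only if a token is available."""

    def __init__(self, rate, capacity):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.last = 0.0

    def allow(self, now):
        # Refill proportionally to elapsed time, then spend one token if possible.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

The `capacity` absorbs short bursts (a flurry of likes), while `rate` caps sustained throughput; excess requests are rejected or queued instead of overwhelming the servers.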

CDNs: Content delivery networks (CDNs) distribute cache servers geographically. They improve availability by reducing load on origin servers and mitigating regional outages. They also reduce latency by serving content from locations closer to the user.

Stress testing and monitoring: Stress testing identifies breaking points under peak loads. Monitoring tracks performance in real-time, allowing you to detect anomalies before they cause downtime.

3) Scalability

Scalability is the ability to handle growing numbers of users while maintaining performance. Interviewers may ask you to design a video platform like YouTube or a URL shortener handling billions of queries.

Approaches to achieve scalability

Manual scaling: You can upgrade hardware (vertical) or add machines (horizontal):

  • Vertical scaling (hardware upgrades): Adding resources (RAM, CPU) to existing machines. It is simple to manage but has a hard limit.

  • Horizontal scaling (adding machines): Adding more machines to distribute workload. This is preferred for large-scale applications as it supports load balancing and eliminates single points of failure.

Vertical vs. horizontal scaling

Automatic scaling: Dynamically adjusts resources based on traffic spikes using cloud techniques like auto scaling.

Sharding: Splits a database into smaller shards to distribute the data load across servers. Common techniques include key-range sharding (distributing data based on specific ranges of keys) and hash-based sharding (applying a hash function to the keys to ensure an even distribution across shards).
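Hash-based shard selection can be sketched in a few lines. Note the use of a stable hash (here MD5, chosen for illustration) rather than Python's built-in `hash`, which is salted per process and would route the same key differently on different servers:

```python
import hashlib

def shard_for(key, num_shards):
    """Map a key to a shard id using a stable hash over the key bytes."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards

# Every node computes the same shard for the same key.
shard = shard_for("user:42", 4)
```

One caveat worth raising in an interview: with plain modulo, changing `num_shards` remaps most keys, so production systems often use consistent hashing to limit data movement during resharding.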

Modular design: Decomposes the system into independent services. Each service scales independently based on demand.

Monolithic vs. modular designs

Caches and CDNs: Caching reduces database load, while CDNs offload static content delivery from origin servers. Together, they allow the system to handle high request volumes efficiently.

Let's apply these concepts to Google Maps and YouTube.

Acing NFRs: Google Maps and YouTube

Let’s explore non-functional requirements for Google Maps and YouTube System Design problems.

Design Google Maps

A navigation system must identify locations, find optimal routes, and provide turn-by-turn directions.

Strategies to meet Google Maps NFRs:

  • High availability: The road network graph is too large for a single server. To ensure availability, split the graph into segments hosted on separate, replicated servers. A load balancer distributes requests across these segment servers to eliminate single points of failure.

  • Scalability: A distributed architecture allows segment servers to handle requests for specific routes independently. This modular design scales easily by adding segments for new data.

Nonfunctional requirements and strategies:

  • Availability:

    • Divide the road network graph into small graphs (segments) to process user queries.
    • Replicate the segment servers.
    • Load balance requests across the different segment servers.

  • Scalability:

    • Partition the large graph into smaller graphs to ease segment addition.
    • Host the graphs on different servers to handle an increased number of queries per second.

Design YouTube

A video streaming platform enables users to upload, search, stream, and rate videos.

Strategies to meet YouTube NFRs:

  • Minimal response times: Use caching servers at ISP and CDN levels to deliver popular content quickly. Optimize storage by using Bigtable for thumbnails and Blob storage for videos. A lightweight web server (e.g., Lighttpd) efficiently handles video uploads.

  • Reliability: Use data sharding to isolate failures. Replicate critical components for fault tolerance and use heartbeat messages (regularly spaced messages each node sends to signal it is healthy and active; if a node stops sending heartbeats, other nodes can assume it has failed) to detect and remove faulty servers.
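The heartbeat-based failure detection described above can be sketched as a timeout check over each node's last-seen heartbeat (node names and the timeout value are illustrative):

```python
class FailureDetector:
    """Marks a node failed if no heartbeat arrives within `timeout` seconds."""

    def __init__(self, timeout=10.0):
        self.timeout = timeout
        self.last_seen = {}  # node id -> time of its most recent heartbeat

    def heartbeat(self, node, now):
        self.last_seen[node] = now

    def failed_nodes(self, now):
        # Any node silent for longer than the timeout is presumed failed.
        return [n for n, t in self.last_seen.items() if now - t > self.timeout]
```

Once a node appears in `failed_nodes`, the system can route around it and promote a replica, which is exactly the failover behavior the reliability requirement calls for.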

Nonfunctional requirements and strategies:

  • Low response time:

    • Cache at different layers.
    • Use CDNs.
    • Choose appropriate storage systems (e.g., Blob storage for videos, Bigtable for thumbnails).
    • Serve videos and static content with Lighttpd.

  • Reliability:

    • Shard data.
    • Replicate critical components.
    • Use a heartbeat protocol.

Quick tips for NFR interview questions

  • Proactively clarify NFRs during the interview. Ask about:

    • Expected user traffic

    • Expected data load

    • Expected downtime tolerance

  • Evaluate trade-offs between techniques, considering complexity, cost, and maintainability.

  • Prepare solutions for common patterns:

    • Transactions: Choose ACID-compliant relational databases.

    • Large-scale data: Use NoSQL databases (MongoDB, Cassandra) for scalability.

    • Real-time data: Use streaming platforms like Apache Kafka or Amazon Kinesis.

There is no one-size-fits-all solution. Success depends on asking clarifying questions, prioritizing NFRs, and justifying your trade-offs.

Conclusion

This lesson covered the role of non-functional requirements in System Design. Understanding common NFRs and how to address them helps you answer System Design interview questions more effectively.