How to answer my favorite System Design question: Design Spotify
Master system design by tackling a real-world question like designing Spotify. Learn how to break down requirements, estimate scale, and build scalable architectures while developing the structured thinking needed to succeed in system design interviews.
This post aims to address two audiences.
- Interviewers: Senior software engineers and engineering managers should get an idea of how to conduct a System Design interview and evaluate candidates.
- Candidates: Engineers should learn what is expected for a System Design Interview, how to prepare, and how to approach the conversation.
My favorite SD question to ask is something along the lines of “design a music streaming service like Spotify.” Streaming services for music and video are great systems for understanding a candidate’s thought process and technical acumen. They’re appropriately flashy and interesting but still demand a sufficient level of forethought and scalability.
Familiarity: The majority of streaming services should be familiar to candidates, at least on the user side. It is important for a System Design question to be a technology or product that is well-known and widely used.
To start, I’ll briefly explain the general hierarchy of SWE seniority. Then we will discuss the requirements of the system and how candidates should approach this kind of question in their interview.
System Design is one of the key determining factors in filtering software engineers to different levels of seniority. I’ve seen candidates who applied to a senior engineering position be swiftly down-leveled because of a few technical oversights in a System Design interview.
The majority of engineers at large companies fall somewhere between E4 and E6. I’ll briefly cover what is expected out of a candidate at each of these levels.
Entry-level: At this level, engineers have a narrow focus on a few different software components and how they interact with each other.
Senior: They have a more holistic view of the software system they’re working on, and can describe various scenarios from end-to-end. They can explain how each scenario is executed, give concrete examples, and offer ways to improve the resiliency of a system.
Staff: Engineers at this level are capable of everything mentioned above, but they also monitor the software system over the course of its entire lifetime. By considering the architecture’s ability to sustain and support growth, they plan how a system evolves and scales.
It’s not all about down-leveling though. If you’re prepared, you may find yourself being offered a position more senior than the role you applied for.
Functional requirements
- Users should be able to stream music.
- The system should store an archive of music, sorted by artist and album.
- The database of songs should be searchable.
Non-functional requirements
- Streaming should be very low-latency. Music should begin playing within 200ms of a user pressing play.
- The system should support a repository of 100 million songs.
An interviewer is not expecting exactly correct answers that correspond with a rubric. There is, in fact, no “right” answer. Instead, they want to see comprehension of the problem at hand. A good interviewee will lead a conversant and comfortable walkthrough of their assumptions, calculations, tradeoffs, and design choices.
Some of the best advice I can give to both interviewers and interviewees pertains to asking questions. It’s great if a candidate asks all the clarifying questions they need to when posed with a problem, but ultimately, the interviewer should provide a guiding hand. If a candidate fails to ask crucial questions, the interviewer shouldn’t let them lead the conversation astray. If you’re an interviewer, be sure to reveal key expectations and assumptions of the problem even if an interviewee doesn’t know to ask for them.
When designing any large-scale distributed system, there is a range of clarifying questions that a candidate should ask. These questions about designing Spotify are arranged from basic to advanced. Interviewers can often gauge a candidate’s level just from the clarifying questions they know to ask.
- How big is the music repository?
The standard for most music streaming platforms is 100 million songs. We answered this already when talking about the non-functional requirements, but in some cases this information won’t be offered immediately.
- How frequently is the repo updated?
Every week.
- How many users does the service have?
There are hundreds of millions of users, but there is a more pertinent follow-up question that only skilled candidates will really know to ask.
- How many concurrent users are there?
This is the question that really matters when it comes to System Design. Even if a candidate doesn’t ask it, the interviewer should give them the hint that they should expect an average of 5 million active users, with peak traffic being around 10 million active users.
- Of the concurrent users, how many are streaming music?
A follow-up question that is, again, okay to divulge unprompted. On average, about 80% of the active users will be streaming music, with the remaining 20% sticking to low-load activities like browsing and managing their playlists.
These are not the only questions that are relevant to designing Spotify, but they provide a great foundation for the conversation to come. After the round of clarifying questions, hopefully the candidate is a little more comfortable and the interviewer has made some initial notes as to how they expect the candidate to proceed.
At this point, it should be fairly simple to come up with the high-level design of a workflow of the system.
- A user makes a search.
- A search indexer parses data.
- The system returns a page of search results.
- The user clicks on a file.
- Music starts streaming.
The real meat of the problem comes from designing the system for low latency.
This high-level architecture gives you a starting point for discussing trade-offs. In an interview, your goal isn’t to jump into microservices immediately; it’s to clearly communicate how requests flow through the system.
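To make the search steps of that workflow concrete, here is a minimal sketch of an inverted index in Python. The catalog rows, the `tokenize` helper, and the AND-style matching are all illustrative assumptions rather than how any real streaming service indexes its catalog.

```python
from collections import defaultdict

# Hypothetical catalog rows: (song_id, title, artist). Illustrative data only.
CATALOG = [
    (1, "Blue Morning", "The Examples"),
    (2, "Morning Light", "Sample Artist"),
    (3, "Night Drive", "The Examples"),
]


def tokenize(text):
    """Lowercase and split on whitespace; a real indexer does far more."""
    return text.lower().split()


def build_index(catalog):
    """Map each token to the set of song ids whose title or artist contains it."""
    index = defaultdict(set)
    for song_id, title, artist in catalog:
        for token in tokenize(title) + tokenize(artist):
            index[token].add(song_id)
    return index


def search(index, query):
    """Return sorted ids of songs that match every token in the query."""
    tokens = tokenize(query)
    if not tokens:
        return []
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return sorted(result)


INDEX = build_index(CATALOG)
print(search(INDEX, "morning"))       # [1, 2]
print(search(INDEX, "the examples"))  # [1, 3]
```

In an interview, the point of a sketch like this is only to show that search reads from a precomputed index rather than scanning 100 million rows per query; ranking, typo tolerance, and sharding are follow-up topics.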
Nice numbers: When picking numbers for estimations, it is best to stick to 5s and 10s. Otherwise, your back of the napkin calculations quickly become more about you doing grade school math and less about designing the system at hand.
Back-of-the-envelope estimation: understanding scale before design
One of the most underrated parts of a system design interview is the ability to quickly estimate scale. Many candidates jump straight into architecture diagrams or technologies, but experienced engineers almost always pause first and ask a simple question: how big is this system going to be?
Back-of-the-envelope estimation helps you answer that question. It allows you to ground your design decisions in reality and justify why certain components—like caching layers or CDNs—are necessary. The goal isn’t perfect accuracy. It’s clarity of thinking.
Let’s walk through a practical example using our Spotify-like system.
Estimating user traffic
To begin, we need a rough sense of how many people are using the system. Suppose the platform has around 500 million registered users, with approximately 100 million active users on a daily basis. Out of these, a smaller subset will be active at any given moment. A reasonable estimate is around 5 million concurrent users, with peak traffic reaching closer to 10 million.
Next, consider how often users interact with the system. If an average user listens to around 20 songs per day, that leads to roughly 2 billion streams daily across the platform. When we spread that over the number of seconds in a day, we arrive at tens of thousands of requests per second.
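The arithmetic behind that estimate fits in a few lines of Python. The user and song counts come from the paragraphs above; the peak-to-average ratio of 2 is an assumption that simply mirrors the 10 million peak versus 5 million average concurrency figures.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

daily_active_users = 100_000_000  # from the estimate above
songs_per_user_per_day = 20       # from the estimate above
peak_to_average_ratio = 2         # assumption: mirrors 10M peak vs 5M average

streams_per_day = daily_active_users * songs_per_user_per_day
average_rps = streams_per_day / SECONDS_PER_DAY
peak_rps = average_rps * peak_to_average_ratio

print(f"{streams_per_day:,} streams per day")
print(f"~{average_rps:,.0f} play requests per second on average")
print(f"~{peak_rps:,.0f} play requests per second at peak")
```

This counts only one request per song played. Searches, playlist edits, and chunked audio fetches would push the real request rate higher, so treat the result as a floor.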
At this point, even a rough estimate tells us something important: this is not a small system. It must be designed to handle sustained high throughput without degradation in performance.
Estimating storage requirements
Once we understand traffic, the next step is to think about storage. Let’s assume the system hosts around 100 million songs, and each song takes up approximately 5 MB on average. This alone results in around 500 terabytes of raw storage.
However, in real-world distributed systems, storing a single copy of data is not enough. To ensure reliability and fault tolerance, data is typically replicated across multiple servers or regions. With a standard replication factor of three, the storage requirement increases to roughly 1.5 petabytes.
There is another important consideration here. Music streaming services do not store just one version of a song. They store multiple versions at different quality levels so that users with slower connections can still stream smoothly. If we account for low, medium, and high-quality versions, the total storage requirement increases significantly, reaching several petabytes.
This estimation alone tells us that we cannot rely on a traditional single-database setup. We need scalable object storage and distributed data systems.
Estimating bandwidth and data transfer
Now, let’s think about how much data needs to be delivered in real time. Streaming audio requires a continuous flow of data rather than a one-time download. If we assume an average bitrate of around 320 kbps, each user consumes roughly 40 KB per second while streaming.
With 5 million concurrent users at 40 KB per second each, the system must handle roughly 200 gigabytes of data transfer every second if everyone is streaming at once. This is a massive amount of bandwidth, and it immediately highlights the need for geographically distributed infrastructure.
Serving all this data from a single location would introduce unacceptable latency. This is why content delivery networks (CDNs) become a critical part of the design.
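As a quick check on the bandwidth figures in this section, the sketch below redoes the calculation in Python. The bitrate and concurrency come from the text, and the 80% streaming share comes from the clarifying questions earlier; decimal units (1 GB = 10^9 bytes) are an assumption made to keep the numbers round.

```python
bitrate_kbps = 320            # kilobits per second, from the text
concurrent_users = 5_000_000  # average concurrency, from the text
streaming_share = 0.8         # share of active users streaming, from the text

GB = 10**9  # decimal gigabyte, assumed for round numbers

# 320 kilobits/s divided by 8 bits per byte gives 40,000 bytes/s (40 KB/s).
bytes_per_listener = bitrate_kbps * 1000 / 8
upper_bound = concurrent_users * bytes_per_listener
streaming_only = upper_bound * streaming_share

print(f"{bytes_per_listener / 1000:.0f} KB/s per listener")
print(f"{upper_bound / GB:.0f} GB/s if every concurrent user streams")
print(f"{streaming_only / GB:.0f} GB/s if 80% of them stream")
```

Either way the answer is well over 100 GB per second, which is the only conclusion the design discussion needs.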
Turning estimates into design decisions
The real value of back-of-the-envelope estimation is not the numbers themselves, but what they imply. Once you understand the scale, your design decisions become much clearer.
For example, the need for distributed object storage becomes obvious when dealing with petabytes of data. The importance of caching emerges when you consider the volume of read requests. The role of CDNs becomes unavoidable when you look at bandwidth and latency requirements.
In an interview, this is exactly what the interviewer is looking for. They want to see that you can connect scale to architecture, rather than treating system design as a collection of disconnected components.
How to present this in an interview
A strong candidate doesn’t spend excessive time calculating precise values. Instead, they briefly walk through assumptions, derive approximate numbers, and then use those numbers to guide the design.
A simple explanation, such as “Given the number of users and expected traffic, we’re dealing with tens of thousands of requests per second and petabyte-scale storage,” is often enough to demonstrate strong system thinking.
The key is not the math. It’s the reasoning behind it.
Storage considerations
Assuming that the average song is 5 minutes and takes up 5MB of storage, we can calculate how much storage it will take to store 100 million songs. Given just these numbers, we can begin by saying that it will take 500TB to store this data.
The candidate should build upon this assumption. It is important to store multiple copies of the data so that songs will always be available even in the event of a partial failure of the system. The industry standard is to replicate data three times, so with replication, the total storage is now up to 1500TB.
A really strong candidate – E6 level or equivalent – may recognize that the system not only needs to replicate data, but create and keep files of different qualities. Much like a video streaming service, music streaming services also allow users to stream different song qualities based on their network connection and individual preferences. If a user is driving through a place with a spotty network, they should still be able to seamlessly stream music, just at a lower quality.
For simplicity’s sake, we can say that our low quality files are 1MB per song and the high quality files are 10MB per song. Together with the 5MB standard files, that is 16MB per song, or 1600TB for 100 million songs. With three-way replication, the required storage is roughly 5000TB.
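The storage numbers in this section can be reproduced with a short Python sketch. The tier sizes and replication factor come from the text; treating the 5MB average as the middle "standard" tier and using decimal units (1 TB = 1,000,000 MB) are assumptions.

```python
songs = 100_000_000
replication_factor = 3

# Per-song size in MB for each quality tier. Treating the 5 MB average
# as the middle tier is an assumption made for this sketch.
tier_sizes_mb = {"low": 1, "standard": 5, "high": 10}

MB_PER_TB = 1_000_000  # decimal units, assumed


def storage_tb(mb_per_song, copies=1):
    """Total storage in TB for the whole catalog."""
    return songs * mb_per_song * copies / MB_PER_TB


single_copy = storage_tb(tier_sizes_mb["standard"])
replicated = storage_tb(tier_sizes_mb["standard"], replication_factor)
all_tiers = storage_tb(sum(tier_sizes_mb.values()), replication_factor)

print(f"{single_copy:,.0f} TB for one copy at standard quality")
print(f"{replicated:,.0f} TB with 3x replication")
print(f"{all_tiers:,.0f} TB with three quality tiers and 3x replication")
```

The last figure comes out at 4,800 TB, which rounds to the 5000TB quoted above.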
Multimedia is not the only data consideration, however. A candidate should be sure to include metadata. On the music side, there are artist names and bios, album covers, album names, song titles, and potentially lyrics — but the system should store user metadata as well. Given the number of users, metadata adds up, and will ultimately take up a significant amount of space. Metadata also demands a different storage location than multimedia data and will affect the high-level design components that are deemed necessary.
Design for low latency
Given the assumption that the average song is around 5MB (40 megabits), and that the average 3G connection reaches speeds of 3-5 megabits/second, it would take ~8 seconds to download a 5MB song even at the top of that range. This is significantly longer than 200ms. How candidates tackle this problem will likely reveal the most about their individual skill sets or problem solving tendencies.
The key idea that a candidate should get is that the system will have to chunk song files and buffer their download. The system should be able to rapidly download the first couple seconds of a song and then use the playback of those seconds to download more and more of the song.
If the device is able to download 0.1MB, it can begin playing the song almost instantly. Then, while the first few seconds are playing, the system can download the next chunks of the song. After about 10 seconds, the system will have the complete song downloaded. Really talented candidates will even highlight the possibility of using the time spent streaming to cache the next couple of songs in the queue. In doing so, we can create a better user experience if they decide to skip a song or two.
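A rough calculation shows why chunking works. The sketch below, in Python, uses the song size, song length, and first-chunk size from the text; the 5 megabit/second link is the top of the 3G range quoted earlier, and ignoring round-trip latency and protocol overhead is a simplifying assumption.

```python
song_size_mb = 5.0      # megabytes, from the text
song_length_s = 5 * 60  # 5-minute song, from the storage section
first_chunk_mb = 0.1    # megabytes, from the text
link_mbps = 5.0         # megabits/s, top of the quoted 3G range


def download_seconds(size_mb, mbps):
    """Transfer time for size_mb megabytes over an mbps megabit/s link."""
    return size_mb * 8 / mbps


full_download_s = download_seconds(song_size_mb, link_mbps)
first_chunk_s = download_seconds(first_chunk_mb, link_mbps)

# How much audio the first chunk holds, assuming a constant bitrate.
first_chunk_audio_s = first_chunk_mb / song_size_mb * song_length_s

# Bitrate needed to keep playback fed, in megabits/s.
playback_mbps = song_size_mb * 8 / song_length_s

print(f"whole song: {full_download_s:.1f} s")
print(f"first chunk: {first_chunk_s * 1000:.0f} ms")
print(f"first chunk holds {first_chunk_audio_s:.0f} s of audio")
print(f"playback needs {playback_mbps:.2f} Mbit/s; link offers {link_mbps}")
```

Under these assumptions the first chunk arrives inside the 200ms budget and holds several seconds of audio, while the link is far faster than playback consumes data, so later chunks and even the next songs in the queue can be fetched in the background.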
Don’t rush through your analysis of how to design the system. That said, if you wind up with extra time, using it to expand on your assessment can be extremely helpful to both you and your interviewer. Take the initiative and discuss relevant design specifics that align with your interests and area of specialization. For example, in the “design Spotify” problem, areas to home in on are:
- How to build a search index
- Adaptive streaming
- API design/API calls
- Recommendation engine (for machine learning candidates)
Content delivery network (CDN)
A content delivery network is crucial to ensure low latency for a global system, especially one that is data intensive. It is important to have nodes that are physically close to geographically significant areas. For example, the two-way latency from a node in Virginia (U.S. East) to one in California (U.S. West) and back is around 63 ms. From that same U.S. East location to Cape Town, South Africa, and back, it is around 225 ms.
Our system may allow music to start playing within the 200 ms window, but only if the user is close to the main node of the system. Accounting for travel time latency adds an additional layer of complexity to our non-functional requirements. To ensure a positive user experience, we need a CDN to minimize response times by optimizing the delivery of data based on location.
To set up a CDN, we need to have a routing service that directs data to the correct proxy services based on the location of the request. A CDN will also need to be considered in the API design of the system. Web servers and load balancers will need to go through the CDN’s routing service before a response can be delivered.
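The routing decision can be sketched as picking the lowest-latency healthy edge for each request. In the Python below, the 63 ms and 225 ms round-trip times come from the text; the region names, the other latencies, and the `pick_edge` function are hypothetical placeholders, and real CDNs typically route with DNS or anycast rather than a lookup table.

```python
# Round-trip times in ms from a client region to each edge location.
# 63 and 225 come from the text; every other value is a made-up placeholder.
RTT_MS = {
    "us-west": {"virginia": 63, "california": 5, "cape-town": 280},
    "south-africa": {"virginia": 225, "california": 290, "cape-town": 8},
}

LATENCY_BUDGET_MS = 200  # playback must start within this window


def pick_edge(client_region, healthy_edges):
    """Return (edge, rtt_ms) for the lowest-latency healthy edge."""
    candidates = {
        edge: rtt
        for edge, rtt in RTT_MS[client_region].items()
        if edge in healthy_edges
    }
    if not candidates:
        raise LookupError("no healthy edge available")
    edge = min(candidates, key=candidates.get)
    return edge, candidates[edge]


all_edges = {"virginia", "california", "cape-town"}
print(pick_edge("south-africa", all_edges))     # ('cape-town', 8)
print(pick_edge("south-africa", {"virginia"}))  # ('virginia', 225)
```

The second call shows the failure mode the CDN exists to prevent: with only the Virginia node available, a listener in South Africa spends more than the entire 200 ms budget on one network round trip.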
A content delivery network is a complex system by nature, and it needs a more in-depth explanation than can be communicated here. If you’re interested in delving more into the infrastructure of a CDN, this lesson on Designing Content Delivery Network from our course Grokking Modern System Design Interview for Engineers & Managers outlines the complete system architecture.
How real-world systems solve similar problems
Designing a music streaming service like Spotify isn’t just a theoretical exercise. Many of the challenges you encounter in this problem, such as low-latency delivery, massive scale, and efficient data distribution, have already been solved in different ways by real-world systems.
Looking at how companies like Uber, WhatsApp, and Netflix approach similar problems can help you ground your design decisions in reality and demonstrate stronger system thinking in interviews.
Uber: handling real-time scale with distributed systems
Uber operates in a completely different domain, but the underlying system design challenges are surprisingly similar. At peak hours, Uber must handle millions of concurrent users requesting rides, tracking locations, and receiving updates in real time.
To support this scale, Uber relies heavily on distributed systems and aggressive caching strategies. Data is partitioned geographically so that users are served by systems closest to them. This reduces latency and improves responsiveness, especially during high-traffic periods.
In the context of a music streaming system, this approach translates directly. Instead of serving all users from a central data center, your system should distribute traffic across regions and cache frequently accessed data closer to users. This ensures that playback starts quickly and remains smooth even under heavy load.
Uber’s architecture also highlights the importance of designing for peak traffic rather than average usage. In a Spotify-like system, this means preparing for sudden spikes, such as new album releases or viral songs.
WhatsApp: optimizing for low latency and high availability
WhatsApp is another system where performance and reliability are critical. With billions of users sending messages in real time, the system must deliver data almost instantly while maintaining high availability.
One of WhatsApp’s key design principles is simplicity at scale. The system uses efficient data storage and lightweight protocols to minimize latency. It also ensures redundancy by replicating data across multiple servers, so that failures do not disrupt the user experience.
For a music streaming service, similar principles apply. Users expect songs to start playing immediately, without buffering delays. Achieving this requires minimizing the number of steps between a user request and data delivery. It also requires ensuring that data is always available, even if parts of the system fail.
WhatsApp’s design reinforces the idea that low latency is not just about speed—it’s about removing unnecessary complexity and ensuring reliability at every layer.
Netflix: mastering global content delivery
Netflix provides one of the clearest real-world parallels to a Spotify-like system. It delivers large volumes of media content to users across the globe, all while maintaining a seamless viewing experience.
One of Netflix’s most important architectural decisions is its heavy reliance on content delivery networks. Instead of streaming content from a central location, Netflix caches content on servers distributed around the world. This allows users to access data from nearby locations, significantly reducing latency.
Netflix also invests heavily in adaptive streaming. Rather than sending a fixed-quality video, the system dynamically adjusts quality based on network conditions. This ensures uninterrupted playback, even on unstable connections.
These ideas translate directly to music streaming. A Spotify-like system should use CDNs to deliver songs efficiently and support multiple quality levels to adapt to varying network conditions. This not only improves performance but also enhances the overall user experience.
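One simple way to picture adaptive streaming is a client that picks the highest bitrate its measured bandwidth can sustain. The Python sketch below is illustrative only: the bitrate ladder is hypothetical apart from the 320 kbps top tier used earlier in this post, and the safety factor is an assumed tuning value, not a figure from any real player.

```python
# Hypothetical bitrate ladder in kilobits/s. Only the 320 kbps tier
# appears earlier in this post; the lower tiers are assumptions.
BITRATE_LADDER_KBPS = [96, 160, 320]

# Assumed headroom so that small bandwidth dips do not stall playback.
SAFETY_FACTOR = 0.8


def pick_bitrate(measured_kbps):
    """Pick the highest ladder bitrate that fits the bandwidth budget."""
    budget = measured_kbps * SAFETY_FACTOR
    affordable = [b for b in BITRATE_LADDER_KBPS if b <= budget]
    # Fall back to the lowest tier rather than refusing to play.
    return max(affordable) if affordable else min(BITRATE_LADDER_KBPS)


print(pick_bitrate(5000))  # 320
print(pick_bitrate(250))   # 160
print(pick_bitrate(50))    # 96
```

A client would re-run this choice for each chunk it fetches, which is why storing every song at several quality levels, as estimated in the storage section, is a prerequisite for adaptive streaming.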
What these case studies teach you
When you step back and look across these systems, a few consistent patterns emerge. Large-scale applications rely on distributed architectures, prioritize low latency, and design for failure from the beginning.
In an interview, referencing these kinds of real-world systems shows that you’re not just designing in isolation. You’re thinking like an engineer who understands how production systems actually work.
And that’s often what separates a good answer from a great one.
For candidates: There is a lot to say about how to do well in your next SDI. The best tip I have is to spend an adequate amount of time preparing. System Design is no simple task, and not something you can just improvise on the spot.
During your preparation and your next interview, keep these pieces of advice in mind:
- Ask clarifying questions and state assumptions.
- Discuss relevant data structures and algorithms.
- Plan for scalability.
For interviewers: Be engaged in the conversation and try to help a candidate along. Even if they aren’t asking the right questions, don’t let them flounder. As an interviewer, you can help guide them. It’s entirely possible that a great candidate will get flustered and need some time to warm up and get in the groove.
Here are a couple more quick takeaways:
- Evaluate a candidate on their interview performance without letting their System Design experience (or lack thereof) get in the way.
- Let them take the conversation where they feel most comfortable. A front end developer will probably want to talk about APIs, while a machine learning engineer may be eager to show off their recommendation engine skills. Both are highly relevant to the system at hand and ultimately help you determine the best fit in the long run.
If you’re looking to prepare for your next interview, there are no better resources than the Educative courses Grokking Modern System Design Interview and Grokking the Frontend System Design Interview. The former describes in detail all major System Design building blocks and then walks through over a dozen more real-world System Design problems in an interview-style format.
Happy learning!