For many software engineers, the System Design Interview remains a mysterious challenge.
Most engineers have never actually worked on large-scale systems before, so having to explain how to build one seems daunting. And because system design interview questions can be so open-ended, it is hard to know the right way to prepare.
Before I spent eight years working on distributed systems at Microsoft and Facebook, I definitely felt this way. But now, as someone who has participated in hundreds of system design interviews, I can assure you that there is a way through.
In this post, I will discuss:
What interviewers are looking for in a System Design Interview
How any developer can prepare to confidently answer System Design questions
In April 2008, I joined an internal team at Microsoft working on a large-scale project building a distributed storage solution.
Amazon had launched their Simple Storage Service in 2006, and Google launched their PaaS solution Google App Engine the same month I joined the team, so we were in the early land grab of cloud computing. Less than two years later, that project was launched to the world as a new product category: Microsoft Azure.
When I joined the Azure team, I came from working on Exchange. I understood server storage and client management, but not at this scale, and certainly not distributed across the world. It required a lot of learning on the job.
Today, the lessons that other cloud engineers and I learned in those early days are codified into the System Design discipline. For many companies, the System Design Interview is now instrumental in the developer interview process, which means it is vital for landing a job and setting your career on a good trajectory.
By the time I started Educative, I had participated in hundreds of interview loops as both interviewee and interviewer. As Educative has scaled, I have participated in hundreds more.
The experience of working on web-scale systems at Facebook and Microsoft taught me two key skills for approaching the System Design Interview:
Here’s the counterintuitive part: in the System Design Interview, companies are not actually trying to test your experience with System Design.
Successful candidates rarely have much experience working on large-scale systems, and interviewers know this. This is a discipline that has only been around for about fifteen years, and like everything else in software engineering, it is evolving rapidly.
The key is to prepare for the System Design Interview (SDI) with the intent to apply that knowledge.
Unlike a coding interview question, System Design Interviews are free-form discussions, and there’s no right or wrong answer. Instead, the interviewer is trying to evaluate the candidate’s ability to hold a conversation about the different aspects of the system and assess the solution based on the requirements that might evolve during the conversation.
The best way to think about the conversation is to imagine that you and a colleague are asked to design a large-scale system, and you are hashing out the details on the whiteboard. Your first job is to understand the requirements, scope, and constraints before proposing a solution.
So how do you design a system in an interview if you have never built one in real life? To crack your system design interview, you’ll need to prepare in three areas:
Each of these dimensions flows into the next.
If you don’t know the fundamentals, you won’t be prepared to architect a service; if you don’t know how to put those systems together, you won’t be able to design a specific solution; once you’ve designed large-scale systems, you can take lessons learned and integrate them into your base knowledge.
Let’s look at each of these dimensions in order.
As with anything else, it is important to start with the basics. The fundamentals of distributed systems give you a framework for what is and is not possible in a given system.
You can understand the limitations of specific architectures and the trade-offs you will have to make to achieve particular goals (e.g. consistency vs. write throughput). At the most basic level, you need to start with the strengths, weaknesses, and purposes of distributed systems. Be able to talk about topics like:
Know the difference between, and the impact of, the failure rates of storage solutions and the corruption rates in read-write processes.
Replication is the key to unlocking data durability and consistency. It deals with backups of data, but also with being able to repeat processes at scale.
Also called shards, partitions divide data across different nodes within your system. As replication distributes the data across nodes, partitioning distributes processes across nodes. This reduces the reliance on pure replication.
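To make partitioning concrete, here is a minimal sketch of hash-based partitioning in Python. It is only one of several possible schemes, and the function names (`partition_for`, `assign`) and the user keys are hypothetical, chosen for illustration.

```python
import hashlib


def partition_for(key: str, num_partitions: int) -> int:
    """Map a key to a partition index with a stable hash (illustrative scheme)."""
    # Python's built-in hash() is randomized per process for strings,
    # so a cryptographic digest is used here to keep the mapping stable.
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


def assign(keys, num_partitions):
    """Group keys by the partition (shard) that would own them."""
    partitions = {i: [] for i in range(num_partitions)}
    for key in keys:
        partitions[partition_for(key, num_partitions)].append(key)
    return partitions


if __name__ == "__main__":
    # Hypothetical user IDs spread across three partitions.
    print(assign(["user-1", "user-2", "user-3", "user-4"], 3))
```

A simple modulo scheme like this moves most keys whenever the number of partitions changes, which is exactly the kind of trade-off worth raising in an interview.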
One of your nodes is in Seattle; another is in Lahore; another is in London. A system request arrives at 7:05 a.m. Pacific Daylight Time. Can it be recorded and properly synchronized on the remote nodes, given the travel time of data packets, and can the nodes concur on it? This is the problem of consensus: all the nodes need to agree, which prevents faulty processes from running and ensures consistency and replication of data and processes across the system.
Once you’ve achieved consensus, transactions from applications need to be committed across databases, with fault checks by each resource involved. Two-way and three-way communication to read, write, and commit is shared across participant nodes.
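One common way to commit a transaction across several resources is a two-phase commit. The post does not name a specific protocol, so treat the following as a sketch of that one technique under simplifying assumptions: no network, no timeouts, no crash recovery. The `Participant` class and `two_phase_commit` function are hypothetical names.

```python
class Participant:
    """A hypothetical resource (for example, one database) in the transaction."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit  # stands in for the resource's fault check
        self.state = "idle"

    def prepare(self) -> bool:
        # Phase 1: the participant votes on whether it is able to commit.
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self):
        self.state = "committed"

    def rollback(self):
        self.state = "aborted"


def two_phase_commit(participants) -> bool:
    """Commit only if every participant votes yes; otherwise roll everyone back."""
    # Phase 1: collect a vote from every participant.
    votes = [p.prepare() for p in participants]
    # Phase 2: apply the unanimous decision everywhere.
    if all(votes):
        for p in participants:
            p.commit()
        return True
    for p in participants:
        p.rollback()
    return False


if __name__ == "__main__":
    nodes = [Participant("orders"), Participant("payments", can_commit=False)]
    print(two_phase_commit(nodes), [p.state for p in nodes])
```

A real coordinator also has to handle participants that never answer, which is where most of the interesting interview discussion lives.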
For a deeper dive into these topics, I recommend Educative’s Scalability & System Design for Developers learning path. If you’re already working in the space, you might want to look at Distributed Systems for Practitioners.
You would have to learn about topics like:
Processing happens at various levels in a distributed system. Some processes are on the client, some on the server, and others on another server - all within one application. These processing layers are called tiers, and understanding how those tiers interact with each other and the specific processes they are responsible for is part of system design for the web.
HTTP is the protocol on which the web runs: it is how we load web pages, stream Netflix movies, and browse Amazon listings. REST is a set of design principles for building APIs on top of HTTP, allowing efficient, scalable systems with components isolated from each other’s assumptions. Following these principles and exposing an open API makes it easier for others to build on your work or extend your capabilities in their own apps and services.
If you have 99 simultaneous users, load-balancing through DNS routing can ensure that servers A, B, and C each handle 33 clients, rather than server A being overloaded with 99 and servers B and C sitting idle. Routing client requests to the right server, the right tier where processing happens, helps ensure system stability. You need to know how to do this.
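The 99-user example can be sketched with a simple round-robin policy. This is an in-process illustration of the distribution idea, not DNS routing itself, and the `round_robin` helper is a hypothetical name.

```python
from collections import Counter
from itertools import cycle


def round_robin(servers, num_requests):
    """Hand requests to servers in rotation and count what each one receives."""
    rotation = cycle(servers)  # A, B, C, A, B, C, ...
    return Counter(next(rotation) for _ in range(num_requests))


if __name__ == "__main__":
    # 99 simultaneous users across servers A, B, and C.
    print(round_robin(["A", "B", "C"], 99))
```

With three servers and 99 requests, each server ends up with 33. Round robin ignores how busy each server actually is, so production load balancers often weigh in health checks or connection counts as well.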
A cache keeps your most frequently requested data and applications accessible to the most users at high speed. The questions for your web application are: what needs to be stored in the cache, how do we direct traffic to the cache, and what happens when the cache does not have what we want?
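One possible answer to those three questions is a small read-through cache with least-recently-used eviction. The sketch below assumes that policy for illustration; the `LRUCache` class and the `load_from_origin` callback are hypothetical names standing in for your cache and your slower backing store.

```python
from collections import OrderedDict


class LRUCache:
    """Read-through cache that evicts the least recently used entry when full."""

    def __init__(self, capacity, load_from_origin):
        self.capacity = capacity
        self.load_from_origin = load_from_origin  # called on a cache miss
        self.entries = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, key):
        if key in self.entries:
            # Cache hit: mark the entry as most recently used.
            self.hits += 1
            self.entries.move_to_end(key)
            return self.entries[key]
        # Cache miss: fall back to the origin, then remember the result.
        self.misses += 1
        value = self.load_from_origin(key)
        self.entries[key] = value
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # drop the least recently used
        return value


if __name__ == "__main__":
    cache = LRUCache(2, load_from_origin=lambda key: key.upper())
    for key in ["a", "a", "b", "c"]:
        cache.get(key)
    print(cache.hits, cache.misses, list(cache.entries))
```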
Stream processing applies uniform processes to the data stream. If an application has continuous, consistent data passing through it, then stream processing allows efficient use of local resources within the application.
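As a rough illustration of applying one uniform operation to a continuous stream, here is a sliding-window average written as a Python generator. The `moving_average` name and the sample readings are hypothetical; a real system would typically use a dedicated stream-processing framework.

```python
from collections import deque
from typing import Iterable, Iterator


def moving_average(stream: Iterable[float], window: int) -> Iterator[float]:
    """Yield the average of the most recent `window` values as each one arrives."""
    recent = deque(maxlen=window)  # older values fall off automatically
    for value in stream:
        recent.append(value)
        yield sum(recent) / len(recent)


if __name__ == "__main__":
    readings = [1, 2, 3, 4]  # stands in for an unbounded data source
    for average in moving_average(readings, window=2):
        print(average)
```

Because the generator keeps only the last `window` values in memory, it can run over a stream of any length with constant local resources.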
Educative offers a useful course on Web Application and Software Architecture that covers these topics and others important for system design, including AJAX, monolithic and microservice architectures, frontends, and databases.
This can seem like a lot, but it honestly takes only a few weeks of prep — less if you have a solid foundation to build on.
Once you know the basics of Distributed Systems and Web Architecture, it is time to apply this learning and design real-world systems. Finding and optimizing potential solutions to these problems will give you the tools to approach the system design interview with confidence.
Once you are ready to practice your skills, you can take on some sample problems from real-world interviews – along with tips and approaches to build ten different web services. I have already written about some of the top system design interview questions in another blog.
If you are feeling ready to jump into the interview phase, there is no better prep resource out there than Grokking the System Design Interview. It was vetted and validated by Uber, Lyft, Microsoft, and Google engineers who helped design and build the actual systems those companies ask about in their interviews.
Consumers and businesses alike are online, and even legacy programs are migrating to the cloud. Distributed systems are the present and future of the software engineering discipline. And as System Design Interview questions make up a bigger part of the developer interview, having a working knowledge of distributed systems will pay dividends in your career.