Cracking the Machine Learning Interview: system design approaches

Aug 13, 2020 - 12 min read
Jerry Ejonavi

Machine learning (ML) is one of the fastest-growing fields and is predicted to grow from $7.3B in 2020 to $30.6B in 2024. Working in the field of ML is exciting and lucrative. But if you want to land a job as a data scientist, you’ll need to go through a competitive interview process.

Most top companies will test your skills in programming, data analysis, and critical thinking. But most important is your mastery of system design, as it shows that you can efficiently solve open-ended problems. These skills set you apart from other engineers.

This article is an introductory guide to Machine Learning interviews for developers hoping to ace the interview and stand apart with system design concepts.

This article is part of Educative’s Machine Learning Interview series, so stay tuned for our article on the top machine learning interview questions and answers.

Today we will cover:

  • What is the ML interview?
  • Overview of ML interview concepts and techniques
  • How to set up an ML system
  • Building an entity linking system
  • What to learn next

Ace the ML engineer interview using system design concepts.

Learn how to impress an interviewer with your ability to think about systems at a high level.

Grokking the Machine Learning Interview

What is the ML interview?

Machine Learning (ML) is the study of computer algorithms that improve automatically through experience. ML aims at solving a multitude of complex problems and has seen rapid progress in areas like speech understanding, search ranking, credit card fraud detection, and more.

Companies across industries, from healthcare and agriculture to manufacturing and retail, are leveraging these technologies to get ahead. Working as a machine learning engineer is exciting and lucrative.

During an interview, you’ll be tested on a variety of skills:

  • Technical and programming skills
  • Data analysis skills, including multiple approaches and technologies
  • System design concepts
  • Your ability to apply machine learning theories effectively
  • Communication skills and cultural fit

This field remains one where a high level of technical skill is required. Interviewers focus on a candidate’s ability to think about systems at a high level. They ask a series of open-ended questions to test a candidate’s ability to solve an end-to-end ML problem.

For example, a candidate may be asked to:

  • Build a recommendation system that shows relevant products to users
  • Build a visual understanding system for a self-driving car
  • Build a search-ranking system

Overview of ML interview concepts and techniques

Performance and Capacity Considerations

As we work on an ML-based system, our goal is to improve our metrics (engagement rate, etc.) while ensuring that we meet the capacity and performance Service Level Agreement (SLA).

A performance-based SLA ensures that we return results within a given time frame (e.g., 500 ms) for 99% of queries. Capacity refers to the load that our system can handle, e.g., 1,000 queries per second (QPS). Major performance and capacity discussions come up during the following two phases of building an ML system:

  • Training time: How much training data and capacity are needed to build our predictor?
  • Evaluation time: What SLAs do we have to meet while serving the model, and what are the capacity needs?
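
To make these two numbers concrete, here is a minimal Python sketch of how you might check a latency SLA and estimate serving capacity. The function names, the nearest-rank percentile method, the sample latencies, and the per-server throughput of 150 QPS are all assumptions made for illustration; only the 500 ms, 99%, and 1,000 QPS figures come from the example above.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile: the smallest sample that has at least
    pct percent of all samples at or below it."""
    if not values:
        raise ValueError("values must be non-empty")
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]


def meets_latency_sla(latencies_ms, limit_ms=500, pct=99):
    """True if pct percent of queries finished within limit_ms."""
    return percentile(latencies_ms, pct) <= limit_ms


def servers_needed(target_qps, qps_per_server):
    """Servers required to carry target_qps, rounded up."""
    return math.ceil(target_qps / qps_per_server)


# Hypothetical measurements: 99 fast queries and one slow outlier.
latencies_ms = [120] * 99 + [900]
print(meets_latency_sla(latencies_ms))  # the outlier sits above the 99th percentile
print(servers_needed(1000, 150))        # assumed 150 QPS per server
```

In an interview, the useful part is the reasoning: a percentile-based SLA tolerates a few slow queries, while capacity planning has to round up to whole machines.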

In ML systems like search ranking, recommendation, and ad prediction, the layered (funnel) approach to modeling is the right way to solve for scale and relevance while keeping performance high and capacity in check. In this approach, you start with a relatively fast model when you have the largest number of documents, e.g., 100 million documents matching the query “computer science” in search.

In every later stage, you increase the model’s complexity (i.e., a more accurate but more expensive model) and its execution time, but the model now runs on a reduced number of documents. For example, your first stage could use a linear model and the final stage could use a deep neural network.
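
A minimal sketch of the funnel idea in Python might look like the following. Both scoring functions are hypothetical stand-ins: a term-overlap count plays the role of the fast first-stage model, and an overlap-times-quality score stands in for a heavier model such as a deep neural network. The documents and their `quality` field are invented for illustration.

```python
def cheap_score(doc, query_terms):
    """Stage 1 stand-in: count how many query terms appear in the document."""
    return sum(1 for term in query_terms if term in doc["terms"])


def expensive_score(doc, query_terms):
    """Stage 2 stand-in for a heavier model: overlap weighted by quality."""
    return cheap_score(doc, query_terms) * doc["quality"]


def funnel_rank(docs, query_terms, stage1_keep, final_k):
    """Run the cheap model on everything, the costly model on survivors."""
    stage1 = sorted(
        docs, key=lambda d: cheap_score(d, query_terms), reverse=True
    )[:stage1_keep]
    stage2 = sorted(
        stage1, key=lambda d: expensive_score(d, query_terms), reverse=True
    )
    return stage2[:final_k]


# Hypothetical corpus; a real system would start from millions of documents.
docs = [
    {"id": "a", "terms": {"computer", "science", "degree"}, "quality": 0.9},
    {"id": "b", "terms": {"computer", "repair"}, "quality": 0.95},
    {"id": "c", "terms": {"science", "fiction"}, "quality": 0.4},
    {"id": "d", "terms": {"cooking"}, "quality": 0.99},
    {"id": "e", "terms": {"computer", "science", "news"}, "quality": 0.5},
]
query = {"computer", "science"}
top = funnel_rank(docs, query, stage1_keep=3, final_k=2)
print([doc["id"] for doc in top])
```

Note that the expensive scorer only ever sees `stage1_keep` documents, which is what keeps latency and capacity in check as the corpus grows.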

Training Data Collection Strategies

An ML model learns directly from the data provided to it and creates or refines its rules on a given task based on that data. Therefore, inadequate, irrelevant, or biased data will render even the most performant algorithms useless.

The quality and quantity of training data are a big factor in determining how far you can go in your machine learning optimization task. Data collection techniques primarily involve any of the following:

  • Users (online): users’ interactions with the pre-existing system
  • Human labelers (offline): crowdsourcing, open-source datasets like the BDD100K dataset
  • Specialized labelers

Additionally, you can utilize other creative data collection techniques. For example, you can build a personalized experience in your product by collecting data from your users. Or, when working with systems that use visual data, such as object detectors or image segmenters, you can use GANs (generative adversarial networks) to enhance the training data. There are other things to consider here too:

  • Data splits (training, validation, and test sets)
  • Data quantity
  • Data filtering

Filtering your data is important since your model will be learning directly from it. Ideally, you want your models to be as free from bias as possible.
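
As a rough illustration of filtering and splitting, here is a small Python sketch. The filtering rule (drop unlabeled rows and exact duplicates), the 80/10/10 split, and the toy examples are assumptions chosen for the example, not a recommendation from the course; real filtering usually also targets bias and noise.

```python
import random


def filter_examples(examples):
    """Drop unlabeled rows and exact duplicates, keeping the first copy."""
    seen = set()
    kept = []
    for text, label in examples:
        if label is None or (text, label) in seen:
            continue
        seen.add((text, label))
        kept.append((text, label))
    return kept


def split_dataset(examples, train_frac=0.8, val_frac=0.1, seed=0):
    """Shuffle deterministically, then cut into train/validation/test."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * train_frac)
    n_val = int(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test


# Hypothetical raw data: 20 labeled rows, one duplicate, one unlabeled row.
raw = [(f"example {i}", i % 2) for i in range(20)]
raw += [("example 0", 0), ("unlabeled", None)]

clean = filter_examples(raw)
train, val, test = split_dataset(clean)
print(len(clean), len(train), len(val), len(test))
```

Filtering before splitting matters here: a duplicate that lands in both the training and test sets would make offline metrics look better than they are.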

Online experimentation

A successful machine learning system must gauge its performance by testing different scenarios. This can lead to more innovations in the model design. For an ML system, “success” can be measured in numerous ways.

A/B testing is a useful way to run an online experiment and gauge the impact of new features or changes in the system. In an A/B experiment, a webpage or screen is modified to create a second version of it. The original version is known as the control, and the modified version is the variation. From here, we can formulate two hypotheses:

  • The null hypothesis: the change has no effect on the metric we are measuring
  • The alternative hypothesis: the change does affect the metric

We can also use this stage to measure long-term effects with backtesting and long-running A/B tests.
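
One common way to decide between the two hypotheses is a two-proportion z-test on a conversion-style metric. The article does not prescribe a specific test, so treat the sketch below as one reasonable option; the conversion counts and the 0.05 significance level are assumptions made for illustration.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for control A versus variation B."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both groups behave the same.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / std_err
    # Two-sided tail probability of the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical experiment: 1,000 users per arm.
z, p_value = two_proportion_z_test(conv_a=100, n_a=1000, conv_b=130, n_b=1000)
reject_null = p_value < 0.05  # assumed significance level
print(round(z, 2), round(p_value, 3), reject_null)
```

A small p-value suggests the difference between control and variation is unlikely to be chance alone, which is the evidence you would want before shipping the change.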

Experimental framework stages


Embeddings

Embeddings enable us to encode entities (e.g., words, documents, images, people, ads) in a low-dimensional vector space that captures their semantic information. Let’s look at two popular models for generating text term embeddings:

  • CBOW: Continuous bag of words (CBOW) predicts the current word from its surrounding words.
  • Skip-gram: This architecture predicts the surrounding words from the current word.
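
The difference between the two models is easiest to see in the training examples each one generates. The sketch below only builds those examples (it does not train any embeddings); the function names, the window size of 1, and the three-word sentence are assumptions for illustration.

```python
def skipgram_pairs(tokens, window=1):
    """(current word, surrounding word) pairs: predict context from center."""
    pairs = []
    for i, center in enumerate(tokens):
        start = max(0, i - window)
        end = min(len(tokens), i + window + 1)
        for j in range(start, end):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


def cbow_examples(tokens, window=1):
    """(surrounding words, current word) examples: predict center from context."""
    examples = []
    for i, target in enumerate(tokens):
        context = tokens[max(0, i - window):i] + tokens[i + 1:i + window + 1]
        if context:
            examples.append((context, target))
    return examples


tokens = ["the", "cat", "sat"]
print(skipgram_pairs(tokens))
print(cbow_examples(tokens))
```

Skip-gram produces one training pair per (center, neighbor) combination, while CBOW groups all neighbors into a single input for each center word.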

Other ML interview concepts and techniques

We’ve gone over the main concepts and techniques used in ML interviews and system design. This is just an introduction to the techniques you will need to be successful. To continue learning, I recommend you look into:

  • Transfer learning
  • Model debugging and testing
  • Training data filtering
  • Building models & iterative model improvement


How to set up an ML system

In an ML interview, you will be expected to set up a system effectively. Let’s get familiarized with the thought process required to answer an interviewer’s questions.

Setting up the problem

Interviewers will generally ask you to design an ML system for a particular task. This question is usually very broad, so the first thing you need to do is ask questions. This will help you narrow down the scope of the problem and ensure your system’s requirements closely match the interviewer’s.

Your conversation should also include questions about performance/speed and capacity considerations of the system.

The answers to these questions will guide you when you come up with the architecture of the system. Knowing that you need to return results quickly will influence the depth and complexity of your models.

Defining the metrics of the problem

The next step is to carefully choose your system’s performance metrics for both online and offline testing. The metrics you choose will depend on the problem your system is trying to solve.

For example, if you are performing binary classification, you will use the following offline metrics: Area Under Curve (AUC), log loss, precision, recall, and F1-score.
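
To show how a few of these offline metrics are computed, here is a small dependency-free Python sketch of precision, recall, F1-score, and log loss. The labels, predicted probabilities, and the 0.5 decision threshold are made-up values for illustration; in practice you would more likely use a library implementation.

```python
import math


def precision_recall_f1(y_true, y_pred):
    """Precision, recall, and F1 for binary labels (1 = positive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


def log_loss(y_true, y_prob, eps=1e-15):
    """Mean negative log-likelihood of the true labels."""
    total = 0.0
    for t, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total -= t * math.log(p) + (1 - t) * math.log(1 - p)
    return total / len(y_true)


# Hypothetical labels and model probabilities.
y_true = [1, 0, 1, 1, 0]
y_prob = [0.9, 0.6, 0.4, 0.8, 0.1]
y_pred = [1 if p >= 0.5 else 0 for p in y_prob]  # assumed 0.5 threshold

precision, recall, f1 = precision_recall_f1(y_true, y_pred)
print(precision, recall, f1, log_loss(y_true, y_prob))
```

Precision, recall, and F1 depend on the chosen threshold, whereas log loss scores the probabilities directly, which is one reason interviewers may ask you to justify which metric fits the problem.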

While coming up with online metrics, you may need both component-wise and end-to-end metrics. Component-wise metrics are used to evaluate the performance of ML systems that are plugged into and used to improve other ML systems.

An end-to-end metric evaluates a system’s performance after an ML model has been applied. For example, a metric for a search engine would be the users’ engagement and retention rate after your model has been plugged in.

Architecture discussion

The next step is to design your system’s architecture. You need to think about the components of the system and how the data will flow through those components. In this step, you need to be careful to design a model that can scale easily.

Architectural components for ML system of search engine

In most cases, your problem will involve a system with a huge and ever-increasing amount of data. For example, if you are tasked with building an ML system that displays relevant ads to users, your model can’t process every ad in the system at once.

Instead, you could use the funnel approach, where each stage has fewer ads to process. In the end, you will have a scalable system that quickly figures out the relevant ads for all users despite the increase in data. When you have nailed down all of your ML system’s requirements, you can proceed to building your model. This involves:

  1. Training Data Generation: This involves sourcing data for use in training your models. This can either be manually labelled data or it can be collected from a user’s interaction with the pre-existing system.
  2. Feature Engineering: In order to implement a feature, you would need to identify the primary actors involved in the given task, individually inspect these actors and explore their relationships.
  3. Model Training: Here, you will make a decision on what model to use for your system.
  4. Offline Evaluation: This is very beneficial, as it allows you to quickly test many different models. The most promising models are selected for online testing, which is a slow process.
  5. Online Execution, Evaluation and Iterative Improvement
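
The offline evaluation step can be sketched as scoring several candidate models on the same held-out labels and promoting the best one to online testing. The example below uses AUC, computed as the fraction of positive/negative pairs ranked correctly; the model names and scores are hypothetical, and the pairwise loop is meant for clarity rather than speed.

```python
def auc(y_true, scores):
    """Probability that a random positive outranks a random negative
    (ties count as half)."""
    positives = [s for t, s in zip(y_true, scores) if t == 1]
    negatives = [s for t, s in zip(y_true, scores) if t == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for pos in positives:
        for neg in negatives:
            if pos > neg:
                wins += 1.0
            elif pos == neg:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical held-out labels and scores from two candidate models.
labels = [1, 0, 1, 0]
candidate_scores = {
    "model_a": [0.9, 0.8, 0.7, 0.1],
    "model_b": [0.9, 0.2, 0.8, 0.1],
}

offline_auc = {name: auc(labels, s) for name, s in candidate_scores.items()}
best_model = max(offline_auc, key=offline_auc.get)
print(offline_auc, best_model)
```

Only the winner (or the top few) would move on to the slower online evaluation stage.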

Now, let’s get acquainted with the task of building an entity linking system in the next section.

Building an entity linking system

Named entity linking (NEL) is the process of detecting and linking entity mentions in a given text to corresponding entities in a target knowledge base. There are two parts to entity linking:

  • Named-entity recognition: NER detects and classifies potential named entities in the text into predefined categories such as a person, organization, location, medical code, time expression, etc.
  • Disambiguation: This disambiguates each detected entity by linking it to its corresponding entity in the knowledge base.

Let’s see Entity Linking in action in the following example:

Entity linking overview

The text says, “Michael Jordan is a machine learning professor at UC Berkeley.” First, NER detects the named entities Michael Jordan and UC Berkeley and classifies them as a person and an organization, respectively.

Then disambiguation takes place. Assume that there are two “Michael Jordan” entities in the given knowledge base: the UC Berkeley professor and the athlete. Michael Jordan in the text is linked to the UC Berkeley professor entity in the knowledge base, since that is who the text is referring to. Similarly, UC Berkeley in the text is linked to the University of California, Berkeley entity in the knowledge base.


Entity linking has applications in many natural language processing tasks. The use cases can be broadly categorized as information retrieval, information extraction and building knowledge graphs, which in turn can be used in many systems, such as:

  • Semantic search
  • Content analysis
  • Question answering systems/chatbots/virtual assistants

All of the above-mentioned applications require a high-level representation of the text, in which concepts relevant to the application are separated from the text and other non-meaningful data.

Problem statement

The interviewer has asked you to design an entity linking system that:

  • Identifies potential named entity mentions in the text.
  • Searches for possible corresponding entities in the target knowledge base for disambiguation.
  • Returns either the best candidate corresponding entity or nil.
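
The three requirements above can be sketched as a toy pipeline in Python. Everything here is an assumption made for illustration: the tiny in-memory knowledge base, the entity IDs, the exact-string recognizer, and the context-word-overlap disambiguator. A real system would use a trained NER model and a learned disambiguation model instead.

```python
# Hypothetical knowledge base: mention string -> candidate entities.
KNOWLEDGE_BASE = {
    "Michael Jordan": [
        {"id": "kb:michael_jordan_professor",
         "context": {"machine", "learning", "professor", "berkeley"}},
        {"id": "kb:michael_jordan_athlete",
         "context": {"basketball", "player", "bulls", "nba"}},
    ],
    "UC Berkeley": [
        {"id": "kb:uc_berkeley",
         "context": {"university", "california", "berkeley"}},
    ],
}


def recognize(text):
    """Toy recognizer: exact match against known mention strings."""
    return [mention for mention in KNOWLEDGE_BASE if mention in text]


def disambiguate(mention, text):
    """Pick the candidate sharing the most context words with the text.
    Returns None (nil) when no candidate shares any word."""
    words = {w.strip(".,").lower() for w in text.split()}
    best_id, best_overlap = None, 0
    for candidate in KNOWLEDGE_BASE.get(mention, []):
        overlap = len(candidate["context"] & words)
        if overlap > best_overlap:
            best_id, best_overlap = candidate["id"], overlap
    return best_id


def link_entities(text):
    """Recognize mentions, then link each one to a KB entity or None."""
    return {mention: disambiguate(mention, text) for mention in recognize(text)}


text = "Michael Jordan is a machine learning professor at UC Berkeley."
print(link_entities(text))
```

The nil case falls out of the overlap threshold: a mention with no supporting context is left unlinked rather than forced onto a wrong entity.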

The problem statement translates to the following machine learning problem:

“Given a text and knowledge base, find all the entity mentions in the text (Recognize) and then link them to the corresponding correct entry in the knowledge base (Disambiguate).”

Interview questions for entity linking

These are some of the questions that an interviewer can put forth during a discussion on entity linking systems.

  • How would you build an entity recognizer system?
  • How would you build a disambiguation system?
  • Given a piece of text, how would you extract all persons, countries, and businesses mentioned in it?
  • How would you measure the performance of a disambiguator/entity recognizer/entity linker?
  • Given multiple disambiguators/recognizers/linkers, how would you figure out which is the best one?
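
For the last two questions, one possible starting point is to compare each linker’s output against a gold-labeled set of (mention, entity) pairs and rank the linkers by F1. This is a sketch of that idea, not the course’s answer; the gold labels, entity IDs, and linker outputs below are hypothetical.

```python
def linking_scores(gold, predicted):
    """Precision, recall, and F1 over (mention, entity_id) pairs."""
    gold, predicted = set(gold), set(predicted)
    correct = len(gold & predicted)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Hypothetical gold annotations for one document.
gold = [
    ("Michael Jordan", "kb:michael_jordan_professor"),
    ("UC Berkeley", "kb:uc_berkeley"),
]

# Hypothetical outputs from two competing linkers.
predictions = {
    "linker_a": [
        ("Michael Jordan", "kb:michael_jordan_athlete"),  # wrong entity
        ("UC Berkeley", "kb:uc_berkeley"),
    ],
    "linker_b": [
        ("Michael Jordan", "kb:michael_jordan_professor"),
        ("UC Berkeley", "kb:uc_berkeley"),
        ("machine learning", "kb:machine_learning"),  # spurious mention
    ],
}

scores = {name: linking_scores(gold, pred) for name, pred in predictions.items()}
best_linker = max(scores, key=lambda name: scores[name][2])
print(scores, best_linker)
```

Offline comparison like this narrows the field; the end-to-end impact of the chosen linker would still need to be confirmed with online metrics.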

What to learn next

Congrats! You were introduced to ML system concepts and techniques for implementing them. Finally, you learned some strategies for approaching any interview question based on an actual system design concept.

There’s still a lot to learn about machine learning and system design. You’ll need to master the following systems:

  • Ad prediction system
  • Self-driving car systems
  • Recommendation system
  • Feed-based system
  • Search ranking

To get up to speed on these concepts and strategies, check out Educative’s unparalleled course Grokking the Machine Learning Interview. This course helps you master ML design and answer some of the most popular interview problems at big tech companies.

Once you’re done with the course, you’ll be able to not just ace the machine learning interview at any tech company, but also impress your interviewers with your ability to think about systems at a high level!

If you want even more practice with system design questions for machine learning interviews, check out Machine Learning System Design.
