The Model Selection Framework

Understand a 5-step framework to select machine learning models by balancing accuracy, latency, cost, interpretability, and data needs. Learn to start with simple baselines like logistic regression and escalate to complex models like deep learning or LLMs only when justified. Gain insights into operational trade-offs, infrastructure costs, and effective communication to excel in ML system design interviews.

We'll cover the following...

Start simple and justify escalation
When deep learning becomes necessary
- Escalation triggers
- Infrastructure cost of escalation
LLMs vs. task-specific models
- When LLMs fit the problem
- When LLMs are the wrong choice
Putting the framework into practice

You are designing a fraud detection system for a fintech company that processes millions of transactions per day. The interviewer asks, “What model would you use?” This is not a recall test for the latest research architecture. It is a design question that affects downstream decisions, including serving infrastructure, latency budgets, training pipelines, and operational cost. Your answer shows whether you can balance model quality with production constraints.

Model selection in ML system design interviews functions as a litmus test for engineering maturity. Candidates who immediately reach for a transformer-based architecture without justifying the added complexity are penalized. Candidates who start with a simple baseline and then articulate precise conditions for escalation demonstrate the kind of production thinking that MAANG interviewers reward. The difference between these two responses is not knowledge of models but fluency in reasoning about trade-offs.

This lesson gives you a repeatable framework for that reasoning. Every model choice you make in an interview should be evaluated along five trade-off axes: accuracy (how well the model performs on offline and online metrics), latency (whether inference fits within the serving SLA), cost (training and serving expenses at production scale), interpretability (whether stakeholders or regulators can understand decisions), and data requirements (how much labeled data the model needs to outperform alternatives). These five axes organize the rest of this lesson and will become your default vocabulary for defending model choices under pressure.

The following decision flowchart captures how these axes interact to guide you from task definition to a recommended model family:

With this map in hand, the next step is understanding why the simplest path on that flowchart is almost always the correct starting point.

Start simple and justify escalation

Logistic regression remains the workhorse for click-through rate prediction at companies like Meta and Google. It delivers sub-millisecond inference, full interpretability through direct feature weights, and trivial deployment to thousands of serving nodes. GBDT (Gradient Boosted Decision Tree)An ensemble learning method that builds a sequence of decision trees, where each new tree corrects the errors of the previous ones, producing strong predictive performance on tabular data. models such as XGBoost and LightGBM dominate tabular ranking and fraud detection because they handle feature interactions, missing ...

1.The Interview Framework and Communication

2.Problem Formulation and Requirements

3.Data Strategy: Collection, Pipelines, and Features

4.Model Design and Architecture Selection

5.Evaluation: Offline, Online, and Fairness

6.Serving, Deployment, and MLOps

7.Case Study: Video Recommendation System

8.Case Study: Social Feed Ranking System

9.Case Study: Ad Click-Through Rate Prediction System

Mock Interview

10.Case Study: Semantic Search Engine

11.Case Study: Content Moderation System

Mock Interview

12.Case Study: Object Detection System

Mock Interview

13.Case Study: Visual Search System

Mock Interview

14.Case Study: Fraud Detection System

Mock Interview

15.Case Study: RAG-Based Enterprise Knowledge Assistant

16.Case Study: LLM-Powered Code Generation Tool

The Model Selection Framework

Start simple and justify escalation