Classical ML in System Design

Explore how to select and justify classical machine learning models, such as logistic regression and gradient-boosted decision trees, for scalable, production-ready systems. Understand their strengths in latency, interpretability, and data efficiency, and learn to apply them appropriately in system design interviews and real-world applications.

When an interviewer asks you to design an ad CTR prediction system serving 100K QPS with a 10 ms latency SLA, a transformer is usually not the right default. A stronger starting point is often logistic regression or a gradient-boosted decision tree. In this setting that is not a downgrade: when latency, throughput, and cost dominate, the simpler model can be the right design choice. A strong candidate defends it with quantified trade-offs across latency, throughput, model quality, and operational cost.
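The arithmetic behind such a quantified defense can be sketched as follows. This is a rough illustration, not a capacity plan: the function names, the 50% target utilization, and the per-request CPU costs (20 µs and 2 ms) are hypothetical placeholders, not measurements of any real model. Only the 100K QPS and 10 ms figures come from the scenario above.

```python
import math


def in_flight_requests(qps: int, latency_ms: int) -> float:
    """Little's law: average concurrent requests = arrival rate x time in system."""
    return qps * latency_ms / 1000


def cores_needed(qps: int, cpu_us_per_request: int,
                 target_utilization: float = 0.5) -> int:
    """Cores required so that each core stays at or below target_utilization."""
    busy_cores = qps * cpu_us_per_request / 1_000_000
    return math.ceil(busy_cores / target_utilization)


QPS = 100_000   # from the interview scenario
SLA_MS = 10     # from the interview scenario

# If every request takes the full SLA, this many are in flight at once.
print(in_flight_requests(QPS, SLA_MS))

# Hypothetical per-request CPU costs, for illustration only.
print(cores_needed(QPS, cpu_us_per_request=20))     # a cheap model
print(cores_needed(QPS, cpu_us_per_request=2_000))  # a model 100x costlier
```

The point of the exercise is the ratio rather than the absolute numbers: at a fixed QPS, a model that costs 100 times more CPU per request needs roughly 100 times the serving fleet.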

The previous lesson introduced the baseline-first principle and the Pareto front for model selection. This lesson goes deeper into the classical models themselves, treating logistic regression and gradient-boosted trees as production-grade systems rather than stepping stones to deep learning. These two model families power billions of daily predictions at Meta, Google, Uber, and Airbnb. By the end of this lesson, you will have a repeatable framework for defending classical ML choices in interviews with specific, quantified reasoning, and you will understand when these models outperform neural approaches on the axes that matter.

Logistic regression at scale for CTR

Logistic regression remains a backbone of CTR prediction. Google's 2013 paper "Ad Click Prediction: a View from the Trenches" described a production ad system built on a heavily engineered, sparse logistic regression model trained online with FTRL-Proximal. The reasons are architectural, not nostalgic.

Serving characteristics that matter

A logistic regression model computes a prediction as a single dot product followed by a sigmoid activation, $\hat{y} = \sigma(\mathbf{w}^T \mathbf{x} + b)$ ...
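The prediction formula above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a production scorer: it assumes sparse features stored as a name-to-value mapping, and the function names (`sigmoid`, `predict_ctr`) and the feature names and weights in the usage example are hypothetical.

```python
import math
from typing import Mapping


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)  # z < 0, so exp(z) cannot overflow
    return e / (1.0 + e)


def predict_ctr(weights: Mapping[str, float], bias: float,
                features: Mapping[str, float]) -> float:
    """Compute sigma(w^T x + b) over the non-zero features only.

    Features without a learned weight contribute zero, so the cost
    scales with the number of active features, not the vocabulary size.
    """
    z = bias + sum(weights.get(name, 0.0) * value
                   for name, value in features.items())
    return sigmoid(z)


# Hypothetical weights and request features, for illustration only.
weights = {"ad_id=123": 0.8, "hour=14": -0.3, "device=mobile": 0.1}
request = {"ad_id=123": 1.0, "device=mobile": 1.0, "country=US": 1.0}
print(predict_ctr(weights, bias=-3.0, features=request))
```

In this sketch, scoring a request is one dictionary lookup and one multiply-add per active feature, followed by a single sigmoid.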