Back-of-the-Envelope Estimation for ML Systems

Explore how to break down ambiguous ML infrastructure questions into manageable parts using back-of-the-envelope estimation. Learn to calculate model memory, storage needs, throughput, and training time under real-world assumptions. Build the skills to reason quantitatively, state assumptions clearly, and apply formulas effectively in preparation for senior-level ML system design interviews.

We'll cover the following...

Model size and storage estimation
- Model size estimation
- Storage estimation
Throughput and training time estimation
- Throughput estimation
- Training time estimation
  - The core formula
  - Worked example: GPU requirements for a 24-hour training job
Common pitfalls and interview strategy
Conclusion

The previous lesson showed how metric cannibalization and guardrail design require quantitative reasoning. The same reasoning applies to infrastructure planning, which many candidates underprepare for. When an interviewer asks, “How much storage do we need for 1 billion user embeddings?”, they are not just testing arithmetic. They are testing whether you can break an ambiguous infrastructure question into smaller subproblems, state reasonable assumptions, and produce an order-of-magnitude estimate that can inform a real design decision. A candidate who can reason through this quickly demonstrates the production awareness expected in senior-level interviews.

Back-of-the-envelope estimation is one of the highest-signal skills in ML system design interviews. Interviewers evaluate your engineering judgment, not your arithmetic precision. The most common failure mode is relying on oversimplified assumptions that ignore real-world complexities such as mixed-precision training, replication overhead, and data distribution drift; these lead to underprovisioned resources and system bottlenecks. This lesson covers four estimation categories that form the quantitative backbone of any ML system design discussion: model size, storage, throughput, and training time.

Note: Interviewers care about your reasoning chain far more than the final number. Stating assumptions explicitly before computing is what distinguishes senior candidates.

The following mindmap provides a structural overview of these four estimation pillars and their key variables:

With this structural overview in place, let’s work through each estimation type with concrete formulas and worked examples.

Model size and storage estimation

These first two estimation types address the most common interview questions about memory footprints and disk requirements. They share a common structure: count the entities, multiply by the bytes per entity, then account for overhead factors that candidates frequently forget.
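The count, multiply, add-overhead pattern can be sketched in a few lines of Python. The inputs below are illustrative assumptions rather than figures from this lesson: 1 billion user embeddings (the interview question from the introduction), 128 dimensions per embedding stored in FP32, and a hypothetical replication factor of 3. The helper names `estimate_bytes` and `to_tb` are made up for this sketch.

```python
# Generic back-of-the-envelope pattern: entities x bytes per entity x overhead.
# All inputs below are illustrative assumptions, not measured values.


def estimate_bytes(num_entities: int, bytes_per_entity: int, overhead_factor: float = 1.0) -> float:
    """Return an order-of-magnitude storage estimate in bytes."""
    return num_entities * bytes_per_entity * overhead_factor


def to_tb(num_bytes: float) -> float:
    """Convert bytes to decimal terabytes (1 TB = 10**12 bytes)."""
    return num_bytes / 1e12


# Assumption: 1B users, 128-dim embeddings, FP32 (4 bytes per value).
NUM_USERS = 1_000_000_000
EMBEDDING_DIM = 128
BYTES_PER_VALUE = 4
# Assumption: 3x replication, an overhead factor candidates often forget.
REPLICATION_FACTOR = 3

raw = estimate_bytes(NUM_USERS, EMBEDDING_DIM * BYTES_PER_VALUE)
replicated = estimate_bytes(NUM_USERS, EMBEDDING_DIM * BYTES_PER_VALUE, REPLICATION_FACTOR)

print(f"raw: {to_tb(raw):.3f} TB")
print(f"replicated: {to_tb(replicated):.3f} TB")
```

Under these assumptions the raw embeddings come to roughly 0.5 TB and the replicated footprint to roughly 1.5 TB. Because every term is a simple multiplier, changing one assumption (say, FP16 instead of FP32) scales the answer linearly, which makes it easy to discuss trade-offs out loud.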

Model size estimation

The core formula for model memory is straightforward:

$\text{total memory} = \text{parameters} \times \text{bytes per parameter}$

The bytes per parameter depend on the numerical precision used.

FP32 (full precision): Each parameter occupies 4 bytes, used in traditional training setups.
FP16 / BF16 (half precision): Each parameter occupies 2 bytes, now standard for inference and mixed-precision training.
INT8 (quantized): Each parameter occupies 1 byte, common in optimized serving deployments.
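The formula and the precision list above can be turned into a small lookup-based sketch. It counts model weights only (no activations, caches, or framework overhead) and reports decimal gigabytes; the function name `weight_memory_gb` is hypothetical.

```python
# Inference memory for model weights only: parameters x bytes per parameter.
# Activations, caches, and framework overhead are deliberately ignored.

BYTES_PER_PARAM = {
    "fp32": 4,  # full precision
    "fp16": 2,  # half precision
    "bf16": 2,  # half precision with an FP32-sized exponent range
    "int8": 1,  # quantized
}


def weight_memory_gb(num_params: float, precision: str) -> float:
    """Return weight memory in decimal gigabytes (1 GB = 10**9 bytes)."""
    return num_params * BYTES_PER_PARAM[precision] / 1e9


params = 7e9  # a 7B-parameter model
for precision in ("fp32", "fp16", "int8"):
    print(f"{precision}: {weight_memory_gb(params, precision):.0f} GB")
# fp32: 28 GB
# fp16: 14 GB
# int8: 7 GB
```

Halving the bytes per parameter halves the footprint, so moving from FP32 to INT8 cuts weight memory by 4x before any other optimization.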

For inference, the calculation stops here. Training memory, however, is dramatically larger because the optimizer maintains additional state. With the Adam optimizer (a widely used optimization algorithm that maintains two momentum terms, the first and second moment estimates, per parameter in addition to the weights and gradients themselves) in FP32, each parameter requires roughly 16 bytes in total: 4 bytes for the weight, 4 bytes for the gradient, and 4 bytes for each of the two momentum terms.
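The FP32 Adam accounting above can be written out explicitly, which makes the 16-byte total easy to verify and adapt. This is a sketch that counts only per-parameter state (weight, gradient, and the two Adam moments) for a single unsharded copy of the model; activations and framework overhead are ignored, so real usage will be higher. The names `ADAM_FP32_BREAKDOWN` and `training_memory_gb` are hypothetical.

```python
# Training memory for weights + gradients + Adam optimizer state, all in FP32.
# Activation memory is ignored; it depends on batch size and sequence length.

FP32_BYTES = 4

ADAM_FP32_BREAKDOWN = {
    "weight": FP32_BYTES,
    "gradient": FP32_BYTES,
    "adam_first_moment": FP32_BYTES,
    "adam_second_moment": FP32_BYTES,
}


def training_memory_gb(num_params: float) -> float:
    """Decimal GB of per-parameter state when training with Adam in FP32."""
    bytes_per_param = sum(ADAM_FP32_BREAKDOWN.values())  # 4 + 4 + 4 + 4 = 16
    return num_params * bytes_per_param / 1e9


params = 7e9  # a 7B-parameter model
print(f"bytes per parameter: {sum(ADAM_FP32_BREAKDOWN.values())}")
print(f"training state: {training_memory_gb(params):.0f} GB")
# bytes per parameter: 16
# training state: 112 GB
```

Keeping the breakdown as a dictionary makes it straightforward to substitute a different layout (for example, a mixed-precision scheme) once you know the byte count of each component.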

Consider a 7B parameter model. For inference in FP16, the footprint is $7 \times 10^9 \times 2 = 14$ GB. For training with Adam in mixed ...
