Back-of-the-Envelope Estimation for ML Systems
Explore how to break down ambiguous ML infrastructure questions into manageable parts using back-of-the-envelope estimation. Learn to calculate model memory, storage needs, throughput, and training time under realistic assumptions. Build the skills to reason quantitatively, state assumptions clearly, and apply formulas effectively in senior-level ML system design interviews.
The previous lesson showed how metric cannibalization and guardrail design require quantitative reasoning. The same reasoning applies to infrastructure planning, which many candidates underprepare for. When an interviewer asks, “How much storage do we need for 1 billion user embeddings?”, they are not just testing arithmetic. They are testing whether you can break an ambiguous infrastructure question into smaller subproblems, state reasonable assumptions, and produce an order-of-magnitude estimate that can inform a real design decision. A candidate who can reason through this quickly demonstrates the production awareness expected in senior-level interviews.
Back-of-the-envelope estimation is one of the highest-signal skills in ML system design interviews. Interviewers evaluate your engineering judgment, not your arithmetic precision. The primary failure mode is oversimplified assumptions that ignore real-world complexities like mixed-precision training, replication overhead, and data distribution drift, leading to underprovisioned resources and system bottlenecks. This lesson covers four estimation categories that form the quantitative backbone of any ML system design discussion: model size, storage, throughput, and training time.
Note: Interviewers care about your reasoning chain far more than the final number. Stating assumptions explicitly before computing is what distinguishes senior candidates.
The following mindmap provides a structural overview of these four estimation pillars and their key variables:
With this structural overview in place, let’s work through each estimation type with concrete formulas and worked examples.
Model size and storage estimation
These first two estimation types address the most common interview questions about memory footprints and disk requirements. They share a common structure: count the entities, multiply by the bytes per entity, then account for overhead factors that candidates frequently forget.
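A minimal Python sketch of this count-times-bytes-times-overhead pattern, applied to the 1-billion-user-embedding question from the introduction. The 256-dimensional embeddings, FP32 storage, and 3x replication factor are assumptions chosen for illustration, and `estimate_bytes` and `human_readable` are hypothetical helper names, not part of any library.

```python
def estimate_bytes(count: int, bytes_per_entity: float, overhead: float = 1.0) -> float:
    """Order-of-magnitude size: count x bytes per entity x overhead factor."""
    return count * bytes_per_entity * overhead


def human_readable(num_bytes: float) -> str:
    """Format bytes in decimal (SI) units, which keeps mental math easy."""
    for unit in ["B", "KB", "MB", "GB", "TB", "PB"]:
        if num_bytes < 1000:
            return f"{num_bytes:.1f} {unit}"
        num_bytes /= 1000
    return f"{num_bytes:.1f} EB"


num_users = 1_000_000_000
embedding_dim = 256   # assumption: embedding width
bytes_per_float = 4   # assumption: FP32 storage
replication = 3       # assumption: 3x replication for durability

raw = estimate_bytes(num_users, embedding_dim * bytes_per_float)
replicated = estimate_bytes(num_users, embedding_dim * bytes_per_float, replication)

print("raw:", human_readable(raw))                # raw: 1.0 TB
print("replicated:", human_readable(replicated))  # replicated: 3.1 TB
```

Making the overhead factor an explicit argument is the point of the exercise: it is the step that turns a roughly 1 TB raw estimate into a roughly 3 TB provisioning figure.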
Model size estimation
The core formula for model memory is straightforward:

$$\text{Model memory (bytes)} = \text{Number of parameters} \times \text{Bytes per parameter}$$
The bytes per parameter depend on the numerical precision used.
FP32 (full precision): Each parameter occupies 4 bytes, used in traditional training setups.
FP16 / BF16 (half precision): Each parameter occupies 2 bytes, now standard for inference and mixed-precision training.
INT8 (quantized): Each parameter occupies 1 byte, common in optimized serving deployments.
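The formula and the precision table translate directly into a small lookup. This is a sketch assuming decimal gigabytes (1 GB = 10^9 bytes) and a hypothetical 1B-parameter model; `weight_memory_gb` is an illustrative name.

```python
# Bytes per parameter for each numerical precision listed above.
BYTES_PER_PARAM = {
    "fp32": 4,  # full precision
    "fp16": 2,  # half precision
    "bf16": 2,  # half precision, same exponent range as FP32
    "int8": 1,  # quantized
}


def weight_memory_gb(num_params: float, precision: str) -> float:
    """Weight memory in decimal GB: parameters x bytes per parameter."""
    return num_params * BYTES_PER_PARAM[precision] / 1e9


for precision in ["fp32", "fp16", "int8"]:
    print(precision, weight_memory_gb(1e9, precision), "GB")
# prints: fp32 4.0 GB / fp16 2.0 GB / int8 1.0 GB
```

Each halving of precision halves the weight footprint, which is why quantization is usually the first lever considered when a model does not fit on the serving hardware.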
For a first-order inference estimate, the calculation stops here. Training memory, however, is dramatically larger because the model must also store gradients and the optimizer's additional state alongside the weights.
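To make the gap concrete, here is a sketch assuming mixed-precision training with the Adam optimizer and a commonly used accounting of 16 bytes per parameter: FP16 weights and gradients, plus FP32 master weights and two FP32 Adam moment estimates. Activations, temporary buffers, and framework overhead are deliberately excluded, and the 1B-parameter model and dictionary keys are illustrative.

```python
# Assumed per-parameter state for mixed-precision training with Adam.
# Activations, temporary buffers, and framework overhead are NOT included.
TRAINING_BYTES_PER_PARAM = {
    "fp16_weights": 2,
    "fp16_gradients": 2,
    "fp32_master_weights": 4,
    "adam_first_moment_fp32": 4,
    "adam_second_moment_fp32": 4,
}


def training_state_gb(num_params: float) -> float:
    """Weights + gradients + optimizer state in decimal GB."""
    bytes_per_param = sum(TRAINING_BYTES_PER_PARAM.values())  # 16 bytes
    return num_params * bytes_per_param / 1e9


inference_gb = 1e9 * 2 / 1e9  # FP16 weights only
training_gb = training_state_gb(1e9)
print(f"inference: {inference_gb:.0f} GB, training state: {training_gb:.0f} GB, "
      f"ratio: {training_gb / inference_gb:.0f}x")
# inference: 2 GB, training state: 16 GB, ratio: 8x
```

Under these assumptions, training state alone is about 8x the FP16 inference footprint before any activation memory is counted.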
Consider a 7B parameter model. For inference in FP16, the footprint is