Identifying the ML Task Type
Discover how to correctly identify one of the seven primary machine learning task types based on problem requirements and output contracts. Understand their distinct input-output behaviors, recognition signals, and implications for model architecture, loss functions, evaluation metrics, and serving constraints. Learn a three-step framework to justify task-type decisions in interviews by aligning business metrics, task outputs, and production considerations. This lesson helps you avoid costly early mistakes and demonstrates the senior-level awareness needed to design robust ML systems.
When an interviewer says “Design a system to rank YouTube search results,” the very next move you make determines the trajectory of your entire design. Pick the wrong ML task type and every downstream decision, including loss function, evaluation metric, and serving architecture, lands on a flawed foundation. This lesson gives you a repeatable method for making that decision and defending it.
With clarifying questions answered and requirements locked from the previous phase, the critical next step is translating those requirements into a concrete ML task type. Consider the YouTube example again. The output the system must produce is an ordered list of videos, not a binary label indicating whether a single video is relevant. That distinction alone separates a ranking formulation from a classification formulation, and it changes the loss function from cross-entropy to a pairwise or listwise ranking loss. Experienced interviewers evaluate whether you can make this distinction and justify it with reasoning tied to the problem’s input-output contract and business objective.
This lesson covers seven primary ML task types that appear repeatedly in system design interviews. Each one has a distinct input-output contract, recognition signals in the problem statement, and architectural implications for serving.
Note: Choosing the wrong task type is one of the most costly early mistakes in an ML system design interview. It cascades into misaligned metrics, incorrect model architectures, and flawed evaluation strategies.
The following map provides a bird’s-eye view of seven task types, their definitions, and one canonical example for each.
With this taxonomy in view, the next step is understanding what makes each task type distinct and how to spot it from a problem statement.
Defining each task type
Recognizing the correct task type from an interview prompt requires understanding the defining characteristics of each one. The differences come down to what the model outputs, what signals appear in the problem statement, and what architectural constraints follow.
Recognition heuristics for each type
The seven task types each carry specific linguistic and structural signals that surface during the problem statement or clarifying questions.
Ranking produces an ordered list scored by relevance or utility. Look for phrases like “show the most relevant,” “order by,” or “top-K results.” Canonical examples include search ranking, feed ranking, and ad auction ordering. The model scores each candidate, and a sorting step produces the final output.
Retrieval efficiently narrows a massive candidate pool, millions to billions of items, down to a manageable set. Look for scale signals and phrases like “find candidates” or “shortlist.” Examples include approximate nearest neighbor (ANN) retrieval in recommendation candidate generation and document retrieval in RAG pipelines.
Classification assigns a discrete label, either binary or multi-class, to an input. Look for “is this X or Y,” “detect,” “flag,” or “categorize.” Examples include spam detection, content moderation, and sentiment analysis. A single forward pass through the model produces a probability distribution over labels.
Regression predicts a continuous numeric value. Look for “predict the value of,” “estimate,” or “forecast.” Examples include Uber ETA prediction, ad bid price estimation, and demand forecasting.
Generation produces new content such as text, images, or code. Look for “generate,” “compose,” “summarize,” or “translate.” Examples include chatbot responses, image synthesis, and code completion. Generation tasks typically involve autoregressive decoding (a sequential process where the model generates one token at a time, conditioning each new token on all previously generated tokens), which introduces fundamentally different latency characteristics compared to a single-forward-pass classifier.
Anomaly detection identifies rare events that deviate from learned normal behavior. Look for extreme class imbalance, “detect fraud,” “identify outliers,” or “flag unusual activity.” Examples include credit card fraud detection and infrastructure anomaly monitoring.
Clustering groups unlabeled data points by similarity without predefined categories. Look for “segment,” “group,” or “discover patterns.” Examples include user segmentation for marketing and topic discovery in document corpora.
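To make the difference between output contracts concrete, here is a minimal sketch that feeds the same scores into a classification contract and a ranking contract. The item names, scores, and the 0.5 threshold are hypothetical values chosen for illustration, not output from any real model.

```python
from typing import Dict, List


def classify(scores: Dict[str, float], threshold: float = 0.5) -> Dict[str, bool]:
    """Classification contract: one independent label per item."""
    return {item: score >= threshold for item, score in scores.items()}


def rank_top_k(scores: Dict[str, float], k: int) -> List[str]:
    """Ranking contract: an ordered list, highest score first, cut at k."""
    ordered = sorted(scores, key=lambda item: scores[item], reverse=True)
    return ordered[:k]


# Hypothetical relevance scores for a single query.
scores = {"video_a": 0.62, "video_b": 0.91, "video_c": 0.18, "video_d": 0.77}

labels = classify(scores)       # per-item decisions, no ordering implied
top3 = rank_top_k(scores, k=3)  # ordering is the output itself

print(labels)
print(top3)
```

The classifier says three videos are relevant but gives the consumer no way to order them; the ranker returns the order directly, which is what a results page needs.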
Latency implications of task-type choice
A critical nuance from production practice is that task-type choice constrains your serving architecture. A generation task with autoregressive decoding may require hundreds of milliseconds per response, demanding techniques like model sharding and speculative decoding to meet latency budgets. A classification task with a single forward pass can often serve predictions in single-digit milliseconds. When you name a task type in an interview, you are implicitly committing to a latency profile.
Practical tip: After naming the task type, immediately state its latency implication. Saying “This is a generation task, so we need to budget for sequential decoding latency” signals production awareness.
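A back-of-the-envelope estimate can make the latency gap explicit. The sketch below assumes a simple model in which generation latency is a prompt-processing (prefill) cost plus a fixed cost per generated token; every number in it is a hypothetical placeholder, not a measurement.

```python
def classifier_latency_ms(forward_pass_ms: float) -> float:
    """A classifier needs one forward pass per prediction."""
    return forward_pass_ms


def generation_latency_ms(prefill_ms: float, per_token_ms: float, output_tokens: int) -> float:
    """Autoregressive decoding pays a per-token cost for every generated token."""
    return prefill_ms + per_token_ms * output_tokens


# Hypothetical costs, chosen only to illustrate the shape of the trade-off.
clf_ms = classifier_latency_ms(forward_pass_ms=5.0)
gen_ms = generation_latency_ms(prefill_ms=40.0, per_token_ms=15.0, output_tokens=20)

print(f"classifier: {clf_ms:.0f} ms, generation: {gen_ms:.0f} ms")
```

Because the per-token term scales with response length, output length becomes a first-class serving parameter for generation tasks in a way it never is for a classifier.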
The following table consolidates the recognition signals, input-output contracts, and typical loss functions for quick reference during interview preparation.
Machine Learning Task Types Overview
Task Type | Input-Output Contract | Key Recognition Signal | Canonical Interview Problem | Typical Loss Function
Ranking | Query + candidate set → ordered list | "Order by relevance" | Search ranking | Pairwise or listwise loss |
Retrieval | Query → candidate subset from large corpus | "Narrow billions to hundreds" | Recommendation candidate generation | Contrastive loss |
Classification | Input → discrete label | "Detect, flag, categorize" | Spam detection | Cross-entropy |
Regression | Input → continuous value | "Predict, estimate, forecast" | ETA prediction | MSE or MAE |
Generation | Input → new content | "Generate, compose, summarize" | Chatbot response | Cross-entropy with teacher forcing |
Anomaly Detection | Input → normal vs. anomalous flag | "Extreme imbalance, flag outliers" | Fraud detection | Reconstruction error or one-class loss |
Clustering | Unlabeled data → groups | "Segment, discover patterns" | User segmentation | Intra-cluster distance |
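Three of the loss functions in the table can be written in a few lines each, which helps show why the task type changes what the model is penalized for. This is a minimal sketch using only the standard library; the pairwise loss shown is the logistic (RankNet-style) form, which is one common choice among several pairwise losses.

```python
import math
from typing import Sequence


def binary_cross_entropy(y_true: int, p: float) -> float:
    """Classification: penalize the predicted probability of the true label."""
    eps = 1e-12  # clamp to avoid log(0)
    p = min(max(p, eps), 1.0 - eps)
    return -(y_true * math.log(p) + (1 - y_true) * math.log(1.0 - p))


def pairwise_logistic_loss(score_pos: float, score_neg: float) -> float:
    """Ranking: penalize a relevant item scoring below an irrelevant one.

    Computes log(1 + exp(-(score_pos - score_neg))).
    """
    return math.log1p(math.exp(-(score_pos - score_neg)))


def mean_squared_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Regression: penalize squared distance from the target value."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# The pairwise loss depends only on the score difference, not absolute values.
print(pairwise_logistic_loss(2.0, 0.0))  # correct order: small loss
print(pairwise_logistic_loss(0.0, 2.0))  # inverted order: large loss
```

Note that the pairwise loss never looks at a single item in isolation, whereas cross-entropy and MSE score each example independently. That is the practical meaning of "ranking optimizes order, classification optimizes labels."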
With each task type clearly defined, the next challenge is handling situations where a single problem legitimately fits more than one type.
When one problem maps to multiple types
Not every interview problem maps cleanly to a single task type. Consider designing Airbnb search. You could frame it as ranking (order listings by booking probability), classification (predict book vs. not-book for each listing), or regression (predict expected booking value per listing). All three framings are technically valid, and the interviewer is watching to see how you navigate this ambiguity.
A three-step decision framework
The right framing depends on the business objective you surfaced during clarifying questions. A repeatable three-step framework helps you choose and defend your decision.
Identify the output contract: Ask what the consumer of the model output actually needs. If the search page must display an ordered list of listings, the consumer needs a ranked output, not just a binary label per listing.
Align with the business metric: If the KPI is revenue, regression on expected booking value may dominate because it directly optimizes for dollars. If the KPI is engagement or conversion rate, ranking by booking probability may be more appropriate.
Consider serving constraints: Ranking requires scoring and sorting a candidate set within a latency budget, which may demand a multi-stage retrieval-then-rank pipeline. A binary classifier can score items independently, simplifying the serving layer but losing the ordering signal.
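The second step, aligning with the business metric, can change the final ordering even when the underlying predictions are identical. The sketch below uses three hypothetical listings with made-up booking probabilities and prices, and compares ordering by booking probability against ordering by expected booking value (probability times price).

```python
from typing import List, NamedTuple


class Listing(NamedTuple):
    name: str
    booking_prob: float  # hypothetical model output
    price: float         # hypothetical nightly price in dollars


def rank_by_probability(items: List[Listing]) -> List[str]:
    """Conversion-oriented framing: most likely to be booked first."""
    return [l.name for l in sorted(items, key=lambda l: l.booking_prob, reverse=True)]


def rank_by_expected_value(items: List[Listing]) -> List[str]:
    """Revenue-oriented framing: highest probability * price first."""
    return [l.name for l in sorted(items, key=lambda l: l.booking_prob * l.price, reverse=True)]


listings = [
    Listing("studio", 0.30, 80.0),   # expected value about 24
    Listing("loft", 0.20, 200.0),    # expected value about 40
    Listing("villa", 0.05, 600.0),   # expected value about 30
]

by_prob = rank_by_probability(listings)
by_value = rank_by_expected_value(listings)

print(by_prob)
print(by_value)
```

The two orderings disagree on which listing goes first, so the KPI you surfaced during clarifying questions decides which scoring function the ranker should use.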
Attention: Silently picking one framing without acknowledging alternatives is a missed opportunity. Interviewers reward candidates who name the alternative framings and then justify their choice with explicit trade-off reasoning.
It is also worth noting that
Now test your ability to identify the correct task type from interview-style prompts.
Lesson Quiz
Design a system that predicts expected delivery time for a food ordering app. Which ML task type best fits this problem?
Classification
Regression
Ranking
Anomaly detection
With the recognition skill practiced, the final piece is a repeatable verbal template for defending your choice in a live interview.
A framework for defending your choice
Knowing the right task type is necessary but not sufficient. You must also articulate your reasoning concisely under interview pressure. A three-sentence template provides structure without sounding rehearsed.
The template works as follows. First, state the output contract and map it to the task type. Second, name an alternative framing and explain why you rejected it. Third, connect your choice to a specific loss function and evaluation metric that align with the business KPI.
Here is the template applied to a concrete example: designing a notification relevance system for a social media app. “The output contract for this problem is an ordered list of notifications scored by likelihood of user engagement, which maps to a ranking task. I considered framing it as binary classification (will the user tap or not), but the product surface requires ordering notifications within a feed, so ranking better serves the UX. This choice implies a
This task-type decision feeds directly into the next phase of the interview, where you define functional and non-functional requirements. The next lesson covers that decomposition in depth.
Most candidates can name a task type. The ones who stand out at L4 through Staff+ levels are those who justify it against alternatives and connect it to loss functions, metrics, and serving constraints in a single coherent argument.
The following flowchart captures the full decision process in a visual format you can internalize for interview day.
Conclusion
This lesson covered the seven ML task types (ranking, retrieval, classification, regression, generation, anomaly detection, and clustering) along with their defining input-output contracts and the recognition signals that surface them from interview prompts. The three-step decision framework (output contract, business metric alignment, serving constraints) gives you a repeatable method for choosing and defending a task-type framing when multiple options exist. Remember that this decision is the bridge between the requirements phase and the modeling phase; it determines your loss function, evaluation metric, and architectural patterns. Production-grade systems must also account for silent degradation from data drift and concept drift, and acknowledging task-type-specific monitoring needs demonstrates the senior-level awareness interviewers look for. With the task type identified, the next step is decomposing the problem into functional requirements and non-functional requirements, including latency SLAs, throughput, fairness, and cost, that govern how the system must behave.