Multi-Task and Multi-Objective Learning

Explore multi-task and multi-objective learning techniques essential for designing machine learning systems that optimize multiple goals simultaneously. Understand shared-bottom and mixture-of-experts architectures, their trade-offs, and strategies to manage task interference. Learn how to combine multi-task outputs effectively for production ranking systems.

We'll cover the following...

Shared-bottom architecture
Mixture-of-experts for heterogeneous tasks
- The MMoE architecture
  - Sparse activation and expert routing
Task interference and conflicting gradients
- How gradient conflicts arise
- Detection and mitigation strategies
Production MTL at Meta and Google
Summary

Consider this ML system design prompt: Design a news feed ranking system that must optimize for click-through rate, conversion rate, and engagement time at the same time. The phrase “at the same time” signals that a single-objective model is unlikely to capture the full ranking problem. Training one model per objective duplicates feature extraction, raises serving cost, and produces separately learned representations, which can make downstream score fusion harder to calibrate. This is why large-scale ranking systems at companies like Meta and Google (including YouTube) commonly use a single multi-task model.

Multi-task learning (MTL) is an architectural strategy in which a shared representation backbone feeds multiple task-specific prediction heads, amortizing compute while capturing cross-task signal. The shared parameters improve data efficiency because supervision from one task regularizes the representation for the others. Parameter sharing also introduces the key design trade-off in multi-objective ranking systems: when objectives pull the model in different directions, updates from one objective can degrade performance on another.
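To make the parameter-sharing trade-off concrete, here is a minimal sketch of two objectives pulling shared parameters in different directions. Everything in it is an illustrative assumption rather than part of any production system: the two-dimensional `theta`, the quadratic toy losses, the optima `target_a` and `target_b`, and the learning rate are hypothetical values chosen so the arithmetic is easy to follow.

```python
import numpy as np

# Hypothetical shared parameters and two toy task optima (illustrative values).
theta = np.array([0.0, 0.0])
target_a = np.array([1.0, 1.0])    # where task A would like theta to be
target_b = np.array([-1.0, 0.5])   # where task B would like theta to be


def loss(params, target):
    """Toy quadratic loss: half the squared distance to a task's optimum."""
    return 0.5 * float(np.sum((params - target) ** 2))


def grad(params, target):
    """Gradient of the toy quadratic loss with respect to the parameters."""
    return params - target


g_a = grad(theta, target_a)
g_b = grad(theta, target_b)

# Cosine similarity between the task gradients on the shared parameters.
# A negative value means the two tasks want to move theta in opposing directions.
cosine = float(g_a @ g_b / (np.linalg.norm(g_a) * np.linalg.norm(g_b)))

# Take one gradient step that only follows task A.
learning_rate = 0.5
theta_after_a = theta - learning_rate * g_a

loss_a_before, loss_a_after = loss(theta, target_a), loss(theta_after_a, target_a)
loss_b_before, loss_b_after = loss(theta, target_b), loss(theta_after_a, target_b)

print(f"cosine(g_a, g_b) = {cosine:.3f}")
print(f"task A loss: {loss_a_before:.3f} -> {loss_a_after:.3f}")
print(f"task B loss: {loss_b_before:.3f} -> {loss_b_after:.3f}")
```

In this toy setup the step that lowers task A's loss raises task B's loss, which is the behavior the negative cosine similarity anticipates. Real ranking models have millions of shared parameters and noisy mini-batch gradients, so the signal is far less clean than in this sketch.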

The following diagram illustrates the two dominant MTL architectures you will encounter in interviews and production systems:

With this visual as a reference, let’s walk through each architecture in detail.

Shared-bottom architecture

The shared-bottom design is the simplest production MTL pattern. A common trunk, typically a deep MLP or transformer encoder, processes raw input features into a shared embedding vector. This embedding then branches into task-specific tower networks, where each tower produces a prediction for one objective, such as $P(\text{click})$, $P(\text{conversion})$, or expected watch time.

Several properties make this the default starting point for most teams:

- Single forward pass efficiency: The shared trunk runs once per candidate item, so serving latency scales with the number of lightweight tower heads rather than full model replicas.
- Cross-task regularization: Supervision from the click task implicitly regularizes features that the conversion task also needs, improving sample efficiency on sparse labels like purchases.
- Implementation simplicity: Adding a new objective requires only appending a new tower head and loss term, with no changes to the shared trunk. ...
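The structure described above can be sketched as a forward pass in NumPy. This is a minimal illustration under stated assumptions, not a production implementation: the class name `SharedBottom`, the one-layer trunk, the layer sizes, the random initialization, and the restriction to binary tasks with sigmoid outputs are all hypothetical choices. A regression objective such as expected watch time would need a different output layer.

```python
import numpy as np

rng = np.random.default_rng(0)


def relu(x):
    return np.maximum(x, 0.0)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class SharedBottom:
    """Toy shared-bottom model: one shared trunk, one small tower per task."""

    def __init__(self, n_features, trunk_dim, tower_dim, tasks):
        self.trunk_dim = trunk_dim
        self.tower_dim = tower_dim
        # Shared trunk (a single dense layer here; real trunks are deeper).
        self.w_trunk = rng.normal(0.0, 0.1, (n_features, trunk_dim))
        self.b_trunk = np.zeros(trunk_dim)
        self.towers = {}
        for task in tasks:
            self.add_task(task)

    def add_task(self, task):
        """Append a new tower head without touching the trunk parameters."""
        self.towers[task] = {
            "w1": rng.normal(0.0, 0.1, (self.trunk_dim, self.tower_dim)),
            "b1": np.zeros(self.tower_dim),
            "w2": rng.normal(0.0, 0.1, (self.tower_dim, 1)),
            "b2": np.zeros(1),
        }

    def forward(self, x):
        # The trunk runs once per batch; every tower reuses its output.
        shared = relu(x @ self.w_trunk + self.b_trunk)
        outputs = {}
        for task, p in self.towers.items():
            hidden = relu(shared @ p["w1"] + p["b1"])
            logit = hidden @ p["w2"] + p["b2"]
            outputs[task] = sigmoid(logit).ravel()  # one probability per row
        return outputs


model = SharedBottom(n_features=8, trunk_dim=16, tower_dim=4,
                     tasks=["click", "conversion"])
candidates = rng.normal(size=(5, 8))  # 5 hypothetical candidate items
predictions = model.forward(candidates)

trunk_before = model.w_trunk.copy()
model.add_task("like")  # a new objective only adds a tower
predictions_with_like = model.forward(candidates)
```

Because `add_task` only creates new tower weights, the existing click and conversion predictions are unchanged at the moment the new head is added. Once training resumes with the extra loss term, gradients from the new task do flow into the shared trunk.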

Attention: When tasks are heterogeneous or conflicting, such as clickbait content that maximizes CTR but destroys long-term engagement, the shared trunk is forced into a compromise. This phenomenon, called
