Multi-Task and Multi-Objective Learning
Explore multi-task and multi-objective learning techniques essential for designing machine learning systems that optimize multiple goals simultaneously. Understand shared-bottom and mixture-of-experts architectures, their trade-offs, and strategies to manage task interference. Learn how to combine multi-task outputs effectively for production ranking systems.
Consider this ML system design prompt: Design a news feed ranking system that must optimize for click-through rate, conversion rate, and engagement time at the same time. The phrase “at the same time” signals that a single-objective model is unlikely to capture the full ranking problem. Training one model per objective creates redundant feature extraction, higher serving cost, and separate learned representations, which can make downstream score fusion harder to calibrate. This is a major reason large-scale recommender teams, such as those at Meta and Google (including YouTube), often favor a single multi-task model.
Multi-task learning (MTL) is an architectural strategy in which a shared representation backbone feeds multiple task-specific prediction heads, amortizing compute while capturing cross-task signal. The shared parameters improve data efficiency because supervision from one task regularizes the representation for the others. Parameter sharing also introduces the key design trade-off in multi-objective ranking systems: when objectives pull the shared parameters in different directions, updates from one objective can degrade performance on another.
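One common way to make this trade-off concrete is to compare the gradients that two task losses produce on the same shared parameters. The sketch below is a minimal illustration, not a method prescribed by this lesson: the function name gradient_cosine and the gradient values are hypothetical, chosen only to show that a negative cosine similarity means a step that helps one objective moves the shared weights against the other.

```python
import numpy as np

def gradient_cosine(grad_a, grad_b):
    """Cosine similarity between two task gradients on the same shared weights.

    Values near +1 mean the tasks agree on the update direction;
    values below 0 mean an update for one task works against the other.
    """
    a = np.ravel(grad_a)
    b = np.ravel(grad_b)
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    if denom == 0.0:
        return 0.0  # at least one task provides no gradient signal
    return float(a @ b / denom)

# Hypothetical gradients for illustration only (not measured from a real model).
grad_click = np.array([1.0, 0.5, -0.2])
grad_engagement = np.array([-0.8, -0.4, 0.3])

cos = gradient_cosine(grad_click, grad_engagement)
print("conflicting" if cos < 0 else "compatible")
```

In practice these gradients would come from backpropagating each task loss separately through the shared layers, which is more expensive than a single combined backward pass, so a check like this is usually run occasionally as a diagnostic rather than on every step.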
The following diagram illustrates the two dominant MTL architectures you will encounter in interviews and production systems:
With this visual as a reference, let’s walk through each architecture in detail.
Shared-bottom architecture
The shared-bottom design is the simplest production MTL pattern. A common trunk, typically a deep MLP or transformer encoder, processes raw input features into a shared embedding vector. This embedding then branches into task-specific tower networks, where each tower produces a prediction for one objective, such as
Several properties make this the default starting point for most teams:
Single forward pass efficiency: The shared trunk runs once per candidate item, so serving latency scales with the number of lightweight tower heads rather than full model replicas.
Cross-task regularization: Supervision from the click task implicitly regularizes features that the conversion task also needs, improving sample efficiency on sparse labels like purchases.
Implementation simplicity: Adding a new objective requires only appending a new tower head and loss term, with no architectural changes to the shared trunk, although the trunk's weights still receive gradients from the new loss. ...
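The properties above can be sketched in a few lines of NumPy. This is a forward-pass illustration under stated assumptions, not a production implementation: the class name SharedBottom, the layer sizes, the two binary tasks ("click" and "conversion"), and the loss weights are all hypothetical, and a real system would use a deep learning framework with automatic differentiation.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_layer(n_in, n_out):
    """Small random weights and a zero bias for one dense layer."""
    return rng.normal(0.0, 0.1, size=(n_in, n_out)), np.zeros(n_out)

def relu(x):
    return np.maximum(x, 0.0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class SharedBottom:
    """One shared trunk layer feeding a small two-layer tower per task."""

    def __init__(self, n_features, trunk_dim, tower_dim, task_names):
        self.trunk = init_layer(n_features, trunk_dim)
        # Adding an objective means adding one more entry here.
        self.towers = {
            name: (init_layer(trunk_dim, tower_dim), init_layer(tower_dim, 1))
            for name in task_names
        }

    def forward(self, x):
        w, b = self.trunk
        shared = relu(x @ w + b)  # computed once, reused by every tower
        preds = {}
        for name, ((w1, b1), (w2, b2)) in self.towers.items():
            hidden = relu(shared @ w1 + b1)
            preds[name] = sigmoid(hidden @ w2 + b2).squeeze(-1)
        return preds

def weighted_bce(preds, labels, weights):
    """Weighted sum of per-task binary cross-entropy losses."""
    total = 0.0
    for name, p in preds.items():
        p = np.clip(p, 1e-7, 1.0 - 1e-7)
        y = labels[name]
        bce = -np.mean(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))
        total += weights[name] * bce
    return total

# Illustrative batch: 5 candidate items with 8 features each.
model = SharedBottom(n_features=8, trunk_dim=16, tower_dim=4,
                     task_names=["click", "conversion"])
x = rng.normal(size=(5, 8))
preds = model.forward(x)
labels = {"click": np.array([1, 0, 1, 0, 0.0]),
          "conversion": np.array([0, 0, 1, 0, 0.0])}
loss = weighted_bce(preds, labels, {"click": 1.0, "conversion": 2.0})
```

Because every task loss is summed into one scalar, gradients from all towers flow back into the same trunk weights. That coupling is the source of both the cross-task regularization benefit and the interference risk. A continuous objective such as engagement time would need a regression head and loss rather than the sigmoid and cross-entropy used here.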
Attention: When tasks are heterogeneous or conflicting, such as clickbait content that maximizes CTR but destroys long-term engagement, the shared trunk is forced into a compromise. This phenomenon, called