Neural Architectures for Ranking and Retrieval

Understand and apply the dominant neural architectures for ranking and retrieval in machine learning systems. Learn how two-tower models efficiently retrieve candidates from corpora of billions of items, while Wide & Deep and DCN architectures rank the resulting shortlist. Gain insight into their deployment, training strategies, and trade-offs so you can design scalable recommendation and search systems.

We'll cover the following...

Two-tower models for candidate retrieval
- Architecture and training
- Deployment for ANN search
Ranking architectures for scoring candidates
- Wide & Deep architecture
- Deep and Cross Network for automated feature interactions
  - DCN-v2 and practical considerations
Sketching these architectures in interviews
Connecting ranking architectures to the broader system

Logistic regression and gradient-boosted decision trees provide strong baselines for tabular ranking problems, but they hit a ceiling when a system must retrieve relevant items from a corpus of millions or billions of candidates. These classical models operate on hand-crafted features and cannot learn the dense, semantic representations needed to match users to items at massive scale. Large-scale recommendation and search systems, including those at MAANG companies, typically address this with a two-stage paradigm: a retrieval stage uses lightweight models to narrow billions of candidates down to hundreds, and a ranking stage applies more expressive models to score and reorder that shortlist. This lesson covers the three dominant neural architectures interviewers expect you to diagram and justify: two-tower models for retrieval, and Wide & Deep and Deep & Cross Network (DCN) for ranking.
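The retrieve-then-rank split can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not a production design: the random embeddings, the helper names `retrieve` and `rank`, and the `heavy_score` function (a hypothetical stand-in for a Wide & Deep or DCN ranker) are invented for the example, and retrieval here is brute-force rather than approximate nearest neighbor search.

```python
import heapq
import random


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def retrieve(user_emb, item_embs, k):
    """Stage 1 (hypothetical helper): cheap dot-product scoring over the whole
    corpus, keeping only the ids of the top-k items."""
    return heapq.nlargest(k, range(len(item_embs)), key=lambda i: dot(user_emb, item_embs[i]))


def rank(candidate_ids, heavy_score, n):
    """Stage 2 (hypothetical helper): apply an expensive scorer only to the
    shortlist and return the n best candidates in descending score order."""
    return sorted(candidate_ids, key=heavy_score, reverse=True)[:n]


# Toy corpus: 10,000 random 8-dimensional item embeddings (made-up data).
random.seed(0)
DIM = 8
items = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(10_000)]
user = [random.gauss(0, 1) for _ in range(DIM)]


def heavy_score(i):
    # Hypothetical "expensive" ranker; a real system would run a neural model here.
    return dot(user, items[i]) + 0.1 * items[i][0] ** 2


shortlist = retrieve(user, items, k=200)   # 10,000 -> 200
top10 = rank(shortlist, heavy_score, n=10)  # 200 -> 10
print(len(shortlist), len(top10))
```

Brute-force scoring is used only for clarity; at billion scale the retrieval step is served with approximate nearest neighbor (ANN) search over precomputed item embeddings, and only the much smaller shortlist pays the cost of the heavy ranker.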

Consider this interview prompt: “Design a video recommendation system that serves 2 billion users with sub-200 ms request latency.” The architectures in this lesson give you the core building blocks for that design. You need a retrieval model fast enough to pull candidates from a billion-scale video index within the latency budget, and a ranking model expressive enough to score the shortlisted candidates by predicted CTR and other engagement or quality signals. The following sections explain each architecture and where it fits in the retrieval-to-ranking pipeline.

Two-tower models for candidate retrieval

The two-tower model, also called a dual encoder or bi-encoder, is the industry standard for the retrieval stage. The core idea is straightforward: train two separate neural networks, one for users and one for items, so that each produces a dense embedding vector of the same dimensionality.

Architecture and training

The user tower takes in user features such as watch history, demographics, and contextual signals like time of day, passes them through an embedding layer and two to three fully connected layers, and outputs a fixed-size user embedding vector. The item tower follows the same structure but ingests item features like item ID, title embedding, category, and popularity signals.
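A minimal, dependency-free sketch of one tower's forward pass follows, assuming a single embedding table per tower, mean pooling over ID embeddings, and two fully connected layers. The class name `Tower`, the layer sizes, and the dense features (hour of day, weekend flag, popularity) are hypothetical, and the weights are random rather than trained, so the output only illustrates the shapes and data flow.

```python
import math
import random

random.seed(42)


def init_dense(in_dim, out_dim):
    """Randomly initialized dense layer (weights, bias); stands in for trained weights."""
    w = [[random.gauss(0, 1 / math.sqrt(in_dim)) for _ in range(in_dim)] for _ in range(out_dim)]
    return w, [0.0] * out_dim


def dense_forward(layer, x, relu=True):
    """Compute W x + b, optionally followed by ReLU."""
    w, b = layer
    out = [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]
    return [max(0.0, o) for o in out] if relu else out


def l2_normalize(v):
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


class Tower:
    """Hypothetical tower: embedding lookup -> mean pool -> concat dense
    features -> FC + ReLU -> FC -> unit-norm output embedding."""

    def __init__(self, vocab_size, id_dim, dense_dim, hidden, out_dim):
        self.table = [[random.gauss(0, 0.1) for _ in range(id_dim)] for _ in range(vocab_size)]
        self.fc1 = init_dense(id_dim + dense_dim, hidden)
        self.fc2 = init_dense(hidden, out_dim)

    def __call__(self, ids, dense):
        # Mean-pool the embeddings of all ids (requires at least one id).
        pooled = [sum(col) / len(ids) for col in zip(*(self.table[i] for i in ids))]
        h = dense_forward(self.fc1, pooled + dense)
        return l2_normalize(dense_forward(self.fc2, h, relu=False))


EMB_DIM = 16  # both towers must emit the same dimensionality
user_tower = Tower(vocab_size=1000, id_dim=8, dense_dim=2, hidden=32, out_dim=EMB_DIM)
item_tower = Tower(vocab_size=1000, id_dim=8, dense_dim=1, hidden=32, out_dim=EMB_DIM)

# User: ids of watched videos, plus [normalized hour of day, is_weekend] (made-up values).
u = user_tower(ids=[3, 17, 256], dense=[0.5, 1.0])
# Item: item id and category id pooled together for simplicity, plus [normalized popularity].
v = item_tower(ids=[42, 900], dense=[0.7])
score = sum(a * b for a, b in zip(u, v))  # cosine similarity, since both vectors are unit-norm
print(len(u), len(v), round(score, 3))
```

Because the user and item towers never see each other's inputs and interact only through the final similarity, item embeddings can be computed offline and indexed for ANN search, which is what makes the architecture viable for retrieval.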

During training, the model maximizes the similarity (dot product or cosine) between positive user-item pairs while pushing negatives apart. The loss is typically either a sampled softmax, which approximates the full softmax over all items by sampling a subset of negatives and keeps training tractable when the corpus contains millions of items, or an in-batch negatives loss, in which the other items in the same mini-batch serve as negative examples.
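The in-batch negatives loss can be written as a softmax cross-entropy over the batch's similarity matrix: for user i, the logits are its dot products with every item in the batch, and the correct class is item i. The sketch below assumes plain Python lists as embeddings; the `temperature` parameter is an added assumption (a common knob for sharpening the softmax), not something this lesson specifies.

```python
import math


def in_batch_softmax_loss(user_embs, item_embs, temperature=1.0):
    """Row i of user_embs is paired with row i of item_embs (its positive).
    Every other item in the batch acts as a negative for that user."""
    batch = len(user_embs)
    total = 0.0
    for i, u in enumerate(user_embs):
        logits = [sum(a * b for a, b in zip(u, v)) / temperature for v in item_embs]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_z = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_z - logits[i]  # equals -log softmax(logits)[i]
    return total / batch


# Toy batch of 3 users whose positives align with them, versus the same items shuffled.
users = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
aligned = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
shuffled = [aligned[1], aligned[2], aligned[0]]
print(in_batch_softmax_loss(users, aligned), in_batch_softmax_loss(users, shuffled))
```

The loss is lower when each user's positive item is also its most similar item in the batch, which is exactly the ordering the gradient pushes the towers toward. The batch reuses item embeddings it has already computed, so negatives come essentially for free.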

Attention: If a two-tower model is trained with only random negatives, it mostly learns to separate easy, unrelated items, so the embedding space poorly distinguishes near-misses and retrieval recall drops. Hard negative mining, which adds items that are similar but not relevant, is a standard ingredient of production-quality two-tower models.
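One common way to obtain hard negatives is to let the current model score the corpus and keep the highest-scoring items that are not labeled positives. The sketch below assumes that approach; the function name `mine_hard_negatives` and the `skip_top` parameter, which discards the very top non-positives because they are more likely to be unlabeled positives, are hypothetical choices for illustration.

```python
import heapq


def mine_hard_negatives(user_emb, item_embs, positive_ids, k, skip_top=0):
    """Return up to k item ids that the model scores highest for this user but
    that are not labeled positives. The first skip_top such items are dropped
    because they are the most likely to be false negatives."""
    def score(i):
        return sum(a * b for a, b in zip(user_emb, item_embs[i]))

    candidates = (i for i in range(len(item_embs)) if i not in positive_ids)
    ranked = heapq.nlargest(k + skip_top, candidates, key=score)
    return ranked[skip_top:]


# Toy corpus: item 0 is the labeled positive; items 1 and 2 look similar to it.
items = [[1.0, 0.0], [0.9, 0.1], [0.8, 0.3], [0.0, 1.0], [-1.0, 0.0]]
user = [1.0, 0.0]
print(mine_hard_negatives(user, items, positive_ids={0}, k=2))              # [1, 2]
print(mine_hard_negatives(user, items, positive_ids={0}, k=2, skip_top=1))  # [2, 3]
```

In practice, mined hard negatives are often mixed with random or in-batch negatives rather than replacing them, so the model still sees both easy and difficult contrasts.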

...
