Visual Search: Serving & Indexing

Explore how to design the serving layer of a billion-item visual search system with a 50 ms latency target. Understand the trade-offs in choosing ANN index technologies like FAISS and ScaNN, implement scalable sharding strategies, perform embedding versioning with zero-downtime rebuilds, and apply hybrid NSFW filtering to balance safety and latency.

We'll cover the following...

ANN indexing with FAISS and ScaNN
- FAISS IVF+PQ
- ScaNN and HNSW alternatives
Sharding and serving architecture
- Sharding strategies
Embedding versioning and zero-downtime rebuilds
- Blue-green index rebuild strategy
NSFW filtering in the serving pipeline
Designing for the next stage

A billion-item product catalog sits behind your visual search system. Each item is a 512-dimensional CLIP embedding. A user uploads a photo, and your system has 50 milliseconds to return the most visually similar products. The previous lesson established the CLIP encoder configuration, the re-ranking pipeline, and precomputed attribute labels that define the embedding space and latency constraints. Those constraints now become hard requirements for the serving layer you must design.

This is the core interview challenge for visual search system design. Four design decisions determine whether the system meets its SLO or collapses under load. You need to select an ANN index technology (FAISS vs. ScaNN), design a sharding strategy for horizontal scalability, implement embedding versioning with zero-downtime index rebuilds, and place NSFW filtering in the serving pipeline without blowing the latency budget. Systems like Pinterest Visual Search and Google Lens operate at exactly this scale, and interviewers expect you to reason through each decision with concrete trade-offs.

ANN indexing with FAISS and ScaNN

Brute-force exact nearest neighbor search computes the distance between a query vector and every vector in the index. For a billion 512-dimensional vectors, that is $O(n \cdot d)$ work per query: roughly 512 billion multiply-add operations, streamed over about 2 TB of embeddings if they are stored as float32. On a single machine, that is infeasible within a 50 ms SLO.
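The arithmetic behind that claim can be checked with a short back-of-envelope sketch. The catalog size, dimensionality, and latency budget come from the text; the float32 storage format and the helper name `brute_force_cost` are assumptions made for illustration.

```python
# Back-of-envelope cost of exact (brute-force) search over the catalog.
N_VECTORS = 1_000_000_000   # catalog size stated in the lesson
DIM = 512                   # CLIP embedding dimensionality stated in the lesson
BYTES_PER_VALUE = 4         # assumption: embeddings stored as float32
LATENCY_BUDGET_S = 0.050    # 50 ms SLO stated in the lesson


def brute_force_cost(n, d, bytes_per_value, budget_s):
    """Return the compute and memory traffic one exact query would need."""
    macs = n * d                          # one multiply-add per vector component
    index_bytes = n * d * bytes_per_value # every stored value is read once
    return {
        "macs_per_query": macs,
        "index_bytes": index_bytes,
        # Sustained rates needed to finish a single query inside the budget.
        "required_macs_per_s": macs / budget_s,
        "required_bytes_per_s": index_bytes / budget_s,
    }


cost = brute_force_cost(N_VECTORS, DIM, BYTES_PER_VALUE, LATENCY_BUDGET_S)
print(f"multiply-adds per query : {cost['macs_per_query']:.3e}")
print(f"raw index size          : {cost['index_bytes'] / 1e12:.3f} TB")
print(f"required compute rate   : {cost['required_macs_per_s']:.3e} MAC/s")
print(f"required memory bandwidth: {cost['required_bytes_per_s'] / 1e12:.1f} TB/s")
```

Under these assumptions, a single query would need to stream the whole 2 TB index in 50 ms, which works out to roughly 40 TB/s of memory bandwidth before any concurrency is considered. In an interview, the bandwidth figure is usually the more persuasive one, because it rules out exact search even when ample compute is available.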

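Approximate nearest neighbor (ANN) search is the standard production solution for billion-scale retrieval. It trades a small, tunable loss in recall for a large reduction in the number of vectors compared per query.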
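To make the recall-versus-work trade-off concrete, here is a toy sketch of coarse partitioning in pure Python. It is not FAISS or ScaNN: the data is random, the centroids are sampled data points rather than trained with k-means, and every name (`build_cells`, `ann_search`, `nprobe`) is chosen for illustration. The idea it shows is that each vector is assigned to its nearest centroid, and a query scans only the few cells whose centroids are closest to it.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def exact_search(query, vectors, k):
    """Brute force: score every vector, return ids of the k closest."""
    scored = sorted((sq_dist(query, v), i) for i, v in enumerate(vectors))
    return [i for _, i in scored[:k]]


def build_cells(vectors, centroids):
    """Assign each vector id to the cell of its nearest centroid."""
    cells = [[] for _ in centroids]
    for i, v in enumerate(vectors):
        nearest = min(range(len(centroids)), key=lambda c: sq_dist(v, centroids[c]))
        cells[nearest].append(i)
    return cells


def ann_search(query, vectors, centroids, cells, k, nprobe):
    """Scan only the nprobe cells whose centroids are closest to the query."""
    order = sorted(range(len(centroids)), key=lambda c: sq_dist(query, centroids[c]))
    candidates = [i for c in order[:nprobe] for i in cells[c]]
    scored = sorted((sq_dist(query, vectors[i]), i) for i in candidates)
    return [i for _, i in scored[:k]], len(candidates)


# Toy sizes so the sketch runs quickly; nothing here reflects production scale.
rng = random.Random(0)
DIM, N, N_CELLS, K = 16, 2000, 32, 10
vectors = [[rng.gauss(0, 1) for _ in range(DIM)] for _ in range(N)]
centroids = rng.sample(vectors, N_CELLS)  # assumption: sampled points, not k-means
cells = build_cells(vectors, centroids)
query = [rng.gauss(0, 1) for _ in range(DIM)]

truth = exact_search(query, vectors, K)
for nprobe in (1, 4, N_CELLS):
    found, scanned = ann_search(query, vectors, centroids, cells, K, nprobe)
    recall = len(set(found) & set(truth)) / K
    print(f"nprobe={nprobe:2d}  scanned={scanned:4d}/{N}  recall@{K}={recall:.2f}")
```

Probing every cell scans the full dataset and reproduces the exact result, while smaller `nprobe` values scan fewer vectors and may miss some true neighbors. Production indexes expose a similar knob, and tuning it against measured recall is the usual way to fit a latency budget.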

FAISS IVF+PQ

FAISS (Facebook AI Similarity Search) provides several index ...
