Visual Search: Serving & Indexing
Explore how to design the serving layer of a billion-item visual search system with a 50 ms latency target. Understand the trade-offs between ANN index libraries such as FAISS and ScaNN, design scalable sharding strategies, version embeddings with zero-downtime index rebuilds, and apply hybrid NSFW filtering to balance safety against latency.
A billion-item product catalog sits behind your visual search system. Each item is represented by a 512-dimensional CLIP embedding. A user uploads a photo, and your system has 50 milliseconds to return the most visually similar products. The previous lesson established the CLIP encoder configuration, the re-ranking pipeline, and the precomputed attribute labels; together these fix the embedding space and the latency constraints. Those constraints now become hard requirements for the serving layer you must design.
This is the core interview challenge in visual search system design. Four design decisions determine whether the system meets its latency SLO (service-level objective) or collapses under load: selecting an ANN index technology (FAISS vs. ScaNN), designing a sharding strategy for horizontal scalability, implementing embedding versioning with zero-downtime index rebuilds, and placing NSFW filtering in the serving pipeline without blowing the latency budget. Systems such as Pinterest Visual Search and Google Lens operate at this scale, and interviewers expect you to reason through each decision with concrete trade-offs.
ANN indexing with FAISS and ScaNN
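Brute-force exact nearest neighbor search computes the distance between a query vector and every vector in the index. For a billion 512-dimensional vectors, that is 10^9 × 512 ≈ 5.1 × 10^11 multiply-add operations per query, reading about 2 TB of float32 data (10^9 × 512 × 4 bytes) every time, which is far too expensive to repeat for every query within a 50 ms budget.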
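Approximate nearest neighbor (ANN) search is the standard production approach for billion-scale retrieval: it gives up a small amount of recall in exchange for examining only a small fraction of the index per query.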
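To make the cost of exact search concrete, here is a minimal sketch in plain Python (standard library only) of brute-force k-NN together with the back-of-the-envelope arithmetic behind it. The function names `exact_knn` and `brute_force_cost` are hypothetical, and float32 storage (4 bytes per dimension) is an assumption; a real system would use vectorized BLAS or GPU kernels rather than Python loops.

```python
import heapq
import random


def exact_knn(query, vectors, k):
    """Brute-force exact k-NN: compute the squared L2 distance from `query`
    to every vector and keep the indices of the k smallest.
    Cost is O(n * d) per query: every vector is visited once."""
    return heapq.nsmallest(
        k,
        range(len(vectors)),
        # One multiply-add per dimension for each of the n vectors.
        key=lambda i: sum((q - v) ** 2 for q, v in zip(query, vectors[i])),
    )


def brute_force_cost(n, d, bytes_per_dim=4):
    """Back-of-the-envelope cost of one exact query over n vectors of dimension d.
    Assumes float32 storage (4 bytes per dimension) by default."""
    multiply_adds = n * d                   # one multiply-add per dimension per vector
    bytes_scanned = n * d * bytes_per_dim   # raw vector data read per query
    return multiply_adds, bytes_scanned


if __name__ == "__main__":
    macs, nbytes = brute_force_cost(n=1_000_000_000, d=512)
    print(f"{macs:.2e} multiply-adds, {nbytes / 1e12:.2f} TB scanned per query")
    # Aggregate memory bandwidth needed to scan everything within 50 ms.
    print(f"{nbytes / 0.050 / 1e12:.1f} TB/s required for a 50 ms budget")

    # Tiny demo on random data (toy scale, not a benchmark).
    random.seed(0)
    dim = 8
    data = [[random.random() for _ in range(dim)] for _ in range(1000)]
    # Querying with a stored vector returns its own index first (distance 0).
    print(exact_knn(data[42], data, k=3)[0])
```

Plugging in 10^9 vectors of dimension 512 gives about 5.1 × 10^11 multiply-adds and roughly 2 TB scanned per query, so finishing in 50 ms would require on the order of 40 TB/s of aggregate memory bandwidth. That is why exact search is impractical here and ANN indexes, which touch only a fraction of the data, are used instead.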
FAISS IVF+PQ
FAISS (Facebook AI Similarity Search) provides several index ...