Visual Search: Problem Framing & Requirements

Explore how to frame visual search system design problems accurately by distinguishing between image-to-image and image-to-product search. Understand key business metrics such as click-through rate, purchase conversion, and zero-result rate, and how they influence system design. Learn to manage billion-scale indexing within strict latency budgets and plan for operational risks like embedding drift. This lesson guides you toward the precise scoping decisions that scalable, efficient visual search systems depend on.

We'll cover the following...

Two problem formulations
- Image-to-image search
- Image-to-product search
Business metrics that drive design decisions
Scale requirements and latency budget
- Billion-scale indexing
- Latency budget decomposition
- Embedding drift as an operational risk
L4, L5, and Staff+ scoping comparison
Setting up the data and embedding layer

Every time you open Pinterest and snap a photo of a lamp you like, or point Google Lens at a pair of sneakers on the street, a visual search system converts your photo into a mathematical representation, scans billions of indexed images, and returns relevant results, all before you finish blinking. This pipeline, spanning embedding generation, approximate nearest neighbor retrieval, multi-modal ranking, and strict latency enforcement, is exactly why interviewers at MAANG companies reach for visual search as a system design prompt. It tests breadth and depth simultaneously.

The core question you will face sounds deceptively simple: “Design a system where a user uploads a photo and receives visually similar or shoppable results in under 200 ms across a billion-image index.” Answering it well requires precise problem framing before any architecture diagram appears on the whiteboard.

This lesson walks through two distinct problem formulations, the business metrics that guide both offline and online evaluation, the scale and latency constraints that eliminate naive solutions, and a leveling comparison that reveals how scoping depth separates an L4 answer from a Staff+ answer. These framing decisions cascade into every downstream choice you will make in subsequent lessons.
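A rough back-of-envelope calculation shows why the constraints in the prompt rule out an exact scan of the index. The corpus size and the 200 ms target come from the prompt itself; the 512-dimensional float32 embeddings and the 100 GB/s memory bandwidth are illustrative assumptions, not figures from this lesson.

```python
# Back-of-envelope sizing for exact (brute-force) nearest neighbor search.
# NUM_IMAGES and LATENCY_BUDGET_MS come from the interview prompt.
# EMBEDDING_DIM, BYTES_PER_FLOAT, and MEMORY_BANDWIDTH_GB_PER_S are assumptions.

NUM_IMAGES = 1_000_000_000        # billion-image index (from the prompt)
LATENCY_BUDGET_MS = 200           # end-to-end latency target (from the prompt)
EMBEDDING_DIM = 512               # assumption: embedding dimensionality
BYTES_PER_FLOAT = 4               # assumption: float32 storage
MEMORY_BANDWIDTH_GB_PER_S = 100   # assumption: memory bandwidth of one machine


def index_size_bytes(num_vectors, dim, bytes_per_value):
    """Raw storage needed for uncompressed embeddings."""
    return num_vectors * dim * bytes_per_value


def full_scan_ms(size_bytes, bandwidth_gb_per_s):
    """Time to read every vector once, ignoring compute and network cost."""
    seconds = size_bytes / (bandwidth_gb_per_s * 1e9)
    return seconds * 1000.0


size = index_size_bytes(NUM_IMAGES, EMBEDDING_DIM, BYTES_PER_FLOAT)
scan_ms = full_scan_ms(size, MEMORY_BANDWIDTH_GB_PER_S)
over_budget = scan_ms / LATENCY_BUDGET_MS

print(f"Index size: {size / 1e12:.3f} TB")
print(f"Single-machine full scan: {scan_ms:.0f} ms")
print(f"Multiple of the latency budget: {over_budget:.1f}x")
```

Under these assumptions, the raw vectors occupy about 2 TB, and reading them once takes roughly 20 seconds, about 100 times the budget before any embedding or ranking work is counted. This gap is what pushes the design toward approximate nearest neighbor retrieval rather than an exact scan.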

Two problem formulations

A visual search query always starts with an image, but what the system returns, and how it is judged, depends entirely on which problem you are solving. Conflating the two formulations is one of the most common mistakes candidates make, and it leads to architectures that look reasonable on the surface but fail the business objective.

Image-to-image search

Pinterest Lens in discovery mode and Google Lens in explore mode both implement image-to-image search. The system retrieves images that are perceptually similar to the query. Relevance is judged by visual coherence: does the result share color palette, texture, composition, or scene structure with the query? The embedding model is trained with a contrastive visual loss that pulls visually similar pairs together in embedding space and pushes dissimilar pairs apart. The index corpus consists of web-crawled images spanning every visual category.
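The pull-together, push-apart behavior of a contrastive loss can be sketched in a few lines. The lesson does not name a specific loss, so the example below assumes an InfoNCE-style formulation with cosine similarity and a temperature of 0.1; the function names and toy vectors are hypothetical and chosen only for illustration.

```python
import math


def l2_normalize(vec):
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    a, b = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a, b))


def contrastive_loss(anchor, positive, negatives, temperature=0.1):
    """InfoNCE-style loss for one anchor (an assumed formulation).

    The loss is low when the anchor is closer to the positive than to
    every negative, and high otherwise.
    """
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, neg) / temperature for neg in negatives]
    # Numerically stable log-sum-exp over all candidates.
    top = max(logits)
    log_denominator = top + math.log(sum(math.exp(l - top) for l in logits))
    # Negative log-probability assigned to the positive pair.
    return log_denominator - logits[0]


# Toy 2-D embeddings: the query and its visually similar pair point the same way.
query = [1.0, 0.0]
similar = [0.9, 0.1]
dissimilar = [[0.0, 1.0], [-1.0, 0.0]]

well_separated = contrastive_loss(query, similar, dissimilar)
# Swap roles: treat a far-away vector as the "positive" to see the loss rise.
poorly_separated = contrastive_loss(query, [0.0, 1.0], [similar, [-1.0, 0.0]])

print(f"loss when the positive is nearby:  {well_separated:.4f}")
print(f"loss when the positive is far off: {poorly_separated:.4f}")
```

Training lowers this loss by moving the embedding of the similar image toward the query and moving the others away, which is the geometry that nearest neighbor retrieval later relies on.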

Image-to-product search

Amazon StyleSnap and Google Lens in shopping mode implement image-to-product search. The system retrieves purchasable catalog items that match the object depicted in the query photo. Relevance is judged by whether the user can ...

