Video Recommendation: Data Strategy & Feature Engineering

Explore how to transform user behavior into meaningful features for video recommendation systems. Understand implicit satisfaction signals like watch time, replays, skips, and shares, and learn to engineer user, video, and contextual features. Discover strategies to address the cold start problem with content embeddings and manage the operational challenges of embedding indexing at scale.

In a YouTube-scale recommendation system, fewer than 1% of users ever click a thumbs-up or thumbs-down button. The system serves billions of recommendation requests daily, yet it operates in a near-vacuum of explicit preference data. This forces a fundamental design choice: the model must learn what users want not from what they say, but from what they do. With business metrics mapped to ML objectives and scale constraints defined in the previous lesson, the next critical decision is what data to collect and how to transform raw behavioral traces into features the model can actually learn from.

This lesson covers three pillars that interviewers consistently probe. First, how implicit behavioral signals serve as satisfaction proxies, and the noise they introduce. Second, how features are engineered across user, video, and contextual dimensions, each with different freshness requirements. Third, how content embeddings solve the cold start problem for new videos, and what it takes operationally to embed billions of videos, including the painful index rebuild required when the embedding model changes.

Interviewers expect you to reason about why certain signals are chosen, what biases they carry, and how feature pipelines operate under latency constraints at billion-item scale. Candidates who treat data strategy as an afterthought consistently underperform.

Implicit signals as satisfaction proxies

Explicit feedback like star ratings and thumbs-up clicks suffers from two problems. It is extremely sparse, because most users never bother, and it is biased toward strong opinions, capturing only love-it-or-hate-it reactions while missing the vast middle ground. This makes explicit signals insufficient as the sole training target for a recommendation model that must serve every user on every request.

The system instead relies on four core implicit signals, each capturing a different facet of user satisfaction.

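  • Watch time (normalized by duration): This is the strongest general engagement signal. However, raw watch time biases the model toward recommending longer videos, because a long video can accumulate more seconds even when the viewer abandons it early. Normalizing by video duration corrects for this. A user who watches 95% of a 30-second clip signals stronger satisfaction than one who watches 10% of a 2-hour movie, even though the second user logged 12 minutes of watch time against the first user's 28.5 seconds. The normalized ratio, sometimes called completion rate, becomes the actual feature.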

  • Replays: When a user re-watches a video, it is a high-confidence positive signal indicating the content was valuable enough to revisit. Replays are rare, so the signal is sparse, but it is highly informative when it occurs.

  • Skips: A skip, especially within the first few seconds, signals poor relevance or a misleading thumbnail. Early skips carry stronger negative weight than ...
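The watch-time normalization described in the first bullet can be sketched in a few lines. This is a minimal illustration, not a production feature pipeline: the function name `completion_rate` and the choice to clip the ratio to the range [0, 1] are assumptions made here (clipping keeps re-watched seconds from inflating the feature, since replays are tracked as their own signal).

```python
def completion_rate(watch_seconds: float, duration_seconds: float) -> float:
    """Return the fraction of a video that was watched, clipped to [0, 1].

    Hypothetical helper: the clipping rule is an assumption for this sketch,
    not something the lesson prescribes.
    """
    if duration_seconds <= 0:
        raise ValueError("duration_seconds must be positive")
    ratio = watch_seconds / duration_seconds
    # Clip so that negative logging glitches or re-watched seconds
    # cannot push the feature outside [0, 1].
    return max(0.0, min(ratio, 1.0))


# The two cases from the text: 95% of a 30-second clip
# versus 10% of a 2-hour (7200-second) movie.
short_clip = completion_rate(watch_seconds=28.5, duration_seconds=30.0)
long_movie = completion_rate(watch_seconds=720.0, duration_seconds=7200.0)

# Raw watch time favors the movie (720 s > 28.5 s),
# while the normalized feature favors the short clip.
print(f"short clip: {short_clip:.2f}, long movie: {long_movie:.2f}")
```

Ranking on the raw seconds would favor the movie, while ranking on the normalized value favors the clip, which is the bias correction the bullet describes.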