Social Feed Ranking: Data Strategy & Feature Engineering
Explore how to design a data strategy for social feed ranking systems by leveraging social graph structure, interaction history, and content metadata. Learn to build near-real-time feature pipelines, engineer cross features between poster and viewer, and address privacy constraints while optimizing feed relevance and freshness.
Every ranking objective is only as good as the features that represent it. The previous lesson established the business metrics that drive a social feed, including engagement, meaningful connections, and creator equity, along with the hybrid fan-out architecture that balances write-time and read-time computation. Now the question shifts from what we optimize to what data makes that optimization possible. For social feeds, the social graph is the single most information-dense data source available to the ranking system. Unlike item catalog features in e-commerce, graph features encode human relationships, and that makes them both extraordinarily predictive and uniquely sensitive.
Interviewers at L5 and above expect you to articulate which raw data sources map to which business objectives before jumping into feature lists. This lesson covers three data pillars: social graph structure, interaction history, and content metadata. It then architects a near-real-time feature pipeline for feed freshness, engineers cross features between poster and viewer, and addresses the privacy constraints that govern what graph traversals are permissible. The features designed here flow directly into the multi-task model architecture covered in the next lesson.
Social graph features and interaction history
The
Interaction history adds a temporal layer on top of this static topology. Likes, comments, reshares, DMs, profile visits, and story views between a viewer-poster pair are aggregated over sliding windows, typically 1 day, 7 days, and 28 days. This transforms a binary “connected or not connected” edge into a continuous relationship-strength signal. Facebook’s shift toward “meaningful social interactions,” for example, treated comment-thread depth and reply chains as stronger engagement indicators than passive likes.
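The windowed aggregation described above can be sketched in a few lines of Python. This is a minimal illustration, not a production design: the Interaction record, the window_features function, and the per-interaction WEIGHTS are all hypothetical, and the weights only encode the general idea that active interactions (comments, DMs) should count for more than passive ones (likes, views).

```python
from dataclasses import dataclass

DAY = 86_400  # seconds in a day
WINDOWS_DAYS = (1, 7, 28)  # the sliding windows named in the text

# Hypothetical weights (an assumption for illustration only):
# active interactions count more than passive ones.
WEIGHTS = {
    "like": 1.0,
    "story_view": 0.5,
    "profile_visit": 0.5,
    "comment": 4.0,
    "reshare": 6.0,
    "dm": 8.0,
}


@dataclass(frozen=True)
class Interaction:
    viewer: str
    poster: str
    kind: str
    ts: float  # Unix timestamp in seconds


def window_features(events, viewer, poster, now):
    """Return raw counts and weighted strength per window for one viewer->poster edge."""
    feats = {}
    for days in WINDOWS_DAYS:
        cutoff = now - days * DAY
        in_window = [
            e for e in events
            if e.viewer == viewer and e.poster == poster and cutoff < e.ts <= now
        ]
        feats[f"count_{days}d"] = len(in_window)
        feats[f"strength_{days}d"] = sum(WEIGHTS.get(e.kind, 0.0) for e in in_window)
    return feats


# Example edge: a recent like, a comment three days ago, a DM twenty days ago,
# and a like that has already aged out of every window.
now = 100 * DAY
events = [
    Interaction("alice", "bob", "like", now - 2 * 3600),
    Interaction("alice", "bob", "comment", now - 3 * DAY),
    Interaction("alice", "bob", "dm", now - 20 * DAY),
    Interaction("alice", "bob", "like", now - 40 * DAY),
    Interaction("carol", "bob", "comment", now - 1 * 3600),  # different edge, ignored
]
features = window_features(events, "alice", "bob", now)
```

Scanning the raw event list on every call is only workable for a toy example. At feed scale the same features would more plausibly be maintained incrementally, for instance as bucketed counters updated by a stream processor, but the output shape (one count and one strength value per window per edge) stays the same.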
Interaction history also reveals decaying relationships. Two users may still be connected but no longer interact, and the ranking system should down-rank content from such posters. Because the graph contains billions of edges, storing per-edge interaction counters requires careful aggregation strategies and TTL-based expiration policies to keep storage costs manageable.
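Both ideas in this paragraph, spotting a decaying edge and expiring idle counters, can be illustrated with a small sketch. Everything here is an assumption made for illustration: the is_decaying heuristic and its thresholds are hypothetical, and EdgeCounterStore is an in-memory stand-in for whatever key-value store with per-key TTL a real system would use.

```python
DAY = 86_400  # seconds in a day


def is_decaying(count_7d, count_28d, min_history=4, ratio=0.5):
    """Flag an edge whose 7-day activity is well below its 28-day average rate.

    min_history and ratio are hypothetical thresholds, not tuned values.
    """
    if count_28d < min_history:
        return False  # too little history to call it a decayed relationship
    expected_7d = count_28d * 7 / 28  # what a steady relationship would show
    return count_7d < ratio * expected_7d


class EdgeCounterStore:
    """In-memory stand-in for a key-value store with per-key TTL."""

    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self._data = {}  # (viewer, poster) -> (count, last_update_ts)

    def increment(self, viewer, poster, now):
        key = (viewer, poster)
        count, last = self._data.get(key, (0, now))
        if now - last > self.ttl:
            count = 0  # the old counter expired; start over
        self._data[key] = (count + 1, now)

    def get(self, viewer, poster, now):
        entry = self._data.get((viewer, poster))
        if entry is None or now - entry[1] > self.ttl:
            return 0  # missing or expired edges read as zero interactions
        return entry[0]

    def sweep(self, now):
        """Delete expired edges and return how many were removed."""
        expired = [k for k, (_, last) in self._data.items() if now - last > self.ttl]
        for k in expired:
            del self._data[k]
        return len(expired)


store = EdgeCounterStore(ttl_seconds=28 * DAY)
store.increment("alice", "bob", now=0)
store.increment("alice", "bob", now=DAY)
live_count = store.get("alice", "bob", now=10 * DAY)
stale_count = store.get("alice", "bob", now=30 * DAY)
removed = store.sweep(now=30 * DAY)
```

The TTL here is keyed on the last update, so an edge that goes quiet for longer than the TTL simply disappears and reads as zero. That keeps storage proportional to the number of recently active edges rather than the full graph, at the cost of losing long-term history for dormant relationships.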
Attention:...