Feature Types and Engineering
Explore the core feature categories critical to machine learning systems, including user, item, cross, and contextual features. Understand feature engineering techniques and how to avoid training-serving skew by using feature stores and ensuring point-in-time correctness. This lesson teaches you how to structure feature design discussions clearly and effectively for ML system design interviews.
With training data properly balanced and evaluation metrics you can trust, the next design lever is deciding which signals to extract from raw data. Feature engineering is often the highest-leverage activity in production ML systems. At companies like YouTube, Airbnb, and Uber, the choice and quality of features frequently matter more than model architecture: a simple model with well-chosen features will often outperform a sophisticated architecture fed noisy or incomplete signals.
Picture this interview scenario: you are designing a recommendation feed, and the interviewer says, “Walk me through the features you would use.” Without a framework, candidates tend to ramble, listing features in no particular order and missing entire categories. A structured taxonomy built around four categories (user, item, cross, and contextual features) gives you a repeatable scaffold to answer this question thoroughly and concisely. This lesson covers each category, its engineering techniques, and the critical pitfall of training-serving skew that separates production-ready designs from academic exercises.
User features
User features are signals derived from who the user is and what they have done. They form the foundation of personalization in any ranking or recommendation system. These features break down into four subgroups, each capturing a different facet of user identity and intent.
Demographics: Age bucket, country, language, and account age provide coarse segmentation. Privacy regulations often require coarsening strategies, such as bucketing age into ranges or mapping location to region-level granularity rather than exact coordinates.
Historical behavior: Aggregated engagement counts like clicks, purchases, and watch time computed over multiple time windows (7-day, 30-day, lifetime) capture both short-term intent and long-term preference. A user whose lifetime history is dominated by tech content but who watched three cooking videos today has a short-term intent that differs from their long-term preference, and multiple windows let the model distinguish between the two.
Preference signals: Explicit ratings, saved items, and followed categories represent deliberate user choices. Explicit preferences are more reliable per signal than implicit ones (like clicks), but they are far sparser because most users rarely rate or save content.
Session context: Pages viewed in the current session, dwell time on the current page, and scroll depth capture real-time intent that historical aggregates miss entirely. A user who just searched for “running shoes” has an immediate intent that a 30-day purchase count cannot express.
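The coarsening and multi-window ideas above can be sketched in a few lines of Python. This is a minimal illustration, not a production pipeline: the event log, the bucket edges in age_bucket, and the feature names are all hypothetical, and a real system would read events from a log or warehouse rather than an in-memory list.

```python
from datetime import datetime, timedelta

# Hypothetical event log for one user: (timestamp, event_type).
EVENTS = [
    (datetime(2024, 5, 30, 9, 0), "click"),
    (datetime(2024, 5, 28, 18, 30), "click"),
    (datetime(2024, 5, 12, 14, 0), "click"),
    (datetime(2024, 5, 10, 12, 0), "purchase"),
    (datetime(2024, 1, 2, 8, 15), "click"),
]


def age_bucket(age: int) -> str:
    """Coarsen an exact age into a range (bucket edges are illustrative)."""
    if age < 18:
        return "under_18"
    if age < 25:
        return "18_24"
    if age < 35:
        return "25_34"
    if age < 50:
        return "35_49"
    return "50_plus"


def windowed_counts(events, event_type, as_of, windows_days=(7, 30)):
    """Count events of one type in trailing windows ending at `as_of`.

    Only events strictly before `as_of` are counted, so the feature never
    includes data from after the prediction time.
    """
    past = [ts for ts, kind in events if kind == event_type and ts < as_of]
    features = {}
    for days in windows_days:
        start = as_of - timedelta(days=days)
        features[f"{event_type}_count_{days}d"] = sum(1 for ts in past if ts >= start)
    features[f"{event_type}_count_lifetime"] = len(past)
    return features


as_of = datetime(2024, 6, 1, 0, 0)
user_features = {
    "age_bucket": age_bucket(29),
    **windowed_counts(EVENTS, "click", as_of),
}
print(user_features)
```

With this sample data the 7-day, 30-day, and lifetime click counts all differ, which is the point of keeping several windows: each one gives the model a different view of the same behavior.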
Most user features are precomputed and stored in a feature store, while session features are computed online at request time. The most common source of training-serving skew for user features is computing historical aggregates differently offline (via batch SQL) vs. online (via a streaming counter). Even small differences in window boundaries or in the handling of late-arriving events cause the two values to silently diverge. ...
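To make the window-boundary problem concrete, here is a small Python sketch with hypothetical data. It assumes one plausible pair of definitions: an offline job that counts the seven full calendar days before the request date, and an online counter that covers the trailing 168 hours. Neither is taken from a specific system; the point is only that two reasonable readings of "7-day click count" can disagree on the same events.

```python
from datetime import datetime, timedelta

# Hypothetical click timestamps for one user.
CLICKS = [
    datetime(2024, 5, 25, 1, 0),
    datetime(2024, 5, 25, 20, 0),
    datetime(2024, 5, 31, 23, 0),
]


def batch_7d_count(clicks, request_time):
    """Offline style: count the 7 full calendar days before the request date."""
    end = request_time.replace(hour=0, minute=0, second=0, microsecond=0)
    start = end - timedelta(days=7)
    return sum(1 for ts in clicks if start <= ts < end)


def streaming_7d_count(clicks, request_time):
    """Online style: rolling counter over the trailing 168 hours."""
    start = request_time - timedelta(days=7)
    return sum(1 for ts in clicks if start <= ts < request_time)


request_time = datetime(2024, 6, 1, 15, 0)
offline = batch_7d_count(CLICKS, request_time)
online = streaming_7d_count(CLICKS, request_time)
print(f"offline={offline} online={online}")
```

Here the calendar-day version still includes the early click on May 25, while the rolling version has already aged it out, so a model trained on the first definition would be served a different value under the second.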