Feature Store Architecture, Engineering, and Governance
Explore how to use Amazon SageMaker Feature Store to prevent training-serving skew through a unified dual-store architecture. Understand the roles of online and offline stores, feature groups, and batch versus streaming ingestion. Learn governance strategies like metadata management, TTL policies, access control, and PII isolation to maintain production-grade feature pipelines and ensure secure, compliant ML workflows.
Imagine a loan approval model that performs well in validation but starts rejecting qualified applicants in production. During training, the credit utilization ratio was computed from end-of-month statements in a data warehouse. In production, the same feature is pulled from a real-time API that reflects current balances. A customer who pays off their card mid-cycle suddenly appears to have zero utilization, a pattern the model never saw during training. This mismatch is training-serving skew, and a feature store exists to eliminate it.
Training-serving skew occurs when features computed during training differ from those served at inference time because of inconsistent pipelines. Feature stores solve this by providing a centralized, single source of truth for feature definitions, computation logic, and retrieval. Rather than maintaining separate code paths for training and inference, both consumers read from the same managed store with identical feature values.
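To make the loan example concrete, here is a minimal sketch of how two code paths can produce different values for the same feature. The `Account` class, the function names, and the balances are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Account:
    statement_balance: float  # end-of-month balance, as stored in the warehouse
    current_balance: float    # live balance, as returned by a real-time API
    credit_limit: float


def utilization_training(acct: Account) -> float:
    # Training path: ratio based on the end-of-month statement
    return acct.statement_balance / acct.credit_limit


def utilization_serving(acct: Account) -> float:
    # Serving path: ratio based on the current balance
    return acct.current_balance / acct.credit_limit


# Hypothetical customer who paid off the card mid-cycle
acct = Account(statement_balance=4000.0, current_balance=0.0, credit_limit=10000.0)

train_value = utilization_training(acct)
serve_value = utilization_serving(acct)
skew = abs(train_value - serve_value)

print(f"training={train_value:.2f} serving={serve_value:.2f} skew={skew:.2f}")
```

The two functions share a name for the feature but not a definition, which is exactly the inconsistency a single shared feature pipeline is meant to remove.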
Amazon SageMaker Feature Store is the AWS-managed service purpose-built for this problem. Its dual-store architecture, an online store for low-latency inference and an offline store for batch training, ensures that identically computed features flow to both paths. It integrates natively with SageMaker Pipelines for orchestrated ingestion and with Model Registry for lineage tracking from features to deployed models.
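The dual-store idea can be sketched with a toy in-memory model. This is not the SageMaker API. `ToyFeatureStore` and its methods are hypothetical, and the sketch assumes that a single write feeds both stores, that the online side keeps only the latest record per identifier (by event time), and that the offline side keeps the full history.

```python
from typing import Any, Dict, List, Optional


class ToyFeatureStore:
    """Toy model of a dual store: one write path, two read paths."""

    def __init__(self) -> None:
        self.online: Dict[str, Dict[str, Any]] = {}  # latest record per identifier
        self.offline: List[Dict[str, Any]] = []      # append-only history

    def put_record(self, record_id: str, event_time: float,
                   features: Dict[str, Any]) -> None:
        row = {"record_id": record_id, "event_time": event_time, **features}
        # Every write is appended to the offline history
        self.offline.append(row)
        # The online store is updated only if this record is the newest seen so far
        latest = self.online.get(record_id)
        if latest is None or event_time >= latest["event_time"]:
            self.online[record_id] = row

    def get_online(self, record_id: str) -> Optional[Dict[str, Any]]:
        # Inference path: most recent feature vector, or None if unknown
        return self.online.get(record_id)

    def get_history(self, record_id: str) -> List[Dict[str, Any]]:
        # Training path: every value ever written for this identifier
        return [r for r in self.offline if r["record_id"] == record_id]
```

Because both read paths are fed by the same `put_record` call, training and inference cannot drift apart through separate computation code, which is the property the dual-store design is after.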
In this lesson, we will cover the full architecture:
Online vs. offline stores
Feature groups and lifecycle management
Batch vs. streaming ingestion trade-offs
Governance practices (metadata, TTL, access control, and PII isolation) that distinguish production-grade feature management from ad hoc pipelines
Online store: Low-latency serving
The online store is a fully managed, low-latency key-value store optimized for real-time inference. Inference code behind a SageMaker endpoint retrieves the latest feature values by record identifier (for example, customer_id), typically with single-digit millisecond latency. Because the store keeps only the most recent feature vector per record, it is well suited to augmenting inference requests with precomputed features.
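A lookup by record identifier might look like the sketch below. It assumes a client that exposes the `get_record` operation of the Feature Store runtime API (with boto3, the `sagemaker-featurestore-runtime` client). The helper name `fetch_online_features`, the feature group name, and the feature names are hypothetical.

```python
from typing import Any, Dict, List, Optional


def fetch_online_features(
    client: Any,
    feature_group_name: str,
    record_id: str,
    feature_names: Optional[List[str]] = None,
) -> Dict[str, str]:
    """Return the latest online feature values for one record as a dict."""
    kwargs: Dict[str, Any] = {
        "FeatureGroupName": feature_group_name,
        "RecordIdentifierValueAsString": record_id,
    }
    if feature_names:
        # Optionally restrict the response to a subset of features
        kwargs["FeatureNames"] = feature_names

    response = client.get_record(**kwargs)

    # The "Record" key is absent when no record exists for the identifier
    return {
        item["FeatureName"]: item["ValueAsString"]
        for item in response.get("Record", [])
    }


# Example usage (requires boto3 and AWS credentials; names are hypothetical):
#   import boto3
#   runtime = boto3.client("sagemaker-featurestore-runtime")
#   features = fetch_online_features(runtime, "customer-features", "12345")
```

Values come back as strings, so the caller is responsible for casting them to the types the model expects. Passing the client in as an argument also makes the helper easy to exercise with a stub in unit tests.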
When creating a feature group, we can choose between two online storage tiers: Standard (the default) or InMemory. The Standard tier provides fast retrieval for typical ML serving workloads. The InMemory tier, powered by Amazon ElastiCache (Redis OSS), offers even lower latency for high-throughput applications and supports collection types such as lists, sets, and vectors. However, the InMemory tier has ...