Training Data Generation
Explore how training data for dynamic pricing engines is generated from logged pricing decisions, contextual factors, and observed outcomes. Understand the complexities of selection bias, counterfactual gaps, and delayed signals. This lesson helps you grasp how models learn from decision-conditioned data rather than traditional supervised labels, preparing you to design and evaluate pricing systems effectively.
We'll cover the following...
- Pricing as a logged decision-making process
- Pointwise training data for pricing
- The counterfactual gap in pricing data
- Selection bias and historical pricing policies
- Exploration and experimental data
- Synthetic data and cold-start products
- Data validation and business sanity checks
- Interview questions and answers
Dynamic pricing systems don’t learn prices in isolation; they learn from past decisions made under real-world constraints and observe the resulting outcomes. Each training row encodes a historical pricing decision influenced by inventory, promotions, competitor behavior, regional rules, and risk tolerance. Training data is thus an active record of business logic, not just raw numbers.
Unlike classical supervised learning, pricing outcomes are contextual and conditional. A purchase at $20 doesn’t mean $20 was “correct”; a non-purchase doesn’t automatically mean the price was too high. Models must interpret each outcome in light of timing, user intent, stock levels, and the policies that were in effect when the price was set.
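To make this concrete, here is a minimal sketch (all field names and values hypothetical) of two logged rows that share the same price but carry opposite outcome signals. Read jointly with context, neither row says whether $20 was “correct”; each only says what $20 achieved in that specific situation:

```python
# Two hypothetical logged pricing rows: same price, different outcomes.
# Neither outcome is a ground-truth label for the price; each must be
# interpreted jointly with the context in which the price was shown.
rows = [
    {"price": 20.0, "purchased": True,
     "context": {"stock": 5, "promo_active": True, "hour": 20}},
    {"price": 20.0, "purchased": False,
     "context": {"stock": 500, "promo_active": False, "hour": 3}},
]

for r in rows:
    # A naive labeler would call 20.0 "good" in row 0 and "bad" in row 1.
    # A decision-conditioned view instead asks: given this context,
    # what did choosing 20.0 produce?
    print(r["price"], r["purchased"], r["context"])
```

The point of the sketch is that the “label” (purchased or not) is meaningless without the context columns sitting next to it in the same row.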
Fun fact: Some large e-commerce platforms spend more engineering effort on pricing data logging and validation than on model development itself, because once bad pricing data is learned, models can amplify errors at scale.
Historical prices often reflect human decisions, rules, or earlier models, creating selection bias. Without careful handling, models simply replicate past policies rather than discovering optimal pricing. Strong candidates in interviews highlight the importance of understanding how pricing data is generated, what constraints it encodes, and why naïve assumptions are dangerous.
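The replication problem is easy to demonstrate in a toy simulation. In the sketch below (the policy, thresholds, and prices are all hypothetical), every logged price comes from a deterministic old policy, so the logs contain no counterfactual prices for any context, and anything trained on them can only echo the policy:

```python
import random

random.seed(0)

# Hypothetical old pricing policy: price depends only on inventory level.
def old_policy(inventory):
    return 25.0 if inventory < 50 else 15.0

# Build the logged training data. Because the old policy is the *only*
# source of prices, price and inventory are perfectly confounded.
logs = []
for _ in range(1000):
    inv = random.randint(0, 100)
    logs.append((inv, old_policy(inv)))

# Any model "predicting price from context" on these logs can only
# reproduce the old policy: for low-inventory contexts it has never
# observed any price other than 25.0.
low_inv_prices = {price for inv, price in logs if inv < 50}
print(low_inv_prices)  # only {25.0}: no counterfactual prices were logged
```

Nothing in this dataset can tell a model whether 22.0 or 28.0 would have sold better at low inventory; that information was never generated, which is exactly the selection bias the text describes.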
Pricing as a logged decision-making process
At its core, dynamic pricing is not a prediction problem; it is a decision-learning problem. Every row of training data exists because someone or something chose a price. Unlike traditional supervised learning, where labels exist independently of the model, pricing labels are created by decisions. This makes pricing fundamentally different from tasks like image classification or spam detection.
Each training example corresponds to a logged pricing decision made by a human operator, a rules engine, or a previous model. The data is not a passive observation of reality; it is the record of an action taken under uncertainty. This is why pricing data must always be interpreted as decision-conditioned evidence, not ground truth.
Fun fact: Many real-world pricing models are trained using supervised learning, but their data structure is identical to reinforcement learning logs: (state, action, reward), even if teams don’t explicitly call it that.
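A minimal sketch of that log structure might look like the following (the schema and field names are illustrative, not from any specific system). Each row is a (state, action, reward) triple, even if a team stores it as an ordinary training table:

```python
from dataclasses import dataclass

# Hypothetical schema for one logged pricing decision.
@dataclass
class PricingLog:
    context: dict   # state: what was observable when the price was set
    price: float    # action: the single price actually shown
    outcome: float  # reward: e.g., realized revenue (0.0 if no purchase)

row = PricingLog(
    context={"inventory": 42, "hour": 20,
             "competitor_price": 21.99, "promo_active": False},
    price=19.99,
    outcome=19.99,  # one unit sold at the shown price
)
print((row.price, row.outcome))
```

Note that the row records only the one price that was shown; the outcomes of every alternative price are simply absent, which is the counterfactual gap discussed below.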
Every pricing decision can be decomposed into three essential components:
Context represents the observable state of the world at the moment the price was set. This includes factors such as inventory levels, time of day, seasonality, competitor prices, user segment, device type, and active promotions. Context defines what information was available at the time the decision was made. If relevant context is missing or logged incorrectly, the model will infer spurious relationships.
Action is the price that was actually chosen. Importantly, this is just one option among many possible prices. The model does not observe alternative actions that could have been taken. This single-action logging is the root cause of counterfactual uncertainty in pricing systems.
Outcome ...