AI Features

Training Data Generation

Explore how training data for dynamic pricing engines is generated from logged pricing decisions, contextual factors, and observed outcomes. Understand the complexities of selection bias, counterfactual gaps, and delayed signals. This lesson helps you grasp how models learn from decision-conditioned data rather than traditional supervised labels, preparing you to design and evaluate pricing systems effectively.

Dynamic pricing systems don’t learn prices in isolation; they learn from past decisions made under real-world constraints and observe the resulting outcomes. Each training row encodes a historical pricing decision influenced by inventory, promotions, competitor behavior, regional rules, and risk tolerance. Training data is thus an active record of business logic, not just raw numbers.
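To make the idea of a training row as "an active record of business logic" concrete, here is a minimal sketch of what one logged pricing decision might look like. The schema and field names are illustrative assumptions, not taken from any specific production system.

```python
from dataclasses import dataclass
from datetime import datetime

# Hypothetical schema for one logged pricing decision.
# Every field name here is an assumption for illustration.
@dataclass
class PricingLogRow:
    timestamp: datetime
    context: dict        # inventory, promotions, competitor prices, region, ...
    chosen_price: float  # the action actually taken
    policy_id: str       # which human, rules engine, or model set the price
    outcome: dict        # observed result: purchase, revenue, etc.

row = PricingLogRow(
    timestamp=datetime(2024, 3, 1, 14, 30),
    context={"inventory": 120, "competitor_price": 21.50, "promo_active": False},
    chosen_price=19.99,
    policy_id="rules_engine_v3",
    outcome={"purchased": True, "revenue": 19.99},
)
print(row.chosen_price)
```

Note that `policy_id` is part of the row: knowing *who or what* chose the price is itself business logic that the data encodes.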

Unlike classical supervised learning, pricing outcomes are contextual and conditional. A purchase at $20 doesn’t mean $20 was “correct”; a non-purchase doesn’t automatically indicate a price was too high. Models must interpret data considering timing, intent, stock levels, and applied policies.

Fun fact: Some large e-commerce platforms spend more engineering effort on pricing data logging and validation than on model development itself, because once bad pricing data is learned, models can amplify errors at scale.

Historical prices often reflect human decisions, rules, or earlier models, creating selection bias. Without careful handling, models simply replicate past policies rather than discovering optimal pricing. Strong candidates in interviews highlight the importance of understanding how pricing data is generated, what constraints it encodes, and why naïve assumptions are dangerous.
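The selection-bias problem above can be shown with a toy simulation: if a legacy policy charged high prices only when demand was high, the logs make high prices look like they *cause* more purchases. All numbers and the policy rule are made up for illustration.

```python
import random

random.seed(0)

# Toy simulation: the historical policy prices high exactly when demand is high,
# so price and demand are confounded in the logged data.
logs = []
for _ in range(1000):
    high_demand = random.random() < 0.5
    price = 25.0 if high_demand else 15.0        # legacy policy's rule
    # True behavior: purchase probability rises with demand, falls with price.
    p_buy = (0.8 if high_demand else 0.4) - 0.01 * price
    logs.append((price, random.random() < p_buy))

# Naive read of the logs: conversion rate at each logged price.
rate_high = sum(b for p, b in logs if p == 25.0) / sum(1 for p, _ in logs if p == 25.0)
rate_low = sum(b for p, b in logs if p == 15.0) / sum(1 for p, _ in logs if p == 15.0)
print(rate_high > rate_low)  # True: higher prices *appear* to convert better
```

A model trained naively on these logs would learn to replicate (or even exaggerate) the legacy policy rather than discover that, at fixed demand, lower prices convert better.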

From past actions and customer responses to future intelligence

Pricing as a logged decision-making process

At its core, dynamic pricing is not a prediction problem; it is a decision-learning problem. Every row of training data exists because someone or something chose a price. Unlike traditional supervised learning, where labels exist independently of the model, pricing labels are created by decisions. This makes pricing fundamentally different from tasks like image classification or spam detection.

Each training example corresponds to a logged pricing decision made by a human operator, a rules engine, or a previous model. The data is not a passive observation of reality; it is the record of an action taken under uncertainty. This is why pricing data must always be interpreted as decision-conditioned evidence, not ground truth.

Fun fact: Many real-world pricing models are trained using supervised learning, but their data structure is identical to reinforcement learning logs: (state, action, reward), even if teams don’t explicitly call it that.
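The equivalence noted in the fun fact can be sketched directly: the same logged row can be read as a supervised example or as an RL-style (state, action, reward) transition. The dictionary keys and the revenue-as-reward choice are assumptions for illustration.

```python
# One logged row, as a supervised-learning team might store it.
# Field names are hypothetical.
supervised_row = {
    "features": {"inventory": 40, "hour": 18, "segment": "new_user"},
    "price": 12.99,
    "label_purchased": 1,
}

# The same row, reframed as an RL-style transition.
state = supervised_row["features"]   # observable context when the price was set
action = supervised_row["price"]     # the price that was actually chosen
reward = supervised_row["price"] * supervised_row["label_purchased"]  # realized revenue

transition = (state, action, reward)
print(transition)
```

Nothing about the data changed; only the framing did. This is why techniques from off-policy evaluation apply to pricing logs even when the team never uses the words "reinforcement learning".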

Every pricing decision can be decomposed into three essential components:

Pricing decision main components
  • Context represents the observable state of the world at the moment the price was set. This includes factors such as inventory levels, time of day, seasonality, competitor prices, user segment, device type, and active promotions. Context defines what information was available at the time the decision was made. If relevant context is missing or logged incorrectly, the model will infer spurious relationships.

  • Action is the price that was actually chosen. Importantly, this is just one option among many possible prices. The model does not observe alternative actions that could have been taken. This single-action logging is the root cause of counterfactual uncertainty in pricing systems.

  • Outcome ...