Guardrails Outside the Prompt
Explore how to enforce critical constraints outside the system prompt by implementing programmatic guardrails in AI architectures. Understand pre-execution policy checks, post-execution result filtering, and confirmation flags for irreversible actions, ensuring reliable enforcement of business rules in agentic AI systems.
We'll cover the following...
- Why the system prompt is not enough
- Pre-execution enforcement: Checking inputs before the tool runs
- Post-execution enforcement: Filtering results before Claude reads them
- The confirmation flag pattern for irreversible actions
- When to use prompt guidance vs. programmatic enforcement
- Exercise: Choose the enforcement point
- What’s next?
Every agent we have built so far relies on Claude to follow instructions in the system prompt. That works well for guidance, tone, formatting, and tool preference, but it is the wrong mechanism for rules that must hold without exception. In this lesson, we move critical constraints out of the prompt and into the dispatch function, where enforcement is deterministic. By the end of this lesson, we will be able to:
- Explain why system prompt instructions are insufficient for critical business rules.
- Implement a pre-execution policy check that intercepts a tool call before it runs.
- Implement a post-execution check that filters or modifies a tool result before Claude reads it.
- Apply the confirmation-flag pattern for irreversible actions.
Why the system prompt is not enough
Consider this system prompt line:
“Never apply a discount greater than 20%. Discounts above 20% require manager approval.”
Claude will follow these instructions most of the time. But system prompt compliance is a probability, not a guarantee. Under unusual phrasing, adversarial inputs, or edge cases that the developer did not test, Claude may reason its way past the constraint. The model does not interpret instructions as hard rules; it weighs them against context and produces the most likely response. For a business rule like a discount cap, “most of the time” is not a standard we can ship. A single 35%-off transaction that bypasses the prompt instruction costs real money and may violate a compliance policy.
The correct fix is to move enforcement out of the prompt and into code. The dispatch_tool function we built in the previous chapter is the natural enforcement point: every tool call passes through it before the underlying function runs. Add a policy check there so it executes every time, ...
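The following is an added example, not the course's own dispatch_tool code, showing what a pre-execution policy check at the dispatch point looks like. It assumes a dispatcher with the signature dispatch_tool(name, tool_input) that returns a JSON string handed back to the model as the tool result; the apply_discount and get_order tools, the in-memory ORDERS table, and the AUDIT_LOG are constructed for this example. The discount cap is the 20% rule from the prompt quoted above, now enforced in code so it holds no matter how the request was phrased.

```python
"""Pre-execution guardrail at the dispatch point (added example, not the course's code).

Assumptions: the dispatcher signature dispatch_tool(name, tool_input) -> str is assumed
from the lesson's description of the previous chapter; the tools, the ORDERS table and
AUDIT_LOG are constructed for this example.
"""
import json
from typing import Any, Callable, Dict, Optional

MAX_DISCOUNT_PERCENT = 20  # the business rule from the lesson's example prompt

# Hypothetical order store backing the example tools (constructed data).
ORDERS: Dict[str, dict] = {"A-100": {"total": 250.0, "discount_percent": 0}}

# Every blocked call is recorded here so violations are visible to operators,
# not just silently refused.
AUDIT_LOG: list = []


def get_order(order_id: str) -> dict:
    """Hypothetical read-only tool."""
    return dict(ORDERS[order_id])


def apply_discount(order_id: str, percent: float) -> dict:
    """Hypothetical tool that changes an order. It does NOT check the cap itself;
    the cap is enforced once, in the dispatcher, for every caller."""
    ORDERS[order_id]["discount_percent"] = percent
    return dict(ORDERS[order_id])


TOOLS: Dict[str, Callable[..., Any]] = {
    "get_order": get_order,
    "apply_discount": apply_discount,
}


def check_discount(tool_input: dict) -> Optional[str]:
    """Pre-execution policy for apply_discount.
    Returns None to allow the call, or a reason string to block it."""
    percent = tool_input.get("percent")
    if not isinstance(percent, (int, float)) or isinstance(percent, bool):
        return "percent must be a number"
    if percent < 0:
        return "percent must not be negative"
    if percent > MAX_DISCOUNT_PERCENT:
        return (f"discount of {percent}% exceeds the {MAX_DISCOUNT_PERCENT}% cap; "
                "manager approval required")
    return None


# Policies keyed by tool name. Tools without an entry run unchecked.
PRE_CHECKS: Dict[str, Callable[[dict], Optional[str]]] = {
    "apply_discount": check_discount,
}


def dispatch_tool(name: str, tool_input: dict) -> str:
    """Run the named tool only after its pre-execution check passes.
    The returned JSON string is what the model sees as the tool result."""
    if name not in TOOLS:
        return json.dumps({"error": "unknown_tool", "name": name})

    check = PRE_CHECKS.get(name)
    if check is not None:
        reason = check(tool_input)
        if reason is not None:
            # Deterministic refusal: the underlying function never runs, regardless
            # of how the model justified the request. Log it for review.
            AUDIT_LOG.append({"tool": name, "input": tool_input, "blocked": reason})
            return json.dumps({"error": "policy_violation", "reason": reason})

    result = TOOLS[name](**tool_input)
    return json.dumps({"result": result})


if __name__ == "__main__":
    print(dispatch_tool("apply_discount", {"order_id": "A-100", "percent": 35}))
    print(dispatch_tool("apply_discount", {"order_id": "A-100", "percent": 15}))
    print(AUDIT_LOG)
```

Because the check lives in the dispatcher rather than in apply_discount, it applies to every path that reaches the tool, and the structured policy_violation result lets the model explain the refusal to the user instead of retrying with different wording. The audit entry is the detection side of the same control: a blocked call is evidence that the prompt guidance was not enough in that conversation.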