
Guardrails Outside the Prompt

Explore how to enforce critical constraints outside the system prompt by implementing programmatic guardrails in AI architectures. Understand pre-execution policy checks, post-execution result filtering, and confirmation flags for irreversible actions, ensuring reliable enforcement of business rules in agentic AI systems.

Every agent we have built so far relies on Claude to follow instructions in the system prompt. That works well for guidance, tone, formatting, and tool preference, but it is the wrong mechanism for rules that must hold without exception. In this lesson, we move critical constraints out of the prompt and into the dispatch function, where enforcement is deterministic. By the end of this lesson, we will be able to:

  • Explain why system prompt instructions are insufficient for critical business rules.

  • Implement a pre-execution policy check that intercepts a tool call before it runs.

  • Implement a post-execution check that filters or modifies a tool result before Claude reads it.

  • Apply the confirmation-flag pattern for irreversible actions.

Why the system prompt is not enough

Consider this system prompt line:

“Never apply a discount greater than 20%. Discounts above 20% require manager approval.”

Claude will follow these instructions most of the time. But system prompt compliance is a probability, not a guarantee. Under unusual phrasing, adversarial inputs, or edge cases the developer never tested, Claude may reason its way past the constraint. The model does not treat instructions as hard rules; it weighs them against the rest of the context and produces the most likely response. For a business rule like a discount cap, "most of the time" is not a standard we can ship. A single 35%-off transaction that slips past the prompt costs real money and may violate a compliance policy.

The correct fix is to move enforcement out of the prompt and into code. The dispatch_tool function we built in the previous chapter is the natural enforcement point: every tool call passes through it before the underlying function runs. Add a policy check there so it executes every time, ...
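The following is an added example, not the course's own dispatch_tool code, showing what that enforcement point looks like once the three checks from the lesson objectives live in code rather than in the prompt: a pre-execution policy check for the discount cap, a post-execution filter that strips sensitive fields before Claude reads a result, and a confirmation flag for an irreversible action. The tool names (apply_discount, lookup_customer, delete_customer), the in-memory order and customer data, and the manager-approval store are all assumptions made for illustration. One design choice worth noticing: manager approval is read from a store that only a human-facing channel writes to, not from a flag in the tool input, because any flag the model can set is not an approval.

```python
"""Guardrails in the dispatch layer (added example; tool names, data and shapes are assumed)."""
from typing import Any, Callable

DISCOUNT_CAP_PERCENT = 20.0  # the business rule from the lesson, enforced in code

# Constructed in-memory state standing in for a real backend (illustrative only).
ORDERS: dict[str, dict] = {"A100": {"total": 120.0, "discount_percent": 0.0}}
CUSTOMERS: dict[str, dict] = {
    "c1": {"name": "Ada", "email": "ada@example.com", "card_last4": "4242"},
}
# Approvals are written by a manager through a separate (hypothetical) admin channel,
# never by the model. A flag Claude could set in tool_input is not an approval.
MANAGER_APPROVALS: set[tuple[str, float]] = set()


def record_manager_approval(order_id: str, percent: float) -> None:
    """Called by the human-facing approval flow, not exposed to Claude as a tool."""
    MANAGER_APPROVALS.add((order_id, float(percent)))


class PolicyViolation(Exception):
    """Raised by a pre-execution check; the dispatcher turns it into a tool error."""


# --- underlying tool implementations (hypothetical names and result shapes) ---
def apply_discount(order_id: str, percent: float) -> dict:
    if order_id not in ORDERS:
        return {"error": f"unknown order {order_id!r}"}
    ORDERS[order_id]["discount_percent"] = float(percent)
    return {"order_id": order_id, "discount_percent": float(percent)}


def lookup_customer(customer_id: str) -> dict:
    return dict(CUSTOMERS.get(customer_id, {"error": f"unknown customer {customer_id!r}"}))


def delete_customer(customer_id: str, confirmed: bool = False) -> dict:
    CUSTOMERS.pop(customer_id, None)
    return {"deleted": customer_id}


TOOLS: dict[str, Callable[..., dict]] = {
    "apply_discount": apply_discount,
    "lookup_customer": lookup_customer,
    "delete_customer": delete_customer,
}
# Irreversible tools are refused unless the call carries confirmed=True.
IRREVERSIBLE: set[str] = {"delete_customer"}


# --- 1. pre-execution policy checks, keyed by tool name ---
def check_discount_cap(tool_input: dict) -> None:
    percent = float(tool_input.get("percent", 0))
    approved = (tool_input.get("order_id"), percent) in MANAGER_APPROVALS
    if percent > DISCOUNT_CAP_PERCENT and not approved:
        raise PolicyViolation(
            f"discount of {percent:g}% exceeds the {DISCOUNT_CAP_PERCENT:g}% cap "
            "and has no recorded manager approval"
        )


PRE_CHECKS: dict[str, Callable[[dict], None]] = {"apply_discount": check_discount_cap}


# --- 2. post-execution filters: rewrite the result before Claude ever reads it ---
def redact_customer(result: dict) -> dict:
    return {k: v for k, v in result.items() if k not in {"email", "card_last4"}}


POST_FILTERS: dict[str, Callable[[dict], dict]] = {"lookup_customer": redact_customer}


def dispatch_tool(tool_name: str, tool_input: dict[str, Any]) -> dict:
    """Every tool call passes through here, so each check runs on every call."""
    if tool_name not in TOOLS:
        return {"error": f"unknown tool {tool_name!r}"}

    # 3. confirmation flag: an irreversible action needs an explicit confirmed=True.
    if tool_name in IRREVERSIBLE and tool_input.get("confirmed") is not True:
        return {"error": f"{tool_name} is irreversible; ask the user to confirm, "
                         "then call again with confirmed=true"}

    # 1. the policy check runs before the underlying function, regardless of the prompt.
    try:
        if tool_name in PRE_CHECKS:
            PRE_CHECKS[tool_name](tool_input)
    except PolicyViolation as exc:
        return {"error": str(exc), "policy": "discount_cap"}

    result = TOOLS[tool_name](**tool_input)

    # 2. the result filter runs after the function, before the result reaches the model.
    if tool_name in POST_FILTERS:
        result = POST_FILTERS[tool_name](result)
    return result
```

The dispatcher returns a dict with an "error" key instead of raising, so the agent loop can hand it back to Claude as a tool_result with is_error set and let the model ask for approval or confirmation rather than crashing. Note the limit of the confirmation flag: it forces a round trip, but the model is still the one that sets confirmed=true. For actions that must not run on a hallucinated confirmation, have the application record the user's answer itself and check that record the way the discount check consults MANAGER_APPROVALS.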