Evaluator Optimizer Pattern

Explore how the evaluator-optimizer pattern uses critique and iteration to produce high-quality, nuanced outputs.

Until now, our agentic systems have been built like assembly lines or project plans. For example, the “parallelization” pattern was our efficiency expert, running independent tasks simultaneously to reach a final result faster. The “orchestrator,” on the other hand, was the savvy general contractor, creating a dynamic plan and delegating sub-tasks to specialized workers to build a complex final product. In both cases, the flow of information was primarily forward-moving, designed to go from a starting point to a finished deliverable.

The “evaluator-optimizer” pattern introduces a fundamentally different dynamic: the feedback loop. It’s less like a project plan and more like a craftsman working with a discerning critic. Instead of breaking a problem down into parts, this pattern focuses on iteratively improving a single piece of work. One LLM (the craftsman) generates a draft, and a second LLM (the critic) evaluates that draft against a set of standards, providing specific, actionable feedback. The draft then goes back to the craftsman for revision, starting a cycle of continuous improvement.
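The generate–critique–revise cycle can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed implementation: `call_llm` stands in for whatever chat-completion function your provider offers, and the APPROVED convention and round limit are assumptions chosen for the sketch.

```python
from typing import Callable


def evaluator_optimizer(
    task: str,
    criteria: str,
    call_llm: Callable[[str], str],  # any prompt -> completion function
    max_rounds: int = 3,
) -> str:
    """Generate a draft, then loop: critique -> revise, until the critic approves."""
    # The craftsman produces the first draft.
    draft = call_llm(f"Complete this task:\n{task}")

    for _ in range(max_rounds):
        # The critic evaluates the draft against explicit criteria.
        feedback = call_llm(
            "You are a discerning critic. Evaluate the draft below against "
            f"these criteria:\n{criteria}\n\nDraft:\n{draft}\n\n"
            "Reply with the single word APPROVED if every criterion is met; "
            "otherwise give specific, actionable feedback."
        )
        if feedback.strip().upper().startswith("APPROVED"):
            break  # the critic is satisfied; stop iterating

        # The craftsman revises the draft in light of the feedback.
        draft = call_llm(
            f"Revise the draft to address this feedback.\n\nTask:\n{task}\n\n"
            f"Draft:\n{draft}\n\nFeedback:\n{feedback}"
        )
    return draft
```

Passing `call_llm` in as a parameter keeps the loop provider-agnostic and makes it easy to unit-test with a fake LLM. The `max_rounds` cap matters in practice: without it, an overly strict critic can keep the loop running indefinitely.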

This shift from a “feed-forward” process to a “feedback loop” is a distinct mechanism in agentic design. Our previous patterns were designed for execution and delegation; this one is designed for reflection and refinement. We are now building a system that is explicitly designed to critique its own output and get progressively better. It’s a move from simply doing the work to actively improving the work, which is essential for tasks where quality, nuance, and precision are paramount.

When to use this pattern?

The evaluator-optimizer pattern isn’t for every task; it’s a specialist tool you reach for when quality, nuance, and precision are paramount. Think of it as the difference between writing a quick email and drafting a legal contract. For the email, a single pass is fine. For the contract, every word matters, and multiple rounds of review are essential. This pattern is for the “legal contract” kinds of problems, where a “good enough” first draft needs to be methodically refined into an “excellent” final version. The core requirement is that you can define what “excellent” means through a set of clear evaluation criteria.

The perfect time to use this pattern is when you can answer “yes” to two key questions:

  1. “Could a smart human subject-matter expert significantly improve the LLM’s first draft by giving feedback?” If the task involves subtle context, stylistic nuance, or a high degree of completeness, the answer is likely yes. A human editor could easily spot a clumsy phrase in a translation or identify a missing perspective in a research summary. This is your first signal that a feedback loop would be valuable.

  2. “Can you write a prompt that enables another LLM to act as that smart human expert?” It’s not enough to know that feedback would be helpful; the evaluator LLM must be capable of generating that feedback. This is where your “clear evaluation criteria” come into play.

For a literary translation, the evaluator’s prompt might instruct it to score the draft on “preserving the original ...