Reward Generation and Evaluation Loop
Explore the implementation of a reward generation and evaluation loop in AI agents similar to Eureka. Learn to generate multiple reward candidates with LLMs, parse and safely compile code, train policies using PPO, evaluate candidates, and record detailed metrics and artifacts. Understand how this method transforms reward hypotheses into measurable performance to guide iterative improvements.
We'll cover the following...
- Implementing reward generation
- Implementing candidate evaluation
- Imports: Tools that do the heavy lifting
- CandidateEvaluatorAgent entry point: Reading iteration context
- Parsing candidates from LLM output (robustly)
- Fail safely when parsing produces nothing
- Per-candidate evaluation loop: Save code first, then try to execute it
- Compile + validate reward code in a sandbox
- Train and evaluate (delegated to tools/rl_runner.py)
- Save the rollout HTML (non-critical if it fails)
- Save metrics, training metadata, and policy params
- Record results and update the leaderboard
- Handle per-candidate failures without stopping the iteration
- Convert results into JSON and store them in shared state
- Summary
Implementing reward generation
Start with the first step in each loop iteration: reward generation. In this system, the RewardDesigner has a single job:
- It builds a prompt that includes the task spec and environment code (plus feedback from the previous iteration).
- It calls an OpenAI model to generate K reward candidates.
- It stores the raw generated text in the shared state so the evaluator can parse it next.
Everything we do below supports that flow.
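The flow above can be sketched in a few lines. This is a simplification, not the real agent: it assumes a plain dict for shared state and an injectable `llm_fn` callable instead of the ADK `BaseAgent` interface, so the loop can be exercised offline.

```python
# Sketch of the designer flow: build prompt -> call LLM -> store raw text.
# `llm_fn` and the dict-based shared state are assumptions for illustration.
CANDIDATE_DELIM = "### CANDIDATE ###"

def run_reward_designer(task_spec, env_code, k, llm_fn, shared_state,
                        feedback=None):
    """Build a prompt, ask the LLM for k candidates, stash the raw text."""
    parts = [
        f"Task spec:\n{task_spec}",
        f"Environment code:\n{env_code}",
    ]
    if feedback:  # iteration 2+: include previous results/reflection
        parts.append(f"Feedback from previous iteration:\n{feedback}")
    parts.append(
        f"Generate {k} reward candidates, each preceded by the line "
        f"{CANDIDATE_DELIM}"
    )
    prompt = "\n\n".join(parts)
    raw = llm_fn(prompt)                  # e.g., an OpenAI chat call
    shared_state["raw_candidates"] = raw  # the evaluator parses this next
    return raw

# Usage with a stubbed LLM, so no API key is needed:
fake_llm = lambda prompt: f"{CANDIDATE_DELIM}\ndef reward(s): return 0.0"
state = {}
run_reward_designer("hop forward", "class Env: ...", 1, fake_llm, state)
```

Injecting the LLM call also makes the designer trivially unit-testable, which matters once the loop runs many iterations unattended.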
Setting up the OpenAI client and prompt constants
Let’s start at the top of agents/llm_agents.py. Before defining the agent class, we set up the OpenAI client and a few constants that control output formatting.
```python
import os
import json
from openai import OpenAI
from pydantic import PrivateAttr
from loguru import logger
from google.adk.agents import BaseAgent
from google.adk.agents.invocation_context import InvocationContext
from google.adk.events import Event
from typing import AsyncGenerator

client = OpenAI(api_key=os.environ.get("OPENAI_API_KEY"))

CANDIDATE_DELIM = "### CANDIDATE ###"
DESIGNER_SYSTEM = "You are a precise reward-function code generator for JAX/Brax environments."
```
Here’s what we’re doing (as implementers):
- We initialize `client` once at import time so the agent can reuse it for every iteration.
- `CANDIDATE_DELIM` is a parsing contract. The evaluator relies on this exact delimiter to split candidates.
- `DESIGNER_SYSTEM` narrows model behavior. We don't want explanations, markdown, or "tips." We want reward code only.
This is the first place where you see an important design pattern:
We enforce reliability by making the LLM output machine-parseable, not human-friendly.
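To make the contract concrete, here is a minimal sketch of how an evaluator can split raw LLM output on `CANDIDATE_DELIM` (the real evaluator in `agents/llm_agents.py` adds more validation; this shows only the delimiter mechanics):

```python
# Delimiter-based parsing: split on the exact contract string and
# drop empty chunks so leading/trailing delimiters are harmless.
CANDIDATE_DELIM = "### CANDIDATE ###"

def parse_candidates(raw_text: str) -> list[str]:
    """Split raw LLM output into candidate code strings, dropping empties."""
    chunks = raw_text.split(CANDIDATE_DELIM)
    return [c.strip() for c in chunks if c.strip()]

raw = f"""{CANDIDATE_DELIM}
def reward(state): return state.x_vel
{CANDIDATE_DELIM}
def reward(state): return state.x_vel - 0.1 * state.energy
"""
candidates = parse_candidates(raw)
```

Because the split string is exact, any drift in the delimiter (extra spaces, markdown fences) breaks parsing, which is why the system prompt forbids decorative output.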
Building the reward generation prompt
Next, we define _designer_prompt(...). This function is where we “program” the reward designer’s behavior.
```python
def _designer_prompt(task_spec, env_code, best_reward_code, reflection, K,
                     candidate_results: str | None = None):
    """Build the prompt for the Reward Designer.

    From iteration 2 onward, we include an explicit "Query with Feedback"
    section (policy training/eval results + reflection) similar to the
    Eureka paper diagram.
    """
```
This function supports two modes:
- Iteration 1 (no best reward yet) → generate initial candidates.
- Iteration 2+ (we have feedback) → improve the best reward so far.
Let’s look at how that branching is implemented.
Prompt mode: Improving an existing best reward
If a best reward exists, we instruct the model to improve it (not start over).
```python
if best_reward_code:
    improvement_instruction = f"""
IMPORTANT: You MUST generate {K} IMPROVED versions of the BEST REWARD SO FAR below.
- Each candidate should be a PROGRESSIVE IMPROVEMENT or VARIATION of the best reward
- Build upon the successful aspects identified in the feedback/reflection
- Try different approaches to address issues mentioned in the feedback/reflection
- Do NOT generate completely new rewards from scratch - they must be based on the best reward below
"""
```
This is a key Eureka-like design choice in this implementation:

- We treat the best reward as the current parent.
- We ask for variations, not resets.
- We keep improvement grounded in evidence (reflection + results).

...
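Putting the branch together, a simplified two-mode prompt builder might look like the sketch below. The function name and exact wording are illustrative, not the real `_designer_prompt`, which also includes the reflection and candidate-results sections:

```python
def designer_prompt_sketch(task_spec, env_code, best_reward_code, K):
    """Simplified two-mode prompt: improve mode vs. initial-generation mode."""
    if best_reward_code:
        # Iteration 2+: anchor generation on the current best reward
        instruction = (
            f"Generate {K} IMPROVED variations of the BEST REWARD SO FAR. "
            "Do NOT start from scratch."
        )
        context = f"BEST REWARD SO FAR:\n{best_reward_code}"
    else:
        # Iteration 1: no parent reward yet, ask for diverse candidates
        instruction = f"Generate {K} diverse initial reward candidates."
        context = ""
    return "\n\n".join(p for p in [
        f"Task: {task_spec}",
        f"Environment code:\n{env_code}",
        context,
        instruction,
    ] if p)

p1 = designer_prompt_sketch("hop forward", "class Env: ...", None, 4)
p2 = designer_prompt_sketch("hop forward", "class Env: ...",
                            "def reward(s): ...", 4)
```

The important property is that the improve-mode prompt always embeds the parent reward verbatim, so every candidate the model returns is conditioned on the current best.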