
Agent Design, Code Structure, and Output Demonstration

Understand how to design and orchestrate a reward learning agent inspired by NVIDIA's EUREKA system. Learn the iterative process of generating, evaluating, selecting, and reflecting on reward functions using Google ADK and Brax environments, and observe the system's outputs through rollout visualizations.

In the previous chapter, we studied NVIDIA’s EUREKA system, an agentic framework that uses large language models to automatically design and iteratively refine reward functions for reinforcement learning. While the original system operates in large-scale robotic environments and physics simulations tightly coupled to NVIDIA’s hardware, reproducing those conditions exactly in an instructional setting is impractical.

For a hands-on demonstration, we will instead implement a Eureka-like reward learning agent using:

  • Lightweight Brax environments, specifically HalfCheetah

  • Google’s Agent Development Kit (ADK) for agent orchestration

  • Free T4 GPU resources on Google Colab

HalfCheetah Brax environment

The goal is not to reproduce NVIDIA’s system at full scale, but to reimplement the core design principles behind EUREKA in a form that is computationally tractable, easier to reason about, and suitable for controlled experimentation.

In this lesson, we will focus on agent design and workflow, code structure, and observing the system’s outputs. Specifically, we will examine:

  • The agents involved in our Eureka-like system

  • The role and responsibility of each agent

  • The inputs and outputs flowing through the system

  • The orchestration pattern used to manage the iterative reward evolution loop

  • The overall project structure

Finally, we will run the complete system once and inspect its outputs (trained policies, generated reward functions, and rollout visualizations) to see the design in action before diving into the implementation details in the next lessons.

Before we begin: A few key terms

Before we implement the system, we’ll define a few reinforcement learning terms that we’ll use throughout the chapter. You don’t need a deep RL background; just enough intuition to follow the mechanics.

Policy

A policy is the decision-making component of a reinforcement learning agent. Given the current state of the environment, the policy decides what action to take next. In our case, the policy controls the HalfCheetah robot, deciding how each joint should move at every step. When we say “training a policy,” we mean optimizing this decision-making function so that the agent behaves better according to a reward signal.
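In code, a policy is simply a function from states to actions. The sketch below uses a deliberately trivial linear policy rather than a trained neural network, and assumes the classic MuJoCo HalfCheetah dimensions (a 17-dimensional observation and 6 actuated joints; Brax’s version of the environment may use slightly different sizes):

```python
import numpy as np

def make_policy(obs_dim: int, act_dim: int, seed: int = 0):
    """Build a toy linear policy: state -> joint torques.

    A trained policy would be a neural network optimized against the
    reward signal, but any state -> action mapping fits the definition.
    """
    rng = np.random.default_rng(seed)
    weights = rng.normal(scale=0.1, size=(act_dim, obs_dim))

    def policy(state: np.ndarray) -> np.ndarray:
        # Squash raw torques into the valid action range [-1, 1].
        return np.tanh(weights @ state)

    return policy

# Hypothetical HalfCheetah-like dimensions: 17-dim observation, 6 joints.
policy = make_policy(obs_dim=17, act_dim=6)
action = policy(np.zeros(17))
print(action.shape)  # (6,)
```

“Training a policy” then means adjusting `weights` (or a network’s parameters) so that the actions it emits accumulate more reward.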

Reward function

A reward function assigns a numerical score to the agent’s behavior at each step. For example:

  • Moving forward might give a positive reward.

  • Falling over might give a negative reward.

  • Wasting energy might incur a penalty.

Designing this function is difficult, and that is exactly the problem EUREKA is trying to solve.

Rollout

A rollout is a recorded ...