
Loop Control, Exit Conditions, and System Behavior

Explore how the components of a Eureka-like reward learning AI agent system fit together, focusing on loop control, iteration flow, and exit conditions. Learn to navigate system execution, run the workflow on Google Colab, interpret outputs, and evaluate progress through logs and visualizations. Gain practical insights into managing and debugging continuous reward evolution loops.

In previous lessons, we implemented each component of the Eureka-like system in isolation:

  • Setup and environment initialization

  • Reward generation and evaluation

  • Selection and reflection

  • Feedback and exit control

  • Iteration state management

In this lesson, we step back and look at how the components fit together. The goal is not to introduce new agents or tools. Instead, we’ll trace where execution starts, how control flows across agents, how iterations repeat, and how outputs accumulate over time.

Where the full system is assembled

In our implementation, there are two assembly points:

  • agent.py → the workflow entrypoint ADK runs

  • reward_loop.py → the loop factory that builds the iterative reward evolution loop

Before diving into the code, keep in mind what these files are for: they describe orchestration, not logic. They answer questions like:

  • Which agent runs first?

  • Which agents repeat?

  • What happens when the exit condition is met?

They do not answer how rewards are generated, how PPO works, or how scoring is computed. All of that logic lives elsewhere.
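The three questions above can be pictured with a toy model in plain Python. This is only a sketch of the control flow, not the ADK implementation: `run_pipeline`, `make_step`, the step labels, and the `should_exit` callback are all hypothetical names introduced for illustration, and the real agents do far more than append to a trace.

```python
from typing import Any, Callable, Dict, List

State = Dict[str, Any]
Step = Callable[[State], None]


def run_pipeline(
    setup: Step,
    loop_steps: List[Step],
    should_exit: Callable[[State], bool],
    max_iterations: int,
) -> State:
    """Run setup once, then repeat loop_steps until exit or the iteration cap."""
    state: State = {"trace": [], "iteration": 0}
    setup(state)  # runs exactly once, before the loop starts
    for i in range(1, max_iterations + 1):
        state["iteration"] = i
        for step in loop_steps:  # these are the steps that repeat
            step(state)
        if should_exit(state):  # exit condition met: stop iterating
            break
    return state


def make_step(label: str) -> Step:
    """Build a hypothetical step that only records when it ran."""
    def step(state: State) -> None:
        state["trace"].append(f"{label}@{state['iteration']}")
    return step


final = run_pipeline(
    setup=make_step("setup"),
    loop_steps=[make_step("generate"), make_step("evaluate")],
    should_exit=lambda s: s["iteration"] >= 2,  # illustrative exit rule
    max_iterations=5,
)
print(final["trace"])
# → ['setup@0', 'generate@1', 'evaluate@1', 'generate@2', 'evaluate@2']
```

The trace shows setup running once, the loop steps repeating in a fixed order, and the loop stopping as soon as the exit rule holds, well before the iteration cap of 5.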

Entrypoint: agent.py

When we run the project with ADK, execution begins from the file below:

"""ADK entrypoint for `adk run app/eureka_loop`"""
from google.adk.agents import SequentialAgent

from .agents.setup_agent import SetupAgent
from .agents.reward_loop import build_reward_loop

root_agent = SequentialAgent(
    name="EurekaRewardEvolutionPipeline",
    sub_agents=[
        SetupAgent(name="SetupAgent"),
        build_reward_loop(),
    ],
    description="Initializes env/task state and runs an Eureka-like reward evolution loop."
)

Let’s read this in two steps:

Step A: The orchestration primitive is SequentialAgent

from google.adk.agents import SequentialAgent

We import SequentialAgent because the top-level workflow is strictly ordered. There’s no branching here. The flow is linear: run setup, then enter the loop.

Step B: root_agent is the runnable pipeline ADK executes

root_agent = SequentialAgent(
    name="EurekaRewardEvolutionPipeline",
    sub_agents=[
        SetupAgent(name="SetupAgent"),
        build_reward_loop(),
    ],
    ...
)

This sub_agents=[ ... ] list is the entire top-level execution plan:

  1. Run SetupAgent once.

  2. Then, run whatever build_reward_loop() returns (the iterative loop). ... ...
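One way to confirm this execution plan while reading the code is to print the agent tree as an indented outline. The sketch below uses a stand-in `FakeAgent` class so it runs without ADK installed. It assumes only that each agent exposes a `name` and a `sub_agents` list. The names `RewardLoop`, `LoopStepA`, and `LoopStepB` are hypothetical placeholders, since the loop's contents are not shown here.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FakeAgent:
    """Stand-in for an agent: only the two attributes the walker reads."""
    name: str
    sub_agents: List["FakeAgent"] = field(default_factory=list)


def outline(agent, depth: int = 0) -> List[str]:
    """Return one indented line per agent, in declared order."""
    lines = ["  " * depth + agent.name]
    for child in getattr(agent, "sub_agents", None) or []:
        lines.extend(outline(child, depth + 1))
    return lines


# Mirrors the shape of root_agent; the loop's name and children are placeholders.
pipeline = FakeAgent(
    "EurekaRewardEvolutionPipeline",
    [
        FakeAgent("SetupAgent"),
        FakeAgent("RewardLoop", [FakeAgent("LoopStepA"), FakeAgent("LoopStepB")]),
    ],
)
print("\n".join(outline(pipeline)))
```

If the real agents expose the same two attributes, passing `root_agent` to `outline` instead of `pipeline` should give the same kind of top-to-bottom view of what runs once and what sits inside the loop.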