
Selection, Reflection, and Human Feedback

Explore the key processes of reward selection, reflection, and human feedback integration in an AI reward-learning system. Understand how to implement a deterministic selector agent, analyze reward performance with reflection agents, utilize human feedback to improve results, and manage iteration loops to optimize learning and decision-making effectively.

In the previous lesson, the system finished evaluating reward candidates. For the current iteration, we have multiple trained policies, quantitative metrics, rollout visualizations, and a structured summary in ctx.session.state["candidate_results"].

At this stage, the system must answer a simple but critical question:

Which reward should we carry forward?

That responsibility belongs to SelectorAgent.

Selecting the best reward candidate

Reward selection is implemented in agents/selector_agent.py. This agent neither trains policies nor generates rewards. Its sole purpose is to:

  • Read evaluation results

  • Apply a selection rule

  • Update the shared state with the chosen “best” reward

Open the file and review the agent’s dependencies, starting with the imports:

import json
from loguru import logger
from typing import AsyncGenerator
from google.adk.agents import BaseAgent
from google.adk.agents.invocation_context import InvocationContext
from google.adk.events import Event

From these imports alone, we can infer the agent’s role:

  • It reads structured data (json).

  • It does not depend on training or environment tools.

  • It interacts only with shared state and ADK.

This is intentional. Selection should be lightweight and deterministic.

Next, we define the SelectorAgent:

class SelectorAgent(BaseAgent):
    async def _run_async_impl(
        self, ctx: InvocationContext
    ) -> AsyncGenerator[Event, None]:

As with previous agents, this is an ADK agent, executed inside the loop, driven entirely by shared state. By the time this agent runs, the evaluation for the iteration has already been completed.

The first thing the SelectorAgent does is retrieve evaluation results.

candidate_results_json = ctx.session.state["candidate_results"]
candidate_results = json.loads(candidate_results_json)

This is why we stored results as JSON in the previous lesson: it’s serializable, stable across agents, and easy to inspect or log.

At this point, candidate_results is a list of dictionaries, one per candidate, containing scores, metrics, artifact paths, and reward code.
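To make that structure concrete, here is a minimal sketch of what the decoded list might look like. The field names `candidate`, `score`, and `reward_code` follow the keys SelectorAgent reads later in this lesson; the contents of `metrics` and `artifacts`, and all values, are purely illustrative:

```python
import json

# Hypothetical example of what the evaluator may have serialized.
# The "metrics" and "artifacts" sub-fields and all values are made up.
candidate_results_json = json.dumps([
    {
        "candidate": 0,
        "score": 0.71,
        "metrics": {"mean_episode_return": 112.4},
        "artifacts": {"rollout_video": "outputs/iter_1/cand_0.mp4"},
        "reward_code": "def reward(obs, action):\n    return -abs(obs[0])",
    },
    {
        "candidate": 1,
        "score": 0.83,
        "metrics": {"mean_episode_return": 141.9},
        "artifacts": {"rollout_video": "outputs/iter_1/cand_1.mp4"},
        "reward_code": "def reward(obs, action):\n    return -obs[0] ** 2",
    },
])

# Decoding recovers the list of per-candidate dictionaries.
candidate_results = json.loads(candidate_results_json)
print(len(candidate_results))         # 2
print(candidate_results[1]["score"])  # 0.83
```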

Next, we select the candidate with the highest score.

best = max(candidate_results, key=lambda x: x["score"])

This line reflects a deliberate design choice: selection is purely evidence-based, with no heuristics, learned weighting, or LLM involvement. The selector trusts the evaluator’s scoring function and makes a clear, reproducible decision.
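One property of this rule worth noting: Python’s `max` returns the first maximal element it encounters, so if two candidates ever tie on score, the one appearing earlier in `candidate_results` wins. A quick illustration with made-up scores:

```python
candidates = [
    {"candidate": 0, "score": 0.80},
    {"candidate": 1, "score": 0.95},
    {"candidate": 2, "score": 0.95},  # tied with candidate 1
]

# max() with a key function scans left to right and keeps the
# first element whose key is maximal, so ties resolve to the
# earlier candidate in the list.
best = max(candidates, key=lambda x: x["score"])
print(best["candidate"])  # 1
```

Deterministic tie-breaking like this keeps selection reproducible across runs, which is exactly the property the selector is designed around.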

Once a winner is chosen, we extract what downstream agents need.

best_reward_code = best["reward_code"]
best_score = best["score"]
best_candidate_id = best["candidate"]

This is the minimal information required to inform reflection, seed the next iteration, and log progress.

Now we update the shared state so later agents can build on this decision.

ctx.session.state["best_reward_code"] = best_reward_code
ctx.session.state["best_score"] = best_score
ctx.session.state["best_candidate_id"] = best_candidate_id

From this point on, the system has a current best reward, reflection agents can analyze it, and the next iteration can improve upon it.

Finally, we log the outcome for inspection.

logger.info(
    f"[SelectorAgent] Selected candidate {best_candidate_id} "
    f"with score={best_score:.4f}"
)

This log entry becomes part of the execution trace we inspected in the first lesson of this chapter.

As with all ADK agents, we signal completion by yielding an event.

yield Event(author=self.name, content=None)
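Because the selector only reads and writes plain dictionary state, its core logic can be exercised outside ADK with an ordinary dict standing in for `ctx.session.state`. The sketch below is a hypothetical test harness, not part of the agent itself; the candidate data is made up:

```python
import json

def select_best(state: dict) -> dict:
    """Replays SelectorAgent's selection logic against a plain dict."""
    candidate_results = json.loads(state["candidate_results"])
    best = max(candidate_results, key=lambda x: x["score"])
    state["best_reward_code"] = best["reward_code"]
    state["best_score"] = best["score"]
    state["best_candidate_id"] = best["candidate"]
    return state

# Fake session state with two illustrative candidates.
state = {
    "candidate_results": json.dumps([
        {"candidate": 0, "score": 0.62, "reward_code": "# reward A"},
        {"candidate": 1, "score": 0.88, "reward_code": "# reward B"},
    ])
}
select_best(state)
print(state["best_candidate_id"], state["best_score"])  # 1 0.88
```

Isolating the logic this way makes the selection rule trivially unit-testable, which is one payoff of keeping the agent deterministic and state-driven.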

At this point, ...