Selection, Reflection, and Human Feedback
Explore the key processes of reward selection, reflection, and human feedback integration in an AI reward-learning system. Learn how to implement a deterministic selector agent, analyze reward performance with reflection agents, use human feedback to improve results, and manage iteration loops to optimize learning and decision-making.
In the previous lesson, the system finished evaluating reward candidates. For the current iteration, we have multiple trained policies, quantitative metrics, rollout visualizations, and a structured summary in ctx.session.state["candidate_results"].
At this stage, the system must answer a simple but critical question:
Which reward should we carry forward?
That responsibility belongs to SelectorAgent.
Selecting the best reward candidate
Reward selection is implemented in agents/selector_agent.py. This agent neither trains policies nor generates rewards. Its sole purpose is to:
Read the evaluation results.
Apply a selection rule.
Update the shared state with the chosen “best” reward.
Open the file and review the agent’s dependencies, starting with the imports:
```python
import json
from loguru import logger
from typing import AsyncGenerator
from google.adk.agents import BaseAgent
from google.adk.agents.invocation_context import InvocationContext
from google.adk.events import Event
```
From these imports alone, we can infer the agent’s role:
It reads structured data (json).
It does not depend on training or environment tools.
It interacts only with shared state and ADK.
This is intentional. Selection should be lightweight and deterministic.
Next, we define the SelectorAgent:
```python
class SelectorAgent(BaseAgent):
    async def _run_async_impl(
        self, ctx: InvocationContext
    ) -> AsyncGenerator[Event, None]:
```
As with previous agents, this is an ADK agent, executed inside the loop, driven entirely by shared state. By the time this agent runs, the evaluation for the iteration has already been completed.
The first thing the SelectorAgent does is retrieve evaluation results.
```python
candidate_results_json = ctx.session.state["candidate_results"]
candidate_results = json.loads(candidate_results_json)
```
This is why we stored results as JSON in the previous lesson: it’s serializable, stable across agents, and easy to inspect or log.
At this point, candidate_results is a list of dictionaries, one per candidate, containing scores, metrics, artifact paths, and reward code.
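For concreteness, here is a minimal sketch of what such a list might look like once parsed. The fields `candidate`, `score`, and `reward_code` are the ones the selector relies on; the remaining fields and values are illustrative placeholders, not the evaluator’s actual output.

```python
import json

# Hypothetical evaluator output: one dict per reward candidate.
# Only "candidate", "score", and "reward_code" are required by the selector;
# "metrics" and "artifacts" are illustrative extras.
candidate_results_json = json.dumps([
    {
        "candidate": 0,
        "score": 0.72,
        "reward_code": "def reward(obs, action): ...",
        "metrics": {"mean_return": 112.4},
        "artifacts": {"rollout_video": "outputs/candidate_0.mp4"},
    },
    {
        "candidate": 1,
        "score": 0.85,
        "reward_code": "def reward(obs, action): ...",
        "metrics": {"mean_return": 131.9},
        "artifacts": {"rollout_video": "outputs/candidate_1.mp4"},
    },
])

candidate_results = json.loads(candidate_results_json)
print(len(candidate_results))
```

Because the payload is plain JSON, any agent in the loop can parse it without sharing Python objects or class definitions.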
Next, we select the candidate with the highest score.
```python
best = max(candidate_results, key=lambda x: x["score"])
```
This line reflects a deliberate design choice: selection is purely evidence-based, with no heuristics, learned weighting, or LLM involvement. The selector trusts the evaluator’s scoring function and makes a clear, reproducible decision.
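One useful property of this rule: Python’s `max` returns the first maximal element, so even when two candidates tie on score, the choice is deterministic. A small standalone sketch with made-up scores:

```python
# Made-up candidate list; scores for candidates 1 and 2 deliberately tie.
candidate_results = [
    {"candidate": 0, "score": 0.72, "reward_code": "# reward v0"},
    {"candidate": 1, "score": 0.85, "reward_code": "# reward v1"},
    {"candidate": 2, "score": 0.85, "reward_code": "# reward v2"},
]

best = max(candidate_results, key=lambda x: x["score"])

# max() keeps the first maximal element, so ties resolve to the
# earliest candidate in list order: the selection is reproducible.
print(best["candidate"])  # -> 1
```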
Once a winner is chosen, we extract what downstream agents need.
```python
best_reward_code = best["reward_code"]
best_score = best["score"]
best_candidate_id = best["candidate"]
```
This is the minimal information required to inform reflection, seed the next iteration, and log progress.
Now we update the shared state so later agents can build on this decision.
```python
ctx.session.state["best_reward_code"] = best_reward_code
ctx.session.state["best_score"] = best_score
ctx.session.state["best_candidate_id"] = best_candidate_id
```
From this point on, the system has a current best reward, reflection agents can analyze it, and the next iteration can improve upon it.
Finally, we log the outcome for inspection.
```python
logger.info(
    f"[SelectorAgent] Selected candidate {best_candidate_id} "
    f"with score={best_score:.4f}"
)
```
This log entry becomes part of the execution trace we inspected in the first lesson of this chapter.
As with all ADK agents, we signal completion by yielding an event.
```python
yield Event(author=self.name, content=None)
```
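The selection logic itself is small enough to test in isolation. The sketch below reimplements it framework-free, with a plain dict standing in for `ctx.session.state` and made-up candidate data; it mirrors the agent’s behavior but is not the ADK implementation.

```python
import json
from typing import Any


def select_best_reward(state: dict[str, Any]) -> dict[str, Any]:
    """Framework-free sketch of SelectorAgent's selection step."""
    candidate_results = json.loads(state["candidate_results"])

    # Purely evidence-based: the highest evaluator score wins.
    best = max(candidate_results, key=lambda x: x["score"])

    # Publish the winner for downstream agents (reflection, next iteration).
    state["best_reward_code"] = best["reward_code"]
    state["best_score"] = best["score"]
    state["best_candidate_id"] = best["candidate"]
    return best


# Plain dict standing in for ctx.session.state, with illustrative data.
state = {
    "candidate_results": json.dumps([
        {"candidate": 0, "score": 0.72, "reward_code": "# reward v0"},
        {"candidate": 1, "score": 0.85, "reward_code": "# reward v1"},
    ])
}
winner = select_best_reward(state)
print(state["best_candidate_id"], state["best_score"])
```

Keeping the rule in a pure function like this makes it trivial to unit-test the selection policy without spinning up the agent loop.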
At this point, ...