
Evaluating the Research Assistant

Explore the practical steps to evaluate a multi-agent Research Assistant built with Google ADK. Learn how to define test cases, set quality benchmarks, and assess your agent's logical process, final response quality, and factual accuracy. This lesson guides you through creating evaluation configurations and running tests that ensure your AI agent performs reliably and meets professional standards.

Having established a strong theoretical understanding of the ADK’s evaluation framework, its core concepts, and its comprehensive suite of criteria, we are ready to put that knowledge into practice. In this lesson, we will shift from theory to hands-on application. We will build a complete, end-to-end evaluation suite for our multi-agent Research Assistant, defining the specific test cases and quality benchmarks needed to objectively measure its performance and ensure its reliability.

For our specific single-turn, tool-using Research Assistant, not all criteria are equally relevant. User simulation, for instance, is designed for multi-turn conversations. Therefore, we will focus our hands-on evaluation on a curated but powerful set of criteria that directly measure the quality and correctness of our agent’s workflow:

  1. tool_trajectory_avg_score: Verifies that our controller_agent delegates tasks to its worker agents in the correct logical order.

  2. rubric_based_final_response_quality_v1: Checks whether the agent’s final output meets the specific quality bar that we define through rubrics.

  3. hallucinations_v1: Ensures that the agent’s synthesized report is factually grounded in the information it gathered from its tools. A sketch of how these criteria plug into an evaluation config follows this list.
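To see where these criteria will eventually live, it helps to preview the evaluation config that the adk eval command consumes. The sketch below is illustrative only: tool_trajectory_avg_score uses the simple numeric-threshold form from the ADK documentation, while the two LLM-judged criteria are shown with a bare threshold as a placeholder. Treat those placeholder fields as assumptions; their full configuration (judge model options and, for the rubric-based criterion, the rubrics themselves) is defined when we build the actual config file.

{
  "criteria": {
    "tool_trajectory_avg_score": 1.0,
    "rubric_based_final_response_quality_v1": { "threshold": 0.8 },
    "hallucinations_v1": { "threshold": 0.8 }
  }
}
An illustrative criteria sketch, not yet our final config file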

Let’s begin creating the necessary configuration files for this purpose.

Create the EvalSet file

First, we need to define what we will test. In the file editor, we will create a new file. The ADK framework uses a specific naming convention for these files: <eval_set_id>.evalset.json. The file name must match the eval_set_id defined inside the JSON file, as this allows the adk eval command to correctly identify and load the test suite. This file will contain a single test case that provides the prompt for our agent and, crucially, defines the expected sequence of tool calls that the agent should make. This sequence is what the ADK will use to verify our agent’s internal reasoning and delegation process.
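Assuming the agent package from the previous lessons is named multi_agent_researcher (matching the app_name we reference below), one common choice is to keep the eval set right next to the agent code; the exact layout in your workspace may differ:

multi_agent_researcher/
├── __init__.py
├── agent.py
└── research_assistant_eval_set_v1.evalset.json

With the name and location settled, here is the complete file: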

{
  "eval_set_id": "research_assistant_eval_set_v1",
  "name": "Evaluation set for the multi-agent research assistant",
  "description": "Single-turn evaluation of the controller agent for the query 'Write a report on climate change'.",
  "eval_cases": [
    {
      "eval_id": "research_climate_change_case_01",
      "session_input": {
        "app_name": "multi_agent_researcher",
        "user_id": "eval_user_01"
      },
      "conversation": [
        {
          "invocation_id": "inv_01",
          "user_content": {
            "role": "user",
            "parts": [
              {
                "text": "Write a report on climate change"
              }
            ]
          },
          "final_response": {
            "role": "model",
            "parts": [
              {
                "text": "A coherent, well-structured report on climate change that explains causes, impacts, and mitigation strategies."
              }
            ]
          },
          "intermediate_data": {
            "tool_uses": [
              {
                "name": "wikipedia_researcher",
                "args": {
                  "request": "climate change"
                }
              },
              {
                "name": "arxiv_researcher",
                "args": {
                  "request": "climate change"
                }
              },
              {
                "name": "web_searcher",
                "args": {
                  "request": "climate change"
                }
              }
            ]
          }
        }
      ]
    }
  ]
}
The research_assistant_eval_set_v1.evalset.json file

Code explanation

The eval_set_id matches the file name prefix, as the naming convention requires, and the set contains a single eval case, research_climate_change_case_01. The session_input block ties the case to our multi_agent_researcher app and a dedicated evaluation user. The conversation holds one invocation: user_content carries the prompt "Write a report on climate change", and final_response provides a reference description of the kind of report we expect the agent to produce. Finally, intermediate_data.tool_uses records the expected tool trajectory: the controller_agent should delegate to wikipedia_researcher, arxiv_researcher, and web_searcher, each with the request "climate change". This is the sequence that the tool_trajectory_avg_score criterion compares against the agent’s actual tool calls.
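Once the companion criteria config is also in place, the suite can be executed from the command line. Here is a minimal sketch, assuming the multi_agent_researcher package sits in the current working directory and the criteria are saved in a test_config.json like the one sketched earlier; verify the exact paths and flags against adk eval --help for your ADK version.

# Run the eval set against the live agent and print per-criterion results
adk eval \
  multi_agent_researcher \
  research_assistant_eval_set_v1.evalset.json \
  --config_file_path=test_config.json \
  --print_detailed_results

The command replays the recorded prompt against the agent, scores the run with the configured criteria, and reports a pass or fail verdict for each one.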