Assembling the Web Agent with Google ADK

Explore how to assemble a multimodal web agent with Google ADK by integrating browser session management and visual grounding tools and by orchestrating agent workflows. Understand the three operational modes (strict, single-agent, and multi-agent), and learn how to configure system prompts, loop agents, and fallback mechanisms to build reliable autonomous web systems.

In the previous lessons, we built a persistent browser session manager and a suite of multimodal web tools. However, tools alone cannot complete a task; they require an intelligent agent to orchestrate them.

In this final lesson, we dissect src/adk_agent/web_agent.py chunk by chunk. This file acts as the "brain" of the application. It resolves which LLM to use, handles API differences between ADK versions, and uses a factory function to build the root agent based on your environment configuration. Because agentic web systems are complex, web_agent.py is designed to support three distinct operational modes:

Strict mode: A heavily constrained, single-agent loop optimized for precise 1-to-7 numeric action commands.
Single-agent mode: A WebVoyager-style single agent using conversational tool calling.
Multi-agent workflow: A delegated hierarchy where a Coordinator manages a Planner, a Vision agent, and a Browser agent.
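
To make the mode selection concrete, here is a minimal sketch of how a factory might pick one of the three modes from environment flags. The variable names WEB_AGENT_STRICT_MODE and WEB_AGENT_MULTI_AGENT, the helper names, and the precedence (strict first) are assumptions for illustration; the lesson's actual configuration keys may differ.

```python
import os
from typing import Mapping, Optional

# Hypothetical variable names; the real configuration keys may differ.
STRICT_ENV = "WEB_AGENT_STRICT_MODE"
MULTI_ENV = "WEB_AGENT_MULTI_AGENT"


def _flag(name: str, env: Mapping[str, str]) -> bool:
    """Interpret common truthy strings as an enabled flag."""
    return env.get(name, "").strip().lower() in {"1", "true", "yes", "on"}


def select_mode(env: Optional[Mapping[str, str]] = None) -> str:
    """Return 'strict', 'multi', or 'single' based on environment flags."""
    env = os.environ if env is None else env
    if _flag(STRICT_ENV, env):  # assumed precedence: strict wins if both are set
        return "strict"
    if _flag(MULTI_ENV, env):
        return "multi"
    return "single"


def describe_agent_tree(mode: str) -> dict:
    """Describe which agents a factory would assemble for each mode."""
    if mode == "multi":
        # The Coordinator delegates to a Planner, a Vision agent, and a Browser agent.
        return {"root": "coordinator", "sub_agents": ["planner", "vision", "browser"]}
    # Strict and single-agent modes both use one agent with no delegates.
    return {"root": f"{mode}_web_agent", "sub_agents": []}
```

Passing the environment as a mapping keeps the selection logic easy to test without touching the real process environment.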

The agent architecture flow

Before reading the code, let's visualize the control flow. The factory function checks the environment variables and builds the corresponding agent structure, ultimately wrapping the active agents in a LoopAgent to ensure the ReAct loop runs continuously until the task finishes.
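
The looping behavior can be sketched in plain Python before we look at the real file. The classes below (StepResult, SimpleLoop) are illustrative stand-ins, not ADK types: they only mimic the idea of cycling through sub-agents until one signals completion or an iteration cap is reached.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class StepResult:
    """Outcome of one sub-agent step (illustrative, not an ADK type)."""
    note: str
    done: bool = False


@dataclass
class SimpleLoop:
    """Stand-in for a loop agent: cycles through steps until one reports done."""
    steps: List[Callable[[int], StepResult]]
    max_iterations: int = 10
    trace: List[str] = field(default_factory=list)

    def run(self) -> bool:
        for iteration in range(self.max_iterations):
            for step in self.steps:
                result = step(iteration)
                self.trace.append(result.note)
                if result.done:
                    return True  # task finished, stop looping
        return False  # iteration cap reached without completion


def observe(i: int) -> StepResult:
    """Pretend to capture an observation of the page."""
    return StepResult(note=f"observe#{i}")


def act(i: int) -> StepResult:
    """Pretend to act; the task completes on the third pass."""
    return StepResult(note=f"act#{i}", done=(i == 2))


loop = SimpleLoop(steps=[observe, act], max_iterations=5)
finished = loop.run()
```

The iteration cap matters in practice: without it, an agent that never decides the task is complete would keep browsing indefinitely.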

"""Google ADK multi-agent builder for a multimodal web agent."""

from __future__ import annotations

import os

from dotenv import load_dotenv
from google.adk.agents import Agent, LoopAgent

from src.tools.web_tools import (
    action_answer,
    action_back,
    action_click,
    action_google,
    action_input,
    action_scroll,
    action_wait,
    analyze_screenshot_with_vlm,
    capture_observation,
    capture_multimodal_observation,
    click_element,
    close_browser,
    log_step,
    navigate,
    go_back,
    go_google,
    reset_browser_task_state,
    select_flight_dates,
    search_web,
    scroll_by,
    type_text_element,
    verify_task_completion,
    wait,
)


def _create_agent(agent_cls, *, name: str, model: str, instruction: str, tools=None, sub_agents=None):
    """Create ADK agent while tolerating minor API differences between ADK versions."""
    kwargs = {
        "name": name,
        "model": model,
        "instruction": instruction,
    }
    if tools is not None:
        kwargs["tools"] = tools
    if sub_agents is not None:
        kwargs["sub_agents"] = sub_agents

    try:
        return agent_cls(**kwargs)
    except TypeError:
        # Fallback for ADK variants that do not support sub_agents in ctor.
        kwargs.pop("sub_agents", None)
        return agent_cls(**kwargs)


def _create_workflow_agent(agent_cls, *, name: str, sub_agents, **extra):
    """Create ADK workflow agent while tolerating ctor differences."""
    kwargs = {
        "name": name,
        "sub_agents": sub_agents,
        **extra,
    }
    try:
        return agent_cls(**kwargs)
    except TypeError:
        # Some variants may use children/agents instead of sub_agents.
        kwargs.pop("sub_agents", None)
        for field_name in ("agents", "children", "steps"):
            try:
                return agent_cls(name=name, **{field_name: sub_agents})
            except TypeError:
                continue
        raise


def _resolve_llm_model() -> str:
    """Resolve LLM model for ADK agents.
    Uses OpenAI only by default (reliable tool/function calling). Override with ``ADK_MODEL``
    if you need another LiteLLM-supported id (advanced).
    """
    explicit = os.getenv("ADK_MODEL", "").strip()
    if explicit:
        return explicit
    model = os.getenv("ADK_OPENAI_MODEL", "openai/gpt-5-mini").strip()
    return model or "openai/gpt-5-mini"

ADK compatibility wrappers, tool imports, and model resolution (web_agent.py)

Lines 3–34: We import standard libraries (os), environment loaders (load_dotenv), ADK base classes (Agent, LoopAgent), and the exhaustive list of web tools we created previously in src/tools/web_tools.py.
Lines 37–54: _create_agent packs the name, model, and instruction into a kwargs dictionary. It conditionally adds tools and sub_agents to the arguments only if they are provided, keeping the instantiation clean. It attempts to instantiate the agent_cls with these arguments. If a TypeError occurs (often due to minor ADK version mismatches regarding sub-agent initialization), it pops sub_agents out of the dictionary and tries again as a safe fallback.
Lines 57–74: _create_workflow_agent prepares the arguments specifically for workflow agents (like loops). If the standard instantiation fails, it iterates through known alternative parameter names ("agents", "children", "steps") to ensure backward and forward compatibility with the ADK framework across different versions. ...
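
To see the two fallbacks in action without installing ADK, we can run condensed copies of the helpers against stand-in classes. OldAgent and LegacyLoop below are hypothetical constructors invented for this sketch; they are not real ADK classes, and the helper bodies are shortened but follow the same logic as the listing above.

```python
class OldAgent:
    """Hypothetical older constructor that rejects sub_agents."""

    def __init__(self, *, name, model, instruction, tools=None):
        self.name = name
        self.tools = tools or []
        self.sub_agents = []


class LegacyLoop:
    """Hypothetical workflow class that names its children 'agents'."""

    def __init__(self, *, name, agents):
        self.name = name
        self.agents = agents


def create_agent(agent_cls, *, name, model, instruction, tools=None, sub_agents=None):
    kwargs = {"name": name, "model": model, "instruction": instruction}
    if tools is not None:
        kwargs["tools"] = tools
    if sub_agents is not None:
        kwargs["sub_agents"] = sub_agents
    try:
        return agent_cls(**kwargs)
    except TypeError:
        # Retry without sub_agents; the delegates are silently dropped.
        kwargs.pop("sub_agents", None)
        return agent_cls(**kwargs)


def create_workflow_agent(agent_cls, *, name, sub_agents, **extra):
    try:
        return agent_cls(name=name, sub_agents=sub_agents, **extra)
    except TypeError:
        # Try alternative parameter names; note that extras are not forwarded.
        for field_name in ("agents", "children", "steps"):
            try:
                return agent_cls(name=name, **{field_name: sub_agents})
            except TypeError:
                continue
        raise  # re-raise the original TypeError if nothing matched


worker = create_agent(
    OldAgent, name="browser", model="demo-model", instruction="Browse.", sub_agents=["helper"]
)
loop = create_workflow_agent(LegacyLoop, name="react_loop", sub_agents=[worker], max_iterations=3)
```

Two side effects are worth noticing. When the first fallback fires, the agent is created without its sub-agents, and when the second fires, extra keyword arguments such as max_iterations are not passed along. Both happen silently, so logging a warning inside the except branches may be worthwhile in a production build.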
