Managing the Browser State

Explore how to manage and preserve browser state in a multimodal web agent built with Google ADK. Understand the integration of Playwright components in maintaining session continuity across asynchronous LLM API calls. Learn to use dataclasses and singleton patterns that enable the agent to interact reliably with web pages through repeated observations and actions.

In the previous lesson, we observed the web agent seamlessly navigating, clicking, and extracting information. From the user's perspective, it looks like a single continuous session. However, under the hood, we face a fundamental architectural challenge: Large Language Models (LLMs) and their function-calling APIs are inherently stateless. Every time the LLM decides to use a tool (like click or type), that decision arrives as a standalone function call that carries no memory of the browser it acted on a moment earlier. If we were to start a new browser instance for every single tool call, the agent would lose its place, its active tabs, and its login cookies.

To solve this, we must build a bridge between the stateless LLM and a highly stateful browser. In this lesson, we will explore src/agent.py to see how the system is launched, and then dive deep into src/tools/browser_runtime.py to understand how we keep the browser alive and track the agent's history across the ReAct loop.
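Before opening the real files, it may help to see the general idea in miniature. The sketch below is not the contents of src/tools/browser_runtime.py. It is a stdlib-only illustration that assumes a dataclass holding the state and a lazily created module-level singleton. The names BrowserState, get_state, navigate, and click are hypothetical, and plain strings stand in for the Playwright objects a real runtime would keep alive.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class BrowserState:
    """Hypothetical state container; a real runtime would hold Playwright objects here."""
    current_url: str = "about:blank"
    history: list[str] = field(default_factory=list)


# Module-level singleton, created lazily on first access.
_STATE: Optional[BrowserState] = None


def get_state() -> BrowserState:
    """Return the one shared state object, creating it on first use."""
    global _STATE
    if _STATE is None:
        _STATE = BrowserState()
    return _STATE


async def navigate(url: str) -> str:
    """Stand-in for a tool the LLM might call; records where the 'browser' is."""
    state = get_state()
    state.current_url = url
    state.history.append(f"navigate:{url}")
    return f"Now at {url}"


async def click(selector: str) -> str:
    """A separate tool call that still sees the page set by navigate()."""
    state = get_state()
    state.history.append(f"click:{selector}")
    return f"Clicked {selector} on {state.current_url}"


async def demo() -> list[str]:
    # Each await mimics one independent tool invocation from the agent loop.
    await navigate("https://example.com")
    await click("#login")
    return get_state().history


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Because both tool functions reach the same object through get_state(), the second call knows which page the first one opened, even though neither call receives any state as an argument. This is the property we need from the real browser runtime.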

The entry point: agent.py

Before we dive into the complex state management, let's look at how the application actually starts. In Google ADK, agents are typically defined in their own modules and imported into a main entry point.

Our entry point is src/agent.py. It is intentionally minimal:

"""ADK discovery entrypoint for the `src` app."""

from src.adk_agent.web_agent import build_root_agent

root_agent = build_root_agent()
The ADK application entry point (src/agent.py)
  • Line 3: We import the factory function build_root_agent from our agent definition file (src/adk_agent/web_agent.py). ...