
Managing the Browser State

Explore how to manage browser state in a multimodal web agent built with Google ADK. Learn how to maintain Playwright sessions, track element history, and bridge stateless LLM calls with a stateful browser to enable continuous, context-aware web interactions.

In the previous lesson, we observed the web agent seamlessly navigating, clicking, and extracting information. From the user's perspective, it looks like a single continuous session. Under the hood, however, we face a fundamental architectural challenge: Large Language Models (LLMs) and their function-calling APIs are inherently stateless. Every time the LLM decides to use a tool (like click or type), that decision arrives as a standalone request that carries no memory of earlier calls. If we started a new browser instance for every tool call, the agent would lose its place, its active tabs, and its login cookies.

To solve this, we must build a bridge between the stateless LLM and a highly stateful browser. In this lesson, we will explore src/agent.py to see how the system is launched, and then dive deep into src/tools/browser_runtime.py to understand how we keep the browser alive and track the agent's history across the ReAct loop.
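Before looking at the real files, the core idea can be sketched in a few lines. The snippet below is only an illustration under stated assumptions: FakeBrowserSession, get_session, navigate, and click are hypothetical names, and the stand-in class replaces a real Playwright browser so the example runs without any dependencies. It is not the contents of src/tools/browser_runtime.py.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class FakeBrowserSession:
    """Hypothetical stand-in for a live Playwright browser and page."""
    url: str = "about:blank"
    history: list = field(default_factory=list)


# Module-level holder: created once, then reused by every tool call.
_session: Optional[FakeBrowserSession] = None


def get_session() -> FakeBrowserSession:
    """Return the shared session, creating it lazily on first use."""
    global _session
    if _session is None:
        _session = FakeBrowserSession()
    return _session


def navigate(url: str) -> str:
    """Tool function: each call is independent, but the session persists."""
    session = get_session()
    session.url = url
    session.history.append(("navigate", url))
    return f"Now at {url}"


def click(selector: str) -> str:
    """Tool function: sees the page left behind by earlier calls."""
    session = get_session()
    session.history.append(("click", selector))
    return f"Clicked {selector} on {session.url}"
```

Calling navigate("https://example.com") and then click("#login") as two separate tool calls still operates on the same session, because both go through get_session() instead of creating a browser of their own. The history list plays the role of the agent's action log across the loop.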

The entry point: agent.py

Before we dive into complex state management, let's look at how the application actually starts. In Google ADK, agents are typically defined in their own modules and imported into a main entry point.

Our entry point is src/agent.py. It is intentionally minimal:

"""ADK discovery entrypoint for the `src` app."""

from src.adk_agent.web_agent import build_root_agent

root_agent = build_root_agent()
The ADK application entry point (src/agent.py)
  • Line 3: We import the factory function build_root_agent from our agent definition file (src/adk_agent/web_agent.py). ...