Managing the Browser State
Explore how to manage browser state in a multimodal web agent built with Google ADK. Learn how to maintain Playwright sessions, track element history, and bridge stateless LLM calls with a stateful browser to enable continuous, context-aware web interactions.
In the previous lesson, we observed the web agent seamlessly navigating, clicking, and extracting information. From the user's perspective, it looks like a single continuous session. Under the hood, however, we face a fundamental architectural challenge: Large Language Models (LLMs) and their function-calling APIs are inherently stateless. Every time the LLM decides to use a tool (like click or type), that decision arrives as a standalone API request with no memory of the one before it. If we started a new browser instance for every tool call, the agent would lose its place, its active tabs, and its login cookies.
To solve this, we must build a bridge between the stateless LLM and a highly stateful browser. In this lesson, we will explore src/agent.py to see how the system is launched, and then dive deep into src/tools/browser_runtime.py to understand how we keep the browser alive and track the agent's history across the ReAct loop.
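Before opening the real files, it may help to see the core idea in isolation. The sketch below is not the contents of src/tools/browser_runtime.py. It is a minimal, standard-library-only illustration that assumes the bridge is a session object held at module level and shared by every tool function. The names FakeBrowserSession, get_session, navigate, and click are hypothetical, and the fake session stands in for a real Playwright browser.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class FakeBrowserSession:
    """Hypothetical stand-in for a real browser session (not Playwright)."""
    current_url: str = "about:blank"
    history: List[Tuple[str, str]] = field(default_factory=list)


# Module-level holder: it lives as long as the Python process does,
# so every tool call reaches the same session object.
_session: Optional[FakeBrowserSession] = None


def get_session() -> FakeBrowserSession:
    """Return the shared session, creating it only on first use."""
    global _session
    if _session is None:
        _session = FakeBrowserSession()
    return _session


def navigate(url: str) -> str:
    """Tool: move the shared session to a new URL and record the action."""
    session = get_session()
    session.current_url = url
    session.history.append(("navigate", url))
    return f"Now at {url}"


def click(selector: str) -> str:
    """Tool: 'click' an element on whatever page the session is already on."""
    session = get_session()
    session.history.append(("click", selector))
    return f"Clicked {selector} on {session.current_url}"
```

Calling navigate("https://example.com") and then click("#login") as two separate tool calls still operates on the same page, because both functions fetch the session through get_session() instead of creating their own. The history list is a simplified stand-in for the kind of action tracking the agent needs across the ReAct loop.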
The entry point: agent.py
Before we dive into complex state management, let's look at how the application actually starts. In Google ADK, agents are typically defined in their own modules and imported into a main entry point.
Our entry point is src/agent.py. It is intentionally minimal:
"""ADK discovery entrypoint for the `src` app."""

from src.adk_agent.web_agent import build_root_agent

root_agent = build_root_agent()
Line 3: We import the factory function build_root_agent from our agent definition file (src/adk_agent/web_agent.py). ...