Managing the Browser State

Explore how to manage and preserve browser state in a multimodal web agent built with Google ADK. Understand the integration of Playwright components in maintaining session continuity across asynchronous LLM API calls. Learn to use dataclasses and singleton patterns that enable the agent to interact reliably with web pages through repeated observations and actions.

We'll cover the following...

The entry point: agent.py
The browser runtime: Keeping the session alive
Wrapping up

In the previous lesson, we watched the web agent navigate, click, and extract information. From the user's perspective, it looks like a single continuous session. Under the hood, however, we face a fundamental architectural challenge: Large Language Models (LLMs) and their function-calling APIs are inherently stateless. Every time the LLM decides to use a tool (like `click` or `type`), that decision arrives as a standalone function call that carries no memory of the browser it acted on before. If we were to start a new browser instance for every tool call, the agent would lose its place, its active tabs, and its login cookies.

To solve this, we must build a bridge between the stateless LLM and a highly stateful browser. In this lesson, we will explore `src/agent.py` to see how the system is launched, and then dive deep into `src/tools/browser_runtime.py` to understand how we keep the browser alive and track the agent's history across the ReAct loop.
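Before looking at the real files, it may help to see the general shape of such a bridge. The following is a minimal sketch, not the code in `src/tools/browser_runtime.py`: the names `Step`, `BrowserRuntime`, `get_runtime`, `navigate`, `click`, and `demo` are hypothetical, and `_launch_page` is a stand-in that returns a plain dictionary where the real system would launch Playwright. It shows a dataclass holding the session state and a lazily created singleton that every tool call shares.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Any, List, Optional


@dataclass
class Step:
    """One action taken by the agent, recorded for its history."""
    action: str
    target: str


@dataclass
class BrowserRuntime:
    """Session state shared by all tool calls (hypothetical layout)."""
    page: Optional[Any] = None               # a Playwright page in a real system
    history: List[Step] = field(default_factory=list)
    launches: int = 0                        # how many times a browser was started


_runtime: Optional[BrowserRuntime] = None    # the single shared instance
_lock: Optional[asyncio.Lock] = None         # guards the first launch


async def _launch_page() -> dict:
    """Assumption: stand-in for launching a browser and opening a page."""
    await asyncio.sleep(0)                   # simulate asynchronous startup
    return {"url": "about:blank"}


async def get_runtime() -> BrowserRuntime:
    """Return the shared runtime, launching the browser only on first use."""
    global _runtime, _lock
    if _lock is None:
        _lock = asyncio.Lock()               # created inside the running loop
    async with _lock:
        if _runtime is None:
            runtime = BrowserRuntime()
            runtime.page = await _launch_page()
            runtime.launches += 1
            _runtime = runtime
    return _runtime


async def navigate(url: str) -> str:
    """Tool: each call is independent, but it reuses the same runtime."""
    runtime = await get_runtime()
    runtime.page["url"] = url
    runtime.history.append(Step("navigate", url))
    return f"navigated to {url}"


async def click(selector: str) -> str:
    """Tool: sees the page state left behind by earlier calls."""
    runtime = await get_runtime()
    runtime.history.append(Step("click", selector))
    return f"clicked {selector} on {runtime.page['url']}"


async def demo() -> BrowserRuntime:
    """Two separate tool calls that end up sharing one browser session."""
    await navigate("https://example.com")
    await click("#login")
    return await get_runtime()
```

Running `asyncio.run(demo())` performs two independent tool calls, yet the stand-in browser is launched only once and the second call still sees the URL set by the first. The lock matters because launching is asynchronous: without it, two tool calls arriving together could each start their own browser.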

The entry point: `agent.py`

Before we dive into the complex state management, let's look at how the application actually starts. In Google ADK, agents are typically defined in their own modules and imported into a main entry point.

Our entry point is `src/agent.py`. It is intentionally minimal.
