From Design to a Running Multimodal Web Agent

Explore the transition from designing to running a multimodal web agent using Google ADK. Understand how the agent observes, acts, and navigates real websites, inspect the project structure, and verify the agent's behavior by analyzing run artifacts and logs.

We'll cover the following:

- Demo: Watch the agent complete a real task
  - What you should notice while watching
  - Typical phases
- Project structure
- Agent output files after running a web task
- Conclusion

In the previous chapter, we designed the architecture of a multimodal web agent: how it observes pages, chooses actions, and stays grounded while interacting with the web. But architecture diagrams alone are not enough. To understand whether a design actually works, we need to study the implementation, run the agent on real websites, and inspect what happens step by step.

This lesson is the starting point for that transition from design to code. Before diving into individual functions and components, we will build a practical mental map of the project. You will watch the agent perform a real task, explore the main folders and files in the repository, and learn which outputs to inspect after a run for debugging and verification.
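As a rough mental picture of what "inspect what happens step by step" can mean, the sketch below runs a toy observe-then-act loop and records one entry per step. It is only an illustration under stated assumptions: `Step`, `run_episode`, `to_jsonl`, and `toy_policy` are hypothetical names, the pages are plain strings, and nothing here reflects the actual project code, Google ADK APIs, or the real log format used in this course.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class Step:
    index: int
    observation: str  # what the agent saw (stand-in for a page summary or screenshot path)
    action: str       # what it decided to do next


def run_episode(pages, choose_action, max_steps=10):
    """Observe, then act, recording one Step per iteration (hypothetical loop)."""
    trace = []
    for index, page in enumerate(pages[:max_steps]):
        observation = f"saw: {page}"         # stand-in for a real page observation
        action = choose_action(observation)  # stand-in for the model's decision
        trace.append(Step(index, observation, action))
        if action == "finish":
            break
    return trace


def to_jsonl(trace):
    """Serialize the trace as JSON Lines, one step per line."""
    return "\n".join(json.dumps(asdict(step)) for step in trace)


def toy_policy(observation):
    # Hypothetical rule: stop once the target page is visible.
    return "finish" if "company page" in observation else "click_next"


trace = run_episode(["home", "search results", "company page", "unused"], toy_policy)
print(to_jsonl(trace))
```

Because every recorded step pairs an observation with the action that followed it, a trace like this can be replayed line by line after a run, which is the kind of inspection the rest of this lesson prepares you for.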

By the end of this lesson, you should be able to:

- Describe, at a high level, what happens during a demo run: observations, tool use, and final response.
- Read the project tree and connect each major path to its specific responsibility.
- List the files you would open after a run to replay what happened.

Pause and reflect: Before you continue, answer this question in one sentence: what would you want to see in a log file to convince yourself the agent "really looked" at the page before clicking?

Demo: Watch the agent complete a real task

To build a mental model, we will begin with one fixed task prompt so we can consistently compare the agent's intent and execution.

Task prompt used for the demo: Go to LinkedIn, find the company "Google," and tell me the most recent post they made. Extract the complete post content and the URL. ...

