Main Components of A2A
Explore the central components of the Agent2Agent protocol to understand how agents introduce skills, communicate via messages, manage tasks, and produce artifacts. This lesson explains the client-server model, agent discovery through Agent Cards, and how these elements enable scalable and dynamic multi-agent workflows.
When agents collaborate, everything depends on structure. Without a shared way to discover, communicate, and coordinate, even the most intelligent agents can talk past each other or work at cross-purposes. A2A solves this by providing every agent with a predictable way to introduce itself, exchange information, and manage work, regardless of who built it or what framework it runs on.
Three ideas form the backbone of the A2A architecture: Agent Cards, communication patterns, and the task and message model. Each plays a distinct role, and together they give agents a common foundation to build on. Once you understand these building blocks, everything from simple one-off calls to complex multi-agent workflows starts to make sense.
We’ll break down the core architecture and walk through its main components.
How does A2A work?
Before proceeding, let’s review what happens in an A2A interaction. As described below, there are two primary roles.
A2A client (client agent): The agent or application that initiates a request on behalf of a user. This “orchestrator” agent determines what needs to be done and identifies which other agent has the necessary skills. It formulates tasks and sends requests using the A2A protocol.
A2A server (remote agent): The agent that exposes an HTTP endpoint implementing the A2A protocol (often running as a web service). It listens for incoming client requests, processes the task, and returns results or status updates. The remote agent can be treated as an opaque service: the client doesn’t need to know how it works internally, only what capabilities it offers. This makes it easy to use another agent as a black-box service or tool.
This architecture is analogous to a client-server model: any agent can act as a client to request help from another agent acting as a server. Notably, an agent can be both; for example, a coordinator agent might accept user requests (server role) and, in turn, call other agents to fulfill subtasks (client role).
In UML sequence diagrams, an “alt” box indicates that the enclosed interactions are conditional or optional; they may or may not happen depending on the circumstances.
To make this more tangible, think about what happens when you use ChatGPT, Gemini, or any other web-based chatbot with a “thinking” model. While the model reasons before responding, suppose it could also scan a network of available agents, not just the tools or APIs it would reach through an MCP client-server setup, and discover full-fledged agents to find the best collaborator for each part of the task. A2A makes this possible.
It can discover all exposed Agent Cards, identify which agent best suits the job, and delegate dynamically. For example, if the model needs to generate a travel itinerary, it could call a flight agent for ticket options, a hotel agent for accommodation, and a policy agent for visa information. Each one is selected through A2A discovery, not through hardcoded integrations.
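The selection step described above can be sketched in plain Python. This is only an illustration under stated assumptions: the Skill and Card classes, the find_agents helper, the example URLs, and the tag values are all hypothetical stand-ins, not types from an A2A library. The cards are assumed to have been fetched already.

```python
from dataclasses import dataclass, field


@dataclass
class Skill:
    # Hypothetical, trimmed-down stand-in for a skill entry on a card.
    id: str
    tags: list[str] = field(default_factory=list)


@dataclass
class Card:
    # Hypothetical, trimmed-down stand-in for an Agent Card.
    name: str
    url: str
    skills: list[Skill] = field(default_factory=list)


def find_agents(cards: list[Card], wanted_tag: str) -> list[Card]:
    """Return every card that advertises at least one skill with the tag."""
    return [
        card
        for card in cards
        if any(wanted_tag in skill.tags for skill in card.skills)
    ]


# Made-up cards, as if already discovered from three remote agents.
cards = [
    Card("Flight Agent", "http://flights.example/",
         [Skill("searchFlights", ["flights", "travel"])]),
    Card("Hotel Agent", "http://hotels.example/",
         [Skill("searchHotels", ["hotels", "travel"])]),
    Card("Policy Agent", "http://policy.example/",
         [Skill("visaRules", ["visa", "policy"])]),
]

# Pick a collaborator for each part of the itinerary by tag, not by hardcoding.
plan = {
    tag: [card.name for card in find_agents(cards, tag)]
    for tag in ("flights", "hotels", "visa")
}
print(plan)
```

A real client would likely rank candidates using skill descriptions and examples as well, but the principle is the same: the choice is driven by what the cards advertise.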
How do we define an agent’s skills and enable discoverability in A2A?
Before an agent can collaborate in the A2A ecosystem, it must introduce itself, both who it is and what it can do. This happens through a single document: the Agent Card. The Agent Card includes the agent’s metadata (identity, endpoint, capabilities) and the list of skills that describe what the agent can do.
What are agent “skills” in A2A?
An agent skill represents a specific capability or function that an agent can perform. It tells potential clients, “Here’s what I’m good at.” Each skill is defined using the AgentSkill type from the a2a.types module and includes several key attributes:
id: A unique identifier for the skill.
name: A short, human-readable label.
description: A clear explanation of what the skill does.
tags: Keywords that help with discovery and categorization.
examples: Example prompts or tasks that illustrate usage.
inputModes/outputModes: The supported media types (for example, "text/plain", "application/json").
Here’s a simple example from a “hello world” agent:
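from a2a.types import AgentSkill

# A single text-based skill that simply returns "hello world".
skill = AgentSkill(
    id='helloWorld',
    name='Returns hello world',
    description='Just returns hello world',
    tags=['hello world'],
    examples=['hi', 'hello world'],
)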
This is as minimal as it gets: a single text-based skill that simply returns “hello world.” Real agents might define multiple skills: one for querying data, one for summarizing text, one for checking compliance rules, and so on.
What are “Agent Cards” in A2A?
The Agent Card is where an agent makes its skills discoverable. It’s a small JSON document, typically served at /.well-known/agent-card.json, that any A2A client can fetch to learn what the agent does, where it’s hosted, and how to talk to it.
An AgentCard (also defined in the a2a.types module) usually includes:
name, description, version: Basic identity and version information.
url: The agent’s endpoint, where it listens for A2A requests.
capabilities: Supported features such as streaming or push notifications.
defaultInputModes/defaultOutputModes: Common media formats used.
skills: The list of AgentSkill objects the agent exposes.
Here is a simple example:
from a2a.types import AgentCapabilities, AgentCard

# Public card for the hello world agent; `skill` is the AgentSkill defined earlier.
public_agent_card = AgentCard(
    name='Hello World Agent',
    description='Just a hello world agent',
    url='http://localhost:9999/',
    version='1.0.0',
    protocolVersion='0.3.0',
    defaultInputModes=['text/plain'],
    defaultOutputModes=['text/plain'],
    capabilities=AgentCapabilities(streaming=True),
    skills=[skill],
    supportsAuthenticatedExtendedCard=True,
)
This card shows that the agent runs at localhost:9999, supports text input and output, and has a single skill called “hello world.” The client doesn’t need to know how the agent handles requests internally; the card alone provides everything needed to discover and interact safely.
The Agent Card is the foundation of discoverability in A2A. When agents look for collaborators, they don’t query a hidden directory or rely on manual configuration; they simply fetch and parse Agent Cards. Each card gives them enough information to decide whether another agent can help with a task, what format to use, and how to authenticate if needed.
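As a rough sketch of the fetch-and-parse step, the snippet below builds the well-known URL for an agent and pulls a few fields out of a card document. It uses only the standard library and does no networking: the sample_body string stands in for an HTTP response. The helper names agent_card_url and summarize_card are hypothetical, and the JSON keys are assumed to mirror the AgentCard fields listed in this lesson.

```python
import json
from urllib.parse import urljoin

# Path named in this lesson; assumed to sit at the root of the agent's host.
WELL_KNOWN_PATH = ".well-known/agent-card.json"


def agent_card_url(base_url: str) -> str:
    """Build the Agent Card URL for an agent hosted at base_url."""
    # A trailing slash makes urljoin append to the base instead of
    # replacing its last path segment.
    if not base_url.endswith("/"):
        base_url += "/"
    return urljoin(base_url, WELL_KNOWN_PATH)


def summarize_card(card_json: str) -> dict:
    """Extract the fields a client needs to decide whether to call the agent."""
    card = json.loads(card_json)
    return {
        "name": card["name"],
        "endpoint": card["url"],
        "streaming": card.get("capabilities", {}).get("streaming", False),
        "skill_ids": [skill["id"] for skill in card.get("skills", [])],
    }


# Stand-in for the body returned by fetching the card URL.
sample_body = json.dumps({
    "name": "Hello World Agent",
    "url": "http://localhost:9999/",
    "capabilities": {"streaming": True},
    "skills": [{"id": "helloWorld", "tags": ["hello world"]}],
})

print(agent_card_url("http://localhost:9999"))
summary = summarize_card(sample_body)
print(summary)
```

In a real client, the body would come from an HTTP GET on the computed URL, and the parsed document would typically be validated against the AgentCard type rather than read as a raw dictionary.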
How do agents communicate and exchange work in A2A?
Once agents have introduced themselves through their Agent Cards, they need a way to actually talk, share data, and track progress. A2A does this through three key constructs:
Messages
Tasks
Artifacts
Together, they define how ideas travel, work happens, and results come back. Let’s take a look at each of them one by one.
What are messages in A2A?
A message represents a single turn in the conversation: one request or one response. Every message includes a few key pieces:
A role ("user" or "agent") to indicate who’s speaking.
A messageId (a unique identifier).
One or more Part objects, which hold the actual content.
Parts are what make A2A flexible and modality-independent, meaning agents can exchange not only text, but also files, images, or structured data in the same protocol. There are three main kinds of parts:
TextPart: For plain text content.
FilePart: For files, which can be transmitted inline (Base64 encoded) or via a URI, with metadata like filename and mimeType.
DataPart: For structured JSON data, ideal for parameters, metadata, or machine-readable results.
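To see how the three part kinds can sit side by side in one message, here is a minimal sketch that assembles a message as a plain dictionary. The helper functions are hypothetical, and the exact JSON keys (the "kind" discriminator, "name", "mimeType", "bytes") are assumptions about the wire format rather than something this lesson specifies. An SDK would normally build these objects for you.

```python
import base64
import uuid


def text_part(text: str) -> dict:
    return {"kind": "text", "text": text}


def data_part(data: dict) -> dict:
    return {"kind": "data", "data": data}


def file_part(name: str, mime_type: str, raw: bytes) -> dict:
    # Inline transfer: the file content is Base64 encoded into the message.
    encoded = base64.b64encode(raw).decode("ascii")
    return {"kind": "file",
            "file": {"name": name, "mimeType": mime_type, "bytes": encoded}}


def make_message(role: str, parts: list[dict]) -> dict:
    """Assemble one conversational turn from its role and parts."""
    if role not in ("user", "agent"):
        raise ValueError(f"role must be 'user' or 'agent', got {role!r}")
    return {"role": role, "messageId": str(uuid.uuid4()), "parts": parts}


# One turn that mixes human-readable text, structured data, and a file.
message = make_message("user", [
    text_part("Plan a three-night trip to Lisbon."),
    data_part({"city": "Lisbon", "nights": 3}),
    file_part("passport.pdf", "application/pdf", b"%PDF-1.7 demo"),
])

print([part["kind"] for part in message["parts"]])
```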
This layered structure is what lets A2A support conversations that mix human-readable text and machine-actionable data seamlessly.
What are tasks in A2A?
Not every interaction ends in one message. Sometimes an agent needs time to run a computation, call an external service, or coordinate with others. That’s where “tasks” come in.
When a client sends a message, the receiving agent can respond in two ways, outlined below.
Respond with a stateless message: Used for quick, one-off exchanges. The response comes immediately, and the interaction ends.
Initiate a stateful task: Used for longer or more complex work. The agent responds with a Task object that has a taskId and a life cycle.
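From the client’s side, the two response styles above could be handled with a small dispatch like the following. This is a sketch under assumptions: responses are shown as plain dictionaries, the "kind" discriminator and the "id" and "status" keys are assumed field names for illustration, and handle_response is a hypothetical helper.

```python
def handle_response(response: dict) -> str:
    """Describe what a client would do next for a given response."""
    kind = response.get("kind")
    if kind == "message":
        # Stateless reply: read the text parts and the interaction ends.
        texts = [p["text"] for p in response["parts"] if p.get("kind") == "text"]
        return "done: " + " ".join(texts)
    if kind == "task":
        # Stateful reply: keep the identifier and continue tracking the task.
        state = response["status"]["state"]
        return f"track task {response['id']} (state: {state})"
    raise ValueError(f"unexpected response kind: {kind!r}")


# Made-up responses for illustration.
quick = {"kind": "message", "role": "agent", "messageId": "m-1",
         "parts": [{"kind": "text", "text": "hello world"}]}
slow = {"kind": "task", "id": "t-42", "status": {"state": "working"}}

print(handle_response(quick))
print(handle_response(slow))
```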
A task moves through states such as:
working (the task is in progress).
input-required or auth-required (waiting on something).
Terminal states like completed, canceled, or failed.
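The life cycle can be modeled as a small enum that groups states into active, waiting, and terminal. This is a minimal sketch: the state strings follow the names used in this lesson, with "working" assumed as the value for the in-progress state, and the classify helper is hypothetical.

```python
from enum import Enum


class TaskState(str, Enum):
    # Only the states named in this lesson are modeled here.
    WORKING = "working"
    INPUT_REQUIRED = "input-required"
    AUTH_REQUIRED = "auth-required"
    COMPLETED = "completed"
    CANCELED = "canceled"
    FAILED = "failed"


TERMINAL = {TaskState.COMPLETED, TaskState.CANCELED, TaskState.FAILED}
WAITING = {TaskState.INPUT_REQUIRED, TaskState.AUTH_REQUIRED}


def classify(state: TaskState) -> str:
    """Group a state into 'terminal', 'waiting', or 'active'."""
    if state in TERMINAL:
        return "terminal"
    if state in WAITING:
        return "waiting"
    return "active"


for state in TaskState:
    print(f"{state.value}: {classify(state)}")
```

A client can use this grouping to decide whether to keep listening, prompt the user for more input, or stop tracking the task.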
While a task is running, the agent can send updates, such as streaming partial outputs or progress notifications.
This lets clients track progress in real time or even provide new input mid-way.
Think of a task as a “long-lived message thread”: it keeps the conversation open until the job is done.
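The idea of following a task until it finishes can be sketched with a simulated stream of updates. Everything here is illustrative: the update dictionaries, the "chunk" key for partial output, and the follow_task helper are hypothetical, and a real client would receive updates over streaming or push notifications rather than from a list.

```python
from typing import Iterable, Iterator

TERMINAL_STATES = {"completed", "canceled", "failed"}


def follow_task(updates: Iterable[dict]) -> Iterator[str]:
    """Yield one log line per update and stop at the first terminal state."""
    for update in updates:
        state = update["state"]
        if "chunk" in update:
            yield f"partial output: {update['chunk']}"
        elif state == "input-required":
            yield "agent is waiting for more input"
        else:
            yield f"status: {state}"
        if state in TERMINAL_STATES:
            return  # the thread is closed; later updates are ignored


# Simulated updates for one long-running task.
updates = [
    {"state": "working"},
    {"state": "working", "chunk": "Day 1: Lisbon"},
    {"state": "input-required"},
    {"state": "working"},
    {"state": "completed"},
    {"state": "working"},  # never reached
]

log = list(follow_task(updates))
for line in log:
    print(line)
```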
What are artifacts in A2A?
Finally, when a task completes, it usually produces something tangible: a result. That’s an artifact. An artifact is the final, structured output from a task. It might be:
A generated report (as a FilePart).
A summary in text.
A bundle of structured JSON data.
Like messages, an artifact has:
An artifactId (unique identifier).
A human-readable name.
One or more Part objects carrying the content.
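As a rough illustration of consuming an artifact, the sketch below walks the parts and renders each kind differently. The artifact is a plain dictionary with made-up values, the "kind" key on each part is an assumed discriminator, and render_artifact is a hypothetical helper.

```python
import json


def render_artifact(artifact: dict) -> str:
    """Turn an artifact into readable lines, one per part."""
    lines = [f"{artifact['name']} ({artifact['artifactId']})"]
    for part in artifact["parts"]:
        kind = part.get("kind")
        if kind == "text":
            lines.append(part["text"])
        elif kind == "data":
            # Structured results are kept machine-readable.
            lines.append(json.dumps(part["data"], sort_keys=True))
        elif kind == "file":
            lines.append(f"[file: {part['file'].get('name', 'unnamed')}]")
    return "\n".join(lines)


# Made-up artifact combining a text summary and structured data.
artifact = {
    "artifactId": "a-1",
    "name": "Trip summary",
    "parts": [
        {"kind": "text", "text": "3 nights in Lisbon"},
        {"kind": "data", "data": {"nights": 3, "city": "Lisbon"}},
    ],
}

rendered = render_artifact(artifact)
print(rendered)
```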
Artifacts can also be
What’s next?
Together, Agent Cards, messages, tasks, and artifacts form the backbone of A2A’s collaboration model. Once you grasp how agents define their skills, communicate, and deliver work through these constructs, everything else in the protocol, from coordination patterns to streaming outputs, builds naturally on top.
In the coming lessons, we’ll combine everything and examine how the Agent Card works in more detail, including how to design and expose your own.