Welcome to the course. Before we get into agents, tool use, and agent workflows, let’s clarify what you’re preparing for, why it matters, and how the course is structured. Anthropic, the company behind Claude, offers the Claude Certified Architect certifications. These certifications recognize developers and architects who can design, build, and reason about production-grade systems with Claude using the Claude API, the Claude Agent SDK, Model Context Protocol, and Claude Code.

The certification tests apply architectural judgment by focusing on choices that hold up beyond a demo. These tests focus on the decision-making required to move from prototype to production. Exam scenarios put you in the engineer’s role, asking what you should build and why your design choices would hold up under production constraints such as reliability, latency, cost, and maintainability.

Note: At the time of writing, the certifications are available at no cost for the first 5,000 enrollees from Anthropic partner companies. Check Anthropic’s official certification page for current availability and pricing. The exam is delivered as a scenario-based, multiple-choice assessment.

Exam at a glance

The exam presents realistic engineering situations, such as a customer support agent that needs a refund cap, a multi-agent research system with conflicting data, and a CI/CD pipeline that needs structured output. Each scenario asks you to make the right architectural call. Every question has one clearly correct answer and three plausible-but-wrong distractors.

#	Domain	Weight	What It Tests
1	Agentic Architecture and Orchestration	~25%	Agent SDK loops, multi-agent systems, hooks, session management, task decomposition
2	Tool Design and MCP Integration	~20%	Tool descriptions, structured error responses, MCP configuration, built-in tools
3	Claude Code Configuration and Workflows	~20%	`CLAUDE.md` hierarchy, custom commands and skills, plan mode, CI/CD integration
4	Prompt Engineering and Structured Output	~20%	Explicit criteria, few-shot prompting, JSON schema design, validation-retry loops
5	Context Management and Reliability	~15%	Context degradation, escalation patterns, information provenance, human review

What the exam actually tests

The exam is designed to catch two types of wrong answers: answers that are logically appealing but architecturally flawed, and answers that reflect prototype-level thinking applied to production problems. Here’s an example. Suppose an agent needs to stop iterating when it’s done with a task. You have four options:

A) Parse the assistant’s text for phrases like “task complete.”
B) Set a maximum iteration limit of 10.
C) Check the stop_reason field: continue on tool_use, exit on end_turn.
D) Monitor conversation length and stop after a set number of messages.

Options A, B, and D all seem reasonable at first glance. Option A is what most people try first. But only C is correct: it’s the only approach that reads a reliable, machine-generated signal rather than guessing from natural language or arbitrary counts.

This is the pattern the exam repeats across all five domains: programmatic, deterministic approaches beat probabilistic, heuristic ones every time. Every domain has a set of these common mistakes. They feel reasonable, they often work in small demos, but they fail under production pressure. We call them anti-patterns, and recognizing them quickly is one of the most important skills this course builds. Here’s a preview of the ones we’ll encounter across the course:

Anti-Pattern	The Right Approach
Parse natural language to stop an agent loop	Check `stop_reason` field
Enforce business rules with a system prompt	Use programmatic hooks
Escalate based on negative sentiment	Escalate on policy gaps and explicit requests
Return empty results when a tool fails	Return a structured error with `isError: true`
Summarize context repeatedly	Preserve critical facts in an immutable block
Give one agent 18 tools	Distribute 4–5 tools across specialized subagents
Self-review generated code in the same session	Use a fresh, isolated session for review
Trust `tool_use` output as semantically correct	Validate semantics separately after structural compliance

Each chapter ends with scenario-based quiz questions that mirror the exam format: one clearly correct answer, three plausible distractors, and a detailed explanation of why each answer is right or wrong.

What we will build

Rather than just studying concepts, we build real artifacts throughout the course. By the end, we will have created:

An annotated agent transcript showing how stop_reason, tool calls, and tool results interact
A working agentic loop with logging and stop-condition handling
A guardrailed tool workflow with programmatic enforcement outside the prompt
A coordinator-subagent architecture with focused context handoffs
A well-designed tool specification set with structured error surfaces
A prompt rubric and JSON schema for a real extraction task
A validation-retry loop with specific error feedback
A CLAUDE.md instruction hierarchy and reusable workflow draft
A capstone architecture sketch connecting all patterns across a complete scenario

Throughout this course, we frame every concept around two checks: what approach fits the scenario, and why does it work? What do weaker approaches have in common, and why do they fail? Because scenarios can vary, the best preparation is broad coverage with deep understanding. By the time we reach the capstone, the strongest answer should feel well supported. That intuition comes from understanding the system deeply enough that weaker approaches are easier to rule out.

Quick knowledge check

A junior engineer on your team is building a customer support agent. The product manager asks: “How do we make sure the agent never approves refunds above $500 without manager sign-off?” The engineer proposes adding the following line to the system prompt:

“Important: Never approve refunds above $500. Always escalate to a manager for approval.”

Select the correct answer!

1.

What is the fundamental problem with this approach?

A.

The instruction is ambiguous: “Never” and “Always” are absolute terms that confuse the model.

B.

The system prompt is in the wrong configuration layer; this rule should go in a CLAUDE.md file.

C.

Prompt-based instructions are probabilistic: the model may occasionally comply with a $700 refund request despite the. instruction

D.

The $500 threshold is a magic number and should be stored in a configuration file, not the prompt.

1 / 1

Detail	Value
Issuer	Anthropic
Passing score	Check official exam portal/current blueprint
Format	Scenario-based assessment, according to available prep materials; confirm in the official portal
Scenarios on the exam	Check official exam portal/current blueprint
Cost	Check official certification page or Partner Academy portal

#	Scenario	Primary Domains Tested
1	Customer Support Resolution Agent	D1, D2, D5
2	Code Generation with Claude Code	D3, D4
3	Multi-Agent Research System	D1, D5
4	Developer Productivity with Claude	D2
5	Claude Code for CI/CD	D3, D4
6	Structured Data Extraction	D4, D5

1.Claude AI Systems Foundations

2.Building Agents with the Claude Client SDK

3.Architecting Agentic Systems

4.Orchestrating Multi-Agent Systems

5.Designing Tools and MCP Integrations

6.Prompting and Schema Design

7.Claude Code Configuration and Project Workflows

8.Validation, Retry Loops, and Metrics

9.Context Management Techniques

10.Making Reliable Claude Systems

Introduction to Claude Certification

Exam at a glance

The five exam domains

The six exam scenarios

What the exam actually tests

How this course is structured

What we will build

Quick knowledge check