Debugging and Observability
Explore how to diagnose and resolve common failures in LangGraph workflows by using state inspection, checkpoint history, and LangSmith observability tools. Understand how to systematically track execution paths, distinguish failure categories, and apply effective debugging strategies to build reliable and maintainable AI agents with graph orchestration.
We'll cover the following...
- The four categories of graph failure
- Three debugging tools
- What we are building
- State design
- Build it step by step
- Debugging approach 1: state inspection
- Debugging approach 2: checkpoint history
- Debugging approach 3: LangSmith
- Fixing the bug
- A systematic debugging checklist
- Complete executable code
- Exercise
- Solution
- Terms introduced in this lesson
In a traditional Python function, a failure produces a traceback that points to the exact line. You fix the line and move on. In a graph workflow, the failure is rarely that direct.
A routing function sends execution to the wrong branch, but the error shows up two nodes later when a handler tries to read a field that was never written. A model returns an action label with a capitalised first letter instead of lowercase, but the failure appears as an unrecognised node name in a routing function. A node silently writes an empty string to a field, and the downstream quality check passes incorrectly because the check was only verifying that the field existed, not that it had meaningful content.
The challenge is not that LangGraph hides failures. It is that the failure’s cause and the failure’s symptom are often separated by one or more nodes. To debug effectively, we need to see the execution path the graph actually took, not just the final state.
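To make that cause/symptom gap concrete, here is a minimal plain-Python sketch that simulates graph execution with dicts rather than using LangGraph itself. All node and field names here are invented for illustration: a classifier writes a mis-cased label (the cause), routing silently falls back, and the failure only surfaces two nodes later when a downstream node reads a field that was never written.

```python
# Hypothetical pipeline simulating cause/symptom separation across "nodes".

def classify(state):
    # BUG (the cause): the model returned "Billing" instead of "billing".
    return {**state, "category": "Billing"}

def route(state):
    # The router only knows lowercase labels; anything else falls back.
    return state["category"] if state["category"] in {"billing", "refund"} else "fallback"

def handle_billing(state):
    return {**state, "answer": f"Billing help for: {state['question']}"}

def handle_fallback(state):
    # The fallback path never writes "answer".
    return state

def summarize(state):
    # The symptom: a KeyError here, two nodes after the real bug.
    return {**state, "summary": state["answer"][:50]}

nodes = {"billing": handle_billing, "refund": handle_billing, "fallback": handle_fallback}

state = classify({"question": "Why was I charged twice?"})
state = nodes[route(state)](state)
try:
    state = summarize(state)
except KeyError as exc:
    print(f"Failure surfaced in summarize, but the cause was in classify: missing {exc}")
```

A traceback from this run points at `summarize`, yet nothing in `summarize` is wrong; the fix belongs in `classify`. That distance between cause and symptom is exactly what the debugging tools below are for.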
The four categories of graph failure
Most problems in LangGraph workflows fall into one of four categories. Knowing which category a failure belongs to narrows the search considerably. The table below outlines the four categories of graph workflow failure, describing what can go wrong in each category and how those failures typically appear during execution.
| Category | What fails | How it presents |
| --- | --- | --- |
| Routing failure | A routing function returns an unexpected or wrong node name | Unknown node name error, or execution visibly taking the wrong branch, with the error often surfacing one or more nodes later |
| State failure | A node reads a field that was never written, or reads a stale value | `KeyError`, `None` where a value was expected, or a downstream check passing on empty or stale content |
| Model output failure | The model returns content in an unexpected format or with unexpected values | JSON parse error, failed validation, or routing to the fallback path more often than expected |
| Tool failure | A tool node receives bad input or the external call fails | Exception from the tool function, or empty or malformed tool output |
Identifying the category first tells us where to look. Routing failures live in the routing function and the classifier node. State failures live in the state schema and the node that was supposed to write the missing field. Model output failures live in the extraction node and the prompt. Tool failures live in the tool function and the node that calls it.
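Since routing failures so often come from cosmetic mismatches in model output, such as the capitalised label described earlier, a common defensive pattern is to normalise the label inside the routing function before comparing it to node names. A minimal sketch; `route_by_category` and the label set are illustrative, not part of the lesson's workflow:

```python
# Hypothetical routing function that normalises the model's label before
# matching it to node names, so "Billing" or " REFUND " still routes
# correctly instead of falling through as an unknown node name.
VALID_ROUTES = {"billing", "refund", "general"}

def route_by_category(state: dict) -> str:
    label = state.get("category", "").strip().lower()
    return label if label in VALID_ROUTES else "fallback"

print(route_by_category({"category": "Billing"}))   # "billing"
print(route_by_category({"category": "unknown!"}))  # "fallback"
print(route_by_category({}))                        # "fallback"
```

Normalising at the routing boundary keeps one category of failure (model output) from masquerading as another (routing), which makes the table above much easier to apply.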
Three debugging tools
We have three debugging tools available in LangGraph, in order from least setup to most:
- **State inspection** requires nothing beyond reading the dict returned by `app.invoke`. It tells us the final state of every field. It does not tell us which nodes ran in which order, or what each field's value was at intermediate steps.
- **Checkpoint history** requires the graph to be compiled with a checkpointer. Calling `app.get_state_history(config)` returns every saved snapshot in reverse chronological order, one per node execution. Each snapshot shows the complete state after that node ran. This is the most powerful built-in debugging tool.
- **LangSmith** is an external observability platform for LangChain and LangGraph workflows. Once configured with two environment variables, every invocation is automatically traced. LangSmith shows each node as a timed step, displays inputs and outputs for every step, surfaces errors with full context, and lets us compare runs side by side. It requires a LangSmith account, but the setup is two lines.
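For reference, the two-line LangSmith setup is a pair of environment variables. The names below follow the current LangSmith documentation; older releases used `LANGCHAIN_TRACING_V2` and `LANGCHAIN_API_KEY` instead, so check the docs for your installed version:

```shell
# Enable automatic tracing of every invocation to LangSmith.
export LANGSMITH_TRACING=true
export LANGSMITH_API_KEY="<your-api-key>"
```

No code changes are needed after this: any `app.invoke` call in the same environment is traced automatically.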
What we are building
We will ...