Why LangGraph Exists
Understand why LangGraph was created: to overcome the limitations of sequential AI workflows by adopting a graph structure with explicit nodes, edges, and shared state. Learn how this approach handles complex assistant tasks with routing, retries, and state tracking, keeping workflows clear and maintainable as requirements grow. This lesson guides you through designing a customer support assistant with LangGraph and through the mindset shift from chains to graphs that it demands.
Imagine you’re building a customer support assistant for a software company. You spend an afternoon wiring it up: take the user’s message, pass it to a language model with a good system prompt, return the answer. You test it. It handles common questions well. You feel good about it.
Then the requirements start growing:
“If the user’s question is vague, ask a clarifying question before answering.” You add an if statement around the model call. Fine.
“If the question is about billing or legal topics, don’t answer directly; flag it for human review.” You add another conditional. Getting messier.
“Sometimes the model gives a weak answer. Can we detect that and retry?” You add a retry counter, a quality check, and a second model call. Now it’s getting hard to read.
“We need to log which path each request took, and resume from the last good step if anything fails.”
Now your clean afternoon project is a tangle of nested conditionals, manual state dictionaries, fragile string comparisons, and retry logic duct-taped on top of everything else. It’s not really a “support assistant” anymore. It’s a workflow engine you accidentally built by hand.
This is the exact problem LangGraph solves. It gives us a proper way to design these workflows as graphs, with explicit nodes, edges, and shared state, so the complexity stays manageable as requirements grow.
What LangChain does well
LangChain is a mature library for connecting language models to data and tools. It gives us model wrappers, prompt templates, output parsers, retrievers, and pre-built chains for common tasks. With a handful of lines, we can build a retrieval-augmented generation pipeline that fetches relevant documents and passes them to a model for a grounded answer.
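As a rough illustration, a retrieval chain in that style might look like the sketch below. It assumes an existing retriever object (for example, a vector store retriever) and the langchain-groq integration for the model; exact import paths vary between LangChain versions, so treat this as a sketch rather than copy-paste code.

```python
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_groq import ChatGroq


def format_docs(docs):
    # Join retrieved documents into one context string for the prompt.
    return "\n\n".join(doc.page_content for doc in docs)


prompt = ChatPromptTemplate.from_template(
    "Answer the question using only this context:\n{context}\n\nQuestion: {question}"
)
model = ChatGroq(model="llama-3.1-8b-instant")

# `retriever` is assumed to exist already; it maps a question to relevant documents.
rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | model
    | StrOutputParser()
)

answer = rag_chain.invoke("How do I export my project data?")
```

Every request flows through the same four steps in the same order, which is exactly why this pattern is so compact.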
This is genuinely useful. For a large class of applications (a document Q&A tool, a summarization service, a chatbot with a system prompt), a LangChain chain is everything we need. The input arrives, the steps run in order, and the output returns. No branching. No loops. No state to carry between calls.
The pattern works well when every request follows the same path.
Where LangChain starts to strain
A linear chain assumes that every input goes through the same sequence of steps. That assumption holds until the workflow needs to make a decision.
In real assistant workflows, different requests need different handling. A simple question should get a direct answer. A vague request needs clarification before anything else happens. A sensitive billing question should go to a human reviewer. None of those fit cleanly into a straight line.
When we try to handle this with LangChain alone, we end up encoding the routing logic inside the steps themselves, either as long prompt instructions or as conditional Python woven through every function. The result is code where the routing, the logic, and the state management are all tangled together.
The table below shows how each new requirement pushes a LangChain pipeline past what it handles cleanly.
Requirement | What We End Up Doing in a Chain | The Problem |
Route vague questions to clarification | Add if before the model call | Routing logic bleeds into response logic |
Block sensitive topics for human review | Add more conditionals | Control flow is buried in function bodies |
Retry low-quality answers | Wrap model call in a while loop | Retry logic is mixed with generation logic |
Track which path a request took | Build a custom log dict | Manual state management outside any structure |
Resume a failed run from last good step | There is no easy way | Durability requires a complete rewrite |
Every one of these is solvable in plain Python. But as we add more requirements, the codebase becomes the kind of thing only we can debug, and only on a good day. LangChain is not broken; it is simply not designed for this class of problem.
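To make that concrete, here is a hedged sketch of where the hand-rolled approach tends to end up; every helper name and check in it is illustrative rather than taken from this lesson’s code.

```python
# Hypothetical hand-rolled version: routing, retries, and state tracking
# all live inside one growing function. The helpers below are toy stand-ins.

def is_vague(text): return len(text.split()) < 4
def mentions_billing_or_legal(text): return "refund" in text or "legal" in text
def looks_weak(answer): return len(answer) < 40
def call_model(text): return f"(model answer to: {text})"


def handle_request(user_input: str) -> dict:
    log = {"path": []}                              # manual state tracking
    if is_vague(user_input):                        # routing decision #1
        log["path"].append("clarify")
        return {"response": "Could you share one more detail?", "log": log}
    if mentions_billing_or_legal(user_input):       # routing decision #2
        log["path"].append("approval")
        return {"response": "A team member will review this.", "log": log}
    attempts = 0
    answer = call_model(user_input)
    while looks_weak(answer) and attempts < 2:      # retry logic mixed into generation
        attempts += 1
        answer = call_model(user_input)
    log["path"].append("answer")
    return {"response": answer, "log": log}         # still no way to resume a failed run
```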
Thinking in graphs
LangGraph replaces this pattern with an explicit graph structure. Instead of one long function with nested logic, we define three things:
Nodes: Individual units of work, each doing one clearly named job.
Edges: Connections between nodes that define what runs next.
State: A shared data object that all nodes read from and write to.
With this structure, routing logic lives in edges, not buried in functions. State is managed by the framework, not tracked manually. Each node is focused and testable on its own.
Here is the same customer support workflow drawn as a graph. The route labels on the arrows show the conditions that control which path execution takes.
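Sketched in text, with the route values the classifier writes shown on the branches:

```
START --> classify_request
              |-- clarify  --> ask_followup   --> END
              |-- approval --> approval_gate  --> END
              |-- answer   --> draft_answer   --> END
```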
The branching is explicit and visible. When something goes wrong, we know exactly which node failed, what state it received, and which path execution was on.
The mindset shift
Working with LangGraph asks us to think differently. In a linear pipeline, we think: “Call these steps in order.” In a graph, we think: “What decisions does this workflow need to make, and where?”
That shift forces us to answer questions upfront that a linear pipeline lets us ignore:
What are the distinct branches this workflow can take?
What data needs to travel between nodes?
Where do we want control to stop and wait?
What should happen if a step fails?
These questions exist in any serious assistant workflow. LangGraph just makes us answer them up front, in code, where they are explicit and auditable, rather than discovered through broken production behavior.
Key idea: In a pipeline, control flow is implicit; it is just the order in which you wrote the code. In a graph, control flow is a first-class design decision. You define it deliberately, and LangGraph enforces it.
Setup
We need LangGraph, an LLM provider, and a virtual environment. This setup is only required if you are working locally; on the course platform, it is already configured.
```bash
python3 -m venv .venv
source .venv/bin/activate
pip install langgraph groq
```
Line 1: Creates an isolated Python environment for this project.
Line 2: Activates the environment so all installs stay local.
Line 3: Installs LangGraph and the Groq Python library for calling the LLM. Groq gives us free access to llama-3.1-8b-instant, the model we use in the answer node.
Set your LLM provider API key as an environment variable:
```bash
export GROQ_API_KEY="your-api-key-here"
```
Line 1: Makes the key available to the script at runtime. Since Groq currently offers free access to Llama models, we will be using their infrastructure for this course.
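On a local setup, the script can read that variable rather than hard-coding a key; a minimal sketch (the lesson code below uses a course platform placeholder instead):

```python
import os

# Read the Groq API key exported in the previous step; raises KeyError if it is missing.
api_key = os.environ["GROQ_API_KEY"]
```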
Build it step by step
We will construct the workflow in four stages: define the state, write the nodes, assemble the graph, and run it.
Defining the state
State is the first thing we design. It is the data contract between all nodes in our workflow. Every field that any node might read or write must be declared here.
For our support workflow, we need three pieces of information to move between nodes: what the user said, which route the classifier chose, and what the final response text should be.
```python
from typing import TypedDict

class SupportState(TypedDict):
    user_input: str
    route: str
    response_text: str
```
Line 1: Imports TypedDict from the standard library to define a structured schema.
Line 3–6: Declares three fields. user_input comes from the caller. route is written by the classifier and read by the routing function. response_text is written by whichever branch node executes.
We only put fields in state that need to cross a node boundary. Anything a node calculates just for its own internal use stays as a local variable. This keeps the state focused and easy to reason about.
Writing the classifier node
The classifier reads the user’s message and decides which branch should handle it. It does not call any model. This decision is made with simple keyword matching for now, which is fast, free, and easy to test.
```python
def classify_request(state: SupportState) -> dict:
    text = state["user_input"].lower()
    if "refund" in text or "legal" in text:
        return {"route": "approval"}
    if "unclear" in text or "maybe" in text:
        return {"route": "clarify"}
    return {"route": "answer"}
```
Line 1: Node signature: takes state, returns a partial state update as a plain dict.
Line 2: Lowercases input so matching is case-insensitive.
Line 3–4: Routes sensitive topics to the approval gate.
Line 5–6: Routes vague input to the clarification path.
Line 7: Everything else gets routed to the main answer branch.
Notice that this function only writes route. It does not touch response_text because that is not its job. Each node updates only the fields it owns.
Writing the branch nodes
Each route has a corresponding node that handles requests on that path. The clarification node handles vague input by asking the user for more detail before doing anything else:
```python
def ask_followup(state: SupportState) -> dict:
    return {
        "response_text": (
            "I want to make sure I help you accurately. "
            "Could you give me one more detail about what you need?"
        )
    }
```
Line 1–7: Returns a fixed clarification prompt. No model call needed. A clear, deterministic message is better here than a generated one.
The approval gate handles sensitive requests by routing them to a review queue rather than answering directly:
```python
def approval_gate(state: SupportState) -> dict:
    return {
        "response_text": (
            "Your request involves a topic that requires a team member to review. "
            "We will follow up with you shortly."
        )
    }
```
Line 1–7: Returns a safe holding message without attempting to answer. This protects the workflow from generating incorrect policy or legal guidance automatically.
The answer node is the only one that calls an LLM, and only for requests that have already been classified as general enough to handle:
```python
from groq import Groq

def draft_answer(state: SupportState) -> dict:
    api_key = "{{GROQ_API_KEY}}"
    client = Groq(api_key=api_key)
    prompt = (
        "You are a helpful customer support assistant.\n"
        "Answer the following question clearly and in 3-5 lines.\n\n"
        f"Customer question: {state['user_input']}"
    )
    response = client.chat.completions.create(
        model="llama-3.1-8b-instant",
        messages=[{"role": "user", "content": prompt}],
    )
    return {"response_text": response.choices[0].message.content}
```
Line 1: Imports the Groq SDK.
Lines 3–5: Defines the node, reads the course API key placeholder, and creates the Groq client.
Lines 6–10: Builds a prompt that grounds the model in the support assistant role and the user’s actual question.
Lines 11–15: Calls chat.completions.create with llama-3.1-8b-instant and returns the reply as response_text in the state update.
Building and compiling the graph
With the state and nodes defined, we assemble the graph. This is where LangGraph’s StateGraph, START, END, and add_conditional_edges come together.
```python
from typing import Literal
from langgraph.graph import END, START, StateGraph

def choose_next_step(state: SupportState) -> Literal["ask_followup", "draft_answer", "approval_gate"]:
    route_map = {
        "clarify": "ask_followup",
        "approval": "approval_gate",
        "answer": "draft_answer",
    }
    return route_map.get(state["route"], "draft_answer")

builder = StateGraph(SupportState)
builder.add_node("classify_request", classify_request)
builder.add_node("ask_followup", ask_followup)
builder.add_node("draft_answer", draft_answer)
builder.add_node("approval_gate", approval_gate)

builder.add_edge(START, "classify_request")
builder.add_conditional_edges("classify_request", choose_next_step)
builder.add_edge("ask_followup", END)
builder.add_edge("draft_answer", END)
builder.add_edge("approval_gate", END)

app = builder.compile()
```
Line 1–2: Imports the graph primitives we need.
Line 4–10: Defines the routing function. It receives state, reads the route field that the classifier wrote, and returns the name of the next node to execute. The Literal return type declares the exact set of valid node names, which LangGraph uses to validate the graph structure.
Line 12–16: Creates the graph with our state schema and registers all four nodes. The first argument is the node name used in edges; the second is the Python function.
Line 18: Connects the graph entry point. When we call app.invoke(...), execution starts at classify_request.
Line 19: Adds a conditional edge from classify_request. After that node runs, LangGraph calls choose_next_step and routes to whichever node name it returns.
Line 20–22: Connects each terminal branch to END. Every execution path must reach END or the graph will hang.
Line 24: Compiles the builder into a runnable application object. Compilation validates the graph: it checks that every node is reachable, every edge is valid, and every conditional edge returns a known node name.
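If you want to double-check the wiring after compiling, recent LangGraph versions can render the compiled graph; a small hedged example:

```python
# Print a Mermaid description of the compiled graph to verify nodes and edges.
print(app.get_graph().draw_mermaid())
```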
Running the workflow
We invoke the compiled graph with an initial state dictionary. Every field in our schema must have an initial value.
```python
result = app.invoke(
    {
        "user_input": "What is the refund process for annual plans?",
        "route": "",
        "response_text": "",
    }
)

print("Route taken:", result["route"])
print("Response:", result["response_text"])
```
Line 1–7: Calls the graph with initial values. route and response_text start empty because those fields will be written by nodes during execution.
Line 9–10: Reads the final state. After execution, route holds the classifier’s decision and response_text holds the branch output.
Design decisions worth understanding
These choices may seem small, but they play an important role in making the workflow reliable, transparent, and easy to extend.
Why separate the classifier from the routing function? The classifier (classify_request) is a node: it writes to state. The routing function (choose_next_step) is a pure function that reads state and returns a node name. Keeping them separate means the classification result is stored in state and is available for inspection, logging, or future use by other nodes. If we had embedded routing directly inside the classifier, we would lose that record.
Why use keyword matching instead of an LLM for classification? For this first workflow, keyword matching is intentional. It is fast, free, deterministic, and easy to test. LLM-based classification makes sense when the routing logic is complex or nuanced enough to require it. Using a model for simple keyword detection adds latency and cost without benefit. We will add model-based classification in a later lesson when the routing genuinely needs it.
What happens if route contains an unexpected value? The route_map.get(state["route"], "draft_answer") line handles this with a default fallback to draft_answer. In production, we would also want to log unexpected route values so we can improve the classifier over time.
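As a hedged sketch of that logging, using the standard logging module and the SupportState schema defined earlier:

```python
import logging
from typing import Literal

logger = logging.getLogger(__name__)

def choose_next_step(
    state: SupportState,
) -> Literal["ask_followup", "draft_answer", "approval_gate"]:
    route_map = {
        "clarify": "ask_followup",
        "approval": "approval_gate",
        "answer": "draft_answer",
    }
    if state["route"] not in route_map:
        # Record unexpected classifier output so the keyword rules can be improved.
        logger.warning("Unexpected route value: %r", state["route"])
    return route_map.get(state["route"], "draft_answer")
```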
Common mistakes
Two patterns trip up most people building their first LangGraph workflow. Both are easy to avoid once you know what to look for.
Not initialising all state fields. When invoking the graph, every field in the schema needs a value, even if that value is an empty string or zero. Passing a dictionary that is missing fields causes a runtime error.
```python
# This will fail: response_text is missing
app.invoke({"user_input": "some question", "route": ""})
```
The fix is to include every declared field with a sensible initial value:
app.invoke({"user_input": "some question", "route": "", "response_text": ""})
Returning the full state from a node instead of a partial update. Nodes should only return the fields they are changing. Returning the full state object creates problems when multiple nodes update the same field: the last return wins and silently overwrites earlier updates.
```python
# Don't do this
def classify_request(state: SupportState) -> dict:
    return {
        "user_input": state["user_input"],  # unchanged, no need to return this
        "route": "answer",
        "response_text": state["response_text"],  # unchanged, no need to return this
    }
```
The correct version returns only the field this node actually owns:
```python
def classify_request(state: SupportState) -> dict:
    return {"route": "answer"}
```
Complete executable code
Here is the complete workflow in one file. We can run the code and check that each input produces the correct route and response type.
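The listing below assembles the snippets from this lesson into one script; the three example inputs in main are illustrative stand-ins for the general, vague, and refund cases.

```python
# Customer support assistant workflow built with LangGraph.

from typing import Literal, TypedDict

from groq import Groq
from langgraph.graph import END, START, StateGraph


class SupportState(TypedDict):
    user_input: str
    route: str
    response_text: str


def classify_request(state: SupportState) -> dict:
    text = state["user_input"].lower()
    if "refund" in text or "legal" in text:
        return {"route": "approval"}
    if "unclear" in text or "maybe" in text:
        return {"route": "clarify"}
    return {"route": "answer"}


def choose_next_step(
    state: SupportState,
) -> Literal["ask_followup", "draft_answer", "approval_gate"]:
    route_map = {
        "clarify": "ask_followup",
        "approval": "approval_gate",
        "answer": "draft_answer",
    }
    return route_map.get(state["route"], "draft_answer")


def ask_followup(state: SupportState) -> dict:
    return {
        "response_text": (
            "I want to make sure I help you accurately. "
            "Could you give me one more detail about what you need?"
        )
    }


def draft_answer(state: SupportState) -> dict:
    api_key = "{{GROQ_API_KEY}}"  # course placeholder; use your own key locally
    client = Groq(api_key=api_key)
    prompt = (
        "You are a helpful customer support assistant.\n"
        "Answer the following question clearly and in 3-5 lines.\n\n"
        f"Customer question: {state['user_input']}"
    )
    response = client.chat.completions.create(
        model="llama-3.1-8b-instant",
        messages=[{"role": "user", "content": prompt}],
    )
    return {"response_text": response.choices[0].message.content}


def approval_gate(state: SupportState) -> dict:
    return {
        "response_text": (
            "Your request involves a topic that requires a team member to review. "
            "We will follow up with you shortly."
        )
    }


def build_graph():
    builder = StateGraph(SupportState)
    builder.add_node("classify_request", classify_request)
    builder.add_node("ask_followup", ask_followup)
    builder.add_node("draft_answer", draft_answer)
    builder.add_node("approval_gate", approval_gate)
    builder.add_edge(START, "classify_request")
    builder.add_conditional_edges("classify_request", choose_next_step)
    builder.add_edge("ask_followup", END)
    builder.add_edge("draft_answer", END)
    builder.add_edge("approval_gate", END)
    return builder.compile()


def main():
    app = build_graph()
    examples = [
        "How do I export my project data?",    # general -> answer
        "Maybe something is wrong, not sure",  # vague -> clarify
        "I want a refund for last month",      # sensitive -> approval
    ]
    for user_input in examples:
        result = app.invoke(
            {"user_input": user_input, "route": "", "response_text": ""}
        )
        print("Route taken:", result["route"])
        print("Response:", result["response_text"])
        print()


if __name__ == "__main__":
    main()
```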
Lines 3–6: Imports. Literal constrains the routing function’s return type. TypedDict defines the state schema. Groq is the LLM client. END, START, and StateGraph are LangGraph’s core graph primitives.
Lines 9–12: State schema. Three fields: the user’s message, the routing decision, and the final response.
Lines 15–21: Classifier node. Keyword matching sets the route field. Returns a partial update containing only route.
Lines 24–32: Routing function. LangGraph calls this to resolve conditional edges. Returns the exact name of the next node.
Lines 35–41: Clarification node. Returns a fixed prompt without calling a model.
Lines 44–56: Answer node. The only LLM call in the workflow: builds the prompt, then calls Groq chat.completions.create with llama-3.1-8b-instant and maps the reply into response_text.
Lines 59–65: Approval gate node. Returns a safe holding message.
Lines 68–79: Graph assembly. Registers nodes, wires edges, adds conditional routing, and compiles.
Lines 82–95: main runs three examples that cover each route.
Lines 98–99: Standard Python entry point that executes main() when the script is run directly.
The table below shows what to expect for each of the three input types included in the script.
Input | Expected Route | What You Should See |
General question | answer | An LLM-generated 3–5 line response |
Vague or uncertain message | clarify | The clarification prompt |
Legal or refund topic | approval | The review holding message |
If the general-question route returns an empty response, check that the API key has been set correctly.
Exercise
The current workflow has three routes. We want to add a fourth: a billing path for customers asking about invoices or charges.
Add a billing_helper node that returns a message asking the user to share their invoice ID. Wire it into the classifier and the routing function so that inputs containing invoice or charge are handled by the new node instead of the general answer path.
Solution
First, extend the classifier to detect billing keywords:
```python
def classify_request(state: SupportState) -> dict:
    text = state["user_input"].lower()
    if "invoice" in text or "charge" in text:
        return {"route": "billing"}
    if "refund" in text or "legal" in text:
        return {"route": "approval"}
    if "unclear" in text or "maybe" in text:
        return {"route": "clarify"}
    return {"route": "answer"}
```
Line 3–4: Checks for billing keywords before any other condition so they take priority.
Next, add the billing node and extend the routing function:
```python
def billing_helper(state: SupportState) -> dict:
    return {
        "response_text": (
            "To look into this, please share your invoice ID "
            "and the approximate date of the charge."
        )
    }

def choose_next_step(
    state: SupportState,
) -> Literal["ask_followup", "draft_answer", "approval_gate", "billing_helper"]:
    route_map = {
        "clarify": "ask_followup",
        "approval": "approval_gate",
        "billing": "billing_helper",
        "answer": "draft_answer",
    }
    return route_map.get(state["route"], "draft_answer")
```
Line 1–7: The billing node returns a specific data-gathering prompt.
Line 9–18: billing_helper is added to the route map and the Literal return type.
Finally, register the new node in build_graph():
builder.add_node("billing_helper", billing_helper)builder.add_edge("billing_helper", END)
Line 1: Registers the new node by name.
Line 2: Connects it to END so execution terminates after it runs.
Core LangGraph terms
These are the key concepts from this lesson that we will keep using throughout the course.
Term | Meaning |
StateGraph | The main LangGraph class for building a workflow; takes a state schema as its argument |
Node | A plain Python function that receives state and returns a partial state update |
Edge | A connection between two nodes that defines what runs next |
Conditional edge | An edge resolved by a routing function at runtime, based on current state |
Compile | The step that validates and finalises the graph into a runnable object |
Invoke | The method that runs the compiled graph with an initial state and returns the final state |
We now have a working branching workflow and a clear mental model for why the graph structure is worth the extra setup compared to a flat LangChain pipeline. LangChain gave us the individual pieces: model calls, prompts, and retrievers. LangGraph gives us the structure to connect them with explicit control flow. The next lesson goes deeper into how nodes and edges actually work together, and we will look at what state is doing underneath every node call.