Persistence and Checkpoints
Explore how to implement persistence and checkpointing in LangGraph workflows to maintain conversation history and recover from interruptions. Learn to use MemorySaver and SqliteSaver to store state snapshots, manage multi-user sessions with thread IDs, and build robust AI agents capable of resuming tasks without losing context.
Every demo workflow we have built so far shares one characteristic: if the Python process stops, everything is gone. State lives in memory for the duration of one script run. When the run ends, whether it finishes cleanly, crashes, or is interrupted, the state disappears. For a demo, that is fine. For a production system, it is a serious problem.
Consider a workflow that takes four minutes to run: it fetches data from a slow API, calls a language model twice, performs quality checks, and formats a final report. If the process crashes after two minutes, we want to resume from the last completed node, not restart from scratch. Consider a multi-turn assistant that has built up twenty exchanges of conversation context. If the server restarts overnight, we want the next morning’s messages to have access to that history, not start cold.
LangGraph addresses both problems with checkpointing. A checkpointer saves a snapshot of the graph’s state after every node execution. Those snapshots persist in a storage backend: in memory during development, in a database in production. When execution resumes, LangGraph loads the latest snapshot and continues from there.
What checkpointing does
Checkpointing operates at the node boundary. After each node returns its partial state update and LangGraph merges it into the running state, the checkpointer writes the updated state to its backend. The diagram below shows where this happens in the execution cycle.
This means that even in a three-node workflow, three independent snapshots exist by the time execution finishes. A crash between nodes B and C would result in a resume that starts at node C with the state exactly as node B left it.
Thread IDs and isolated execution contexts
Every invocation of a checkpointed graph runs inside a named context called a thread. The thread ID is a string you provide at invocation time through a config dict. All snapshots for a given thread are stored and retrieved together.
Two threads with different IDs are completely isolated. They can use the same compiled graph, but their state snapshots never mix. This is how multi-user systems work: each user session gets its own thread ID, and conversations remain separate even though they all run through the same graph. The table below shows how thread behaviour differs from a plain invocation.
| Aspect | Plain invocation | Checkpointed thread |
| --- | --- | --- |
| State persistence | Discarded when the invocation ends | Saved to the checkpointer after every node |
| Second invocation | Starts from scratch every time | Loads the latest checkpoint and merges the new input |
| Multiple users | Not applicable | Each user gets an isolated thread via a unique ID |
| Inspection | Read the return value only | Use get_state to inspect the stored snapshots |
The two checkpointer backends
LangGraph ships with two checkpointers that cover the most common use ...