Persistence and Checkpoints
Explore how to implement persistence and checkpointing in LangGraph workflows to maintain conversation history and recover from interruptions. Learn to use MemorySaver and SqliteSaver to store state snapshots, manage multi-user sessions with thread IDs, and build robust AI agents capable of resuming tasks without losing context.
Every demo workflow we have built so far shares one characteristic: if the Python process stops, everything is gone. State lives in memory for the duration of one script run. When the run ends, whether it finishes cleanly, crashes, or is interrupted, the state disappears. For a demo, that is fine. For a production system, it is a serious problem.
Consider a workflow that takes four minutes to run: it fetches data from a slow API, calls a language model twice, performs quality checks, and formats a final report. If the process crashes after two minutes, we want to resume from the last completed node, not restart from scratch. Consider a multi-turn assistant that has built up twenty exchanges of conversation context. If the server restarts overnight, we want the next morning’s messages to have access to that history, not start cold.
LangGraph addresses both problems with checkpointing. A checkpointer saves a snapshot of the graph’s state after every node execution. Those snapshots persist in a storage backend: in memory during development, in a database in production. When execution resumes, LangGraph loads the latest snapshot and continues from there.
What checkpointing does
Checkpointing operates at the node boundary. After each node returns its partial state update and LangGraph merges it into the running state, the checkpointer writes the updated state to its backend. The diagram below shows where this happens in the execution cycle.
This means that even in a three-node workflow, three independent snapshots exist by the time execution finishes. A crash between nodes B and C would result in a resume that starts at node C with the state exactly as node B left it.
Thread IDs and isolated execution contexts
Every invocation of a checkpointed graph runs inside a named context called a thread. The thread ID is a string you provide at invocation time through a config dict. All snapshots for a given thread are stored and retrieved together.
Two threads with different IDs are completely isolated. They can use the same compiled graph, but their state snapshots never mix. This is how multi-user systems work: each user session gets its own thread ID, and conversations remain separate even though they all run through the same graph. The table below shows how thread behaviour differs from a plain invocation.
| Aspect | Plain invocation | Checkpointed thread |
| --- | --- | --- |
| State persistence | Discarded when the invocation ends | Saved to the checkpointer after every node |
| Second invocation | Starts from scratch every time | Loads the latest checkpoint and merges the new input |
| Multiple users | Not applicable | Each user gets an isolated thread via a unique ID |
| Inspection | Read the return value only | Use get_state to inspect the stored snapshots |
The two checkpointer backends
LangGraph ships with two checkpointers that cover the most common use ...