Final Testing and Extension Paths
Explore comprehensive final testing strategies and practical extension techniques for your LangGraph research assistant. Understand how to validate the system using scenario tests, add new knowledge domains without disrupting existing workflows, and implement memory features to improve conversational continuity. This lesson equips you to confidently verify and expand your AI agent's capabilities for reliable real-world performance.
The previous lesson replaced the final stubs and added the quality review gate. Every node now has a real implementation. Every routing function handles its two possible paths. Every state field has one owning node.
The goal for this lesson is not to add more features. It is to verify what we have, understand it completely, and practice extending it. A system you can test confidently and extend cleanly is worth shipping.
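As a reminder of what "two possible paths" means in practice, the two routing functions can be pictured as small pure functions over the state. This is a minimal sketch, not the code from the previous lesson: the function names route_clarity and route_review and the fields needs_clarification and quality_passed come from this lesson, but the returned labels ("clarify", "plan", "format", "fallback") are assumptions and may differ from the node names registered in your graph.

```python
from typing import Any, Mapping


def route_clarity(state: Mapping[str, Any]) -> str:
    """Two paths: ask for clarification, or continue to planning."""
    # Label strings are hypothetical; use whatever names your graph registers.
    return "clarify" if state.get("needs_clarification", False) else "plan"


def route_review(state: Mapping[str, Any]) -> str:
    """Two paths: format the answer, or fall back to a recovery message."""
    return "format" if state.get("quality_passed", False) else "fallback"
```

Because each function reads one boolean field and returns one of two labels, both paths can be exercised with hand-written state dicts before the full graph is invoked.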
Full scenario test suite
The table below presents the full scenario test suite for the research assistant, mapping each input to its expected execution path and the key field or condition used for validation.
Scenario | Input | Expected path | Expected field to check |
Multi-domain clear question | "What is the pricing for the Pro plan and what are the API rate limits?" | Clarity → plan → search → synthesise → review → format | |
Single-domain clear question | "How long do I have to request a refund for a digital product?" | Clarity → plan → search → synthesise → review → format | |
SDK and support question | "Which SDKs are available and what level of support do Pro users get?" | Clarity → plan → search → synthesise → review → format | Two entries in |
Topic not in knowledge base | "What are the terms for early contract termination?" | Clarity → plan → search → synthesise → review → fallback | |
Max steps cap | Any multi-part question with max_steps=1 | Clarity → plan → search (one only) → synthesise → review → format or fallback | |
Clarification: too short | "api" | Clarity → return clarification | |
Clarification: vague opener | "Tell me everything" | Clarity → return clarification | |
Edge case: single-word pronoun | "How does it work?" | Clarity → return clarification | Heuristic 3 fires |
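The three clarification scenarios suggest rule-based checks of roughly the following shape. This is a minimal sketch under stated assumptions, not the clarity node's actual implementation: the function name clarity_heuristic, the word-count threshold, and the opener, pronoun, and filler word lists are all hypothetical, chosen only so that the three sample inputs in the table trigger and the clear questions do not.

```python
import re
from typing import Optional

# Hypothetical threshold and word lists; the real clarity node may differ.
MIN_WORDS = 3
VAGUE_OPENERS = ("tell me everything", "explain everything")
PRONOUNS = {"it", "this", "that", "they", "them"}
FILLER = {"how", "what", "why", "does", "do", "is", "are", "work", "works", "the", "a"}


def clarity_heuristic(question: str) -> Optional[str]:
    """Return the name of the first heuristic that fires, or None if none do."""
    text = question.strip().lower()
    words = re.findall(r"[a-z0-9']+", text)

    # Heuristic 1: too short to carry a searchable topic
    if len(words) < MIN_WORDS:
        return "too_short"

    # Heuristic 2: vague opener with almost nothing after it
    if text.startswith(VAGUE_OPENERS) and len(words) <= 4:
        return "vague_opener"

    # Heuristic 3: a pronoun is present but no topic word remains
    topic_words = [w for w in words if w not in PRONOUNS and w not in FILLER]
    if any(w in PRONOUNS for w in words) and not topic_words:
        return "pronoun_without_topic"

    return None
```

Ordering the checks from cheapest to most specific keeps the reported reason stable: a one-word input is always reported as too short, even if it also happens to be a pronoun.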
Running this suite in a single main function and printing the diagnostic fields after each invocation is enough to verify the system end to end. We do not need a test framework; the returned state is the test oracle.
def run_test_suite(app) -> None:
    print("=== Full scenario test suite ===\n")
    scenarios = [
        {
            "label": "Multi-domain: pricing + API",
            "question": "What is the pricing for the Pro plan and what are the API rate limits?",
            "max_steps": 3,
            "expect_clarification": False,
            "expect_quality_pass": True,
        },
        {
            "label": "Single domain: refund policy",
            "question": "How long do I have to request a refund for a digital product?",
            "max_steps": 3,
            "expect_clarification": False,
            "expect_quality_pass": True,
        },
        {
            "label": "Multi-domain: SDK + support tier",
            "question": "Which SDKs are available and what level of support do Pro users get?",
            "max_steps": 3,
            "expect_clarification": False,
            "expect_quality_pass": True,
        },
        {
            "label": "Topic not in knowledge base",
            "question": "What are the terms for early contract termination?",
            "max_steps": 3,
            "expect_clarification": False,
            "expect_quality_pass": False,
        },
        {
            "label": "Max steps cap (max_steps=1)",
            "question": "What is the pricing, API rate limit, and support SLA for Enterprise?",
            "max_steps": 1,
            "expect_clarification": False,
            "expect_quality_pass": None,  # depends on what one search returns
        },
        {
            "label": "Clarification: too short",
            "question": "api",
            "max_steps": 3,
            "expect_clarification": True,
            "expect_quality_pass": None,
        },
        {
            "label": "Clarification: vague opener",
            "question": "Tell me everything",
            "max_steps": 3,
            "expect_clarification": True,
            "expect_quality_pass": None,
        },
        {
            "label": "Clarification: pronoun without topic",
            "question": "How does it work?",
            "max_steps": 3,
            "expect_clarification": True,
            "expect_quality_pass": None,
        },
    ]

    passed = 0
    failed = 0
    for s in scenarios:
        result = app.invoke(make_state(s["question"], s["max_steps"]))
        nc = result["needs_clarification"]
        qp = result.get("quality_passed", False)
        ok_nc = (nc == s["expect_clarification"])
        ok_qp = (s["expect_quality_pass"] is None) or (qp == s["expect_quality_pass"])
        status = "PASS" if (ok_nc and ok_qp) else "FAIL"
        if status == "PASS":
            passed += 1
        else:
            failed += 1

        print(f"  [{status}] {s['label']}")
        print(f"         needs_clarification : {nc} (expected {s['expect_clarification']})")
        if not nc:
            print(f"         sources_used        : {result['sources_used']}")
            print(f"         skipped_count       : {result['skipped_count']}")
            print(f"         confidence_level    : {result['confidence_level']}")
            print(f"         quality_passed      : {qp} (expected {s['expect_quality_pass']})")
            print(f"         response preview    : {result['formatted_response'][:120]}")
        else:
            print(f"         clarification asked : {result['clarification_question'][:100]}")
        print()

    print(f"Results: {passed} passed, {failed} failed out of {len(scenarios)} scenarios.\n")
The scenarios list: Eight scenario dicts, each carrying an expected value for needs_clarification (expect_clarification) and for quality_passed (expect_quality_pass). None means the outcome is not asserted, either because it depends on live model output or because the clarification path returns before the quality review runs.
The for loop: For each scenario, the result is checked against both expected values and printed with a PASS/FAIL label. The diagnostic fields are printed whether the scenario passes or fails, so the output is useful for debugging as well as verification.
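The pass/fail rule inside the loop can also be pulled out into a small pure function, which makes the "None means not asserted" convention easy to check without invoking the graph. A minimal sketch: check_scenario is a hypothetical helper name, and the sample result dicts are hand-written stand-ins for real graph output.

```python
from typing import Any, Mapping, Optional


def check_scenario(
    result: Mapping[str, Any],
    expect_clarification: bool,
    expect_quality_pass: Optional[bool],
) -> bool:
    """Return True when the returned state matches the scenario's expectations."""
    ok_nc = result["needs_clarification"] == expect_clarification
    # A missing quality_passed is treated as False, as in the suite above
    qp = result.get("quality_passed", False)
    # None means "do not assert": any quality outcome is accepted
    ok_qp = expect_quality_pass is None or qp == expect_quality_pass
    return ok_nc and ok_qp


# Hand-written stand-ins for states returned by app.invoke(...)
clarified = {"needs_clarification": True}
answered = {"needs_clarification": False, "quality_passed": True}

print(check_scenario(clarified, True, None))   # clarification path, quality not asserted
print(check_scenario(answered, False, True))   # both expectations met
print(check_scenario(answered, False, False))  # quality expectation violated
```

Keeping the comparison separate from the printing means the same rule can later be reused from a test framework, if you choose to adopt one, without changing the scenarios.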
Node and state responsibility recap
The table below is the complete responsibility inventory for the finished system: every node, the single job it does, and every state field it writes.
Node | Type | Single responsibility | State fields written |
| Rule-based + model | Decide whether the question is clear enough to search | |
| Rule-based | Package the clarification question as the formatted response | |
| Model (LLM) | Break the question into domain-tagged sub-queries | |
| Tool node | Route each sub-query to the right domain tool and collect results | |
| Model (LLM) | Combine all results into a direct answer with confidence assessment | |
| Rule-based | Check whether the synthesis meets minimum quality criteria | |
| Rule-based | Assemble source header, confidence badge, and synthesis body | |
| Rule-based | Produce a safe recovery message when quality review fails | |
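For orientation, the state fields that this lesson's test suite reads can be collected into a typed schema, together with a possible shape for the make_state helper the suite calls. This is a sketch under assumptions: the class name ResearchState, the input key names question and max_steps, the field types, and the default values are all guesses, and the real state almost certainly carries additional fields (for example, the planned sub-queries and raw search results) that are not listed here.

```python
from typing import List, TypedDict


class ResearchState(TypedDict, total=False):
    # Inputs (key names assumed from make_state's two arguments)
    question: str
    max_steps: int
    # Fields read by the scenario test suite (types are assumptions)
    needs_clarification: bool
    clarification_question: str
    sources_used: List[str]
    skipped_count: int
    confidence_level: str
    quality_passed: bool
    formatted_response: str


def make_state(question: str, max_steps: int = 3) -> ResearchState:
    """Build an initial state with neutral defaults for every output field."""
    return ResearchState(
        question=question,
        max_steps=max_steps,
        needs_clarification=False,
        clarification_question="",
        sources_used=[],  # fresh list per call, so states never share it
        skipped_count=0,
        confidence_level="",
        quality_passed=False,
        formatted_response="",
    )
```

Giving every output field a neutral default means a node that never runs, such as the quality review on the clarification path, leaves a predictable value behind for the diagnostics to print.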
The two routing functions (route_clarity, route_review) ...