Final Testing and Extension Paths
Explore final testing methods and extension strategies for your AI research assistant built with LangGraph. Learn how to verify the system through comprehensive scenario tests, add new knowledge domains like compliance, and implement memory to improve conversation continuity. Understand how to maintain a clean, reliable architecture ready for production use.
The previous lesson replaced the final stubs and added the quality review gate. Every node now has a real implementation. Every routing function handles its two possible paths. Every state field has one owning node.
The goal for this lesson is not to add more features. It is to verify what we have, understand it completely, and practice extending it. A system we can test confidently and extend cleanly is a system worth shipping.
Full scenario test suite
The table below presents the full scenario test suite for the research assistant, mapping each input to its expected execution path and the key field or condition used for validation.
| Scenario | Input | Expected path | Expected field to check |
| --- | --- | --- | --- |
| Multi-domain clear question | "What is the pricing for the Pro plan and what are the API rate limits?" | Clarity → plan → search → synthesise → review → format | `quality_passed` is `True` |
| Single-domain clear question | "How long do I have to request a refund for a digital product?" | Clarity → plan → search → synthesise → review → format | `quality_passed` is `True` |
| SDK and support question | "Which SDKs are available and what level of support do Pro users get?" | Clarity → plan → search → synthesise → review → format | Two entries in `sources_used` |
| Topic not in knowledge base | "What are the terms for early contract termination?" | Clarity → plan → search → synthesise → review → fallback | `quality_passed` is `False` |
| Max steps cap | Any multi-part question with `max_steps=1` | Clarity → plan → search (one only) → synthesise → review → format or fallback | `skipped_count` is greater than zero |
| Clarification: too short | "api" | Clarity → return clarification | `needs_clarification` is `True` |
| Clarification: vague opener | "Tell me everything" | Clarity → return clarification | `needs_clarification` is `True` |
| Edge case: single-word pronoun | "How does it work?" | Clarity → return clarification | Heuristic 3 fires |
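The three clarification rows exercise the clarity gate built in an earlier lesson. As a reminder of roughly what fires in each case, here is a minimal sketch; the word lists and thresholds are illustrative assumptions, not the lesson's exact code:

```python
# Illustrative sketch of the three clarity heuristics. The word lists and
# thresholds here are assumptions, not the exact code from the earlier lesson.
VAGUE_OPENERS = ("tell me everything", "explain everything")
PRONOUNS = {"it", "this", "that", "they", "them"}
FILLER = {"how", "does", "do", "what", "is", "are", "work", "works", "the", "a"}

def question_needs_clarification(question: str) -> bool:
    words = question.lower().rstrip("?.!").split()
    # Heuristic 1: too short to name a topic ("api").
    if len(words) < 2:
        return True
    # Heuristic 2: vague opener with no concrete subject ("Tell me everything").
    if " ".join(words).startswith(VAGUE_OPENERS):
        return True
    # Heuristic 3: a pronoun stands in for a topic that was never named
    # ("How does it work?").
    has_pronoun = bool(set(words) & PRONOUNS)
    content_words = [w for w in words if w not in PRONOUNS and w not in FILLER]
    return has_pronoun and not content_words
```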
Running this suite in a single main function and printing the diagnostic fields after each invocation is enough to verify the system end-to-end. We do not need a test framework; the returned state is the test oracle.
```python
def run_test_suite(app) -> None:
    print("=== Full scenario test suite ===\n")
    scenarios = [
        {
            "label": "Multi-domain: pricing + API",
            "question": "What is the pricing for the Pro plan and what are the API rate limits?",
            "max_steps": 3,
            "expect_clarification": False,
            "expect_quality_pass": True,
        },
        {
            "label": "Single domain: refund policy",
            "question": "How long do I have to request a refund for a digital product?",
            "max_steps": 3,
            "expect_clarification": False,
            "expect_quality_pass": True,
        },
        {
            "label": "Multi-domain: SDK + support tier",
            "question": "Which SDKs are available and what level of support do Pro users get?",
            "max_steps": 3,
            "expect_clarification": False,
            "expect_quality_pass": True,
        },
        {
            "label": "Topic not in knowledge base",
            "question": "What are the terms for early contract termination?",
            "max_steps": 3,
            "expect_clarification": False,
            "expect_quality_pass": False,
        },
        {
            "label": "Max steps cap (max_steps=1)",
            "question": "What is the pricing, API rate limit, and support SLA for Enterprise?",
            "max_steps": 1,
            "expect_clarification": False,
            "expect_quality_pass": None,  # depends on what one search returns
        },
        {
            "label": "Clarification: too short",
            "question": "api",
            "max_steps": 3,
            "expect_clarification": True,
            "expect_quality_pass": None,
        },
        {
            "label": "Clarification: vague opener",
            "question": "Tell me everything",
            "max_steps": 3,
            "expect_clarification": True,
            "expect_quality_pass": None,
        },
        {
            "label": "Clarification: pronoun without topic",
            "question": "How does it work?",
            "max_steps": 3,
            "expect_clarification": True,
            "expect_quality_pass": None,
        },
    ]

    passed = 0
    failed = 0
    for s in scenarios:
        result = app.invoke(make_state(s["question"], s["max_steps"]))
        nc = result["needs_clarification"]
        qp = result.get("quality_passed", False)
        ok_nc = (nc == s["expect_clarification"])
        ok_qp = (s["expect_quality_pass"] is None) or (qp == s["expect_quality_pass"])
        status = "PASS" if (ok_nc and ok_qp) else "FAIL"
        if status == "PASS":
            passed += 1
        else:
            failed += 1
        print(f"  [{status}] {s['label']}")
        print(f"    needs_clarification : {nc} (expected {s['expect_clarification']})")
        if not nc:
            print(f"    sources_used        : {result['sources_used']}")
            print(f"    skipped_count       : {result['skipped_count']}")
            print(f"    confidence_level    : {result['confidence_level']}")
            print(f"    quality_passed      : {qp} (expected {s['expect_quality_pass']})")
            print(f"    response preview    : {result['formatted_response'][:120]}")
        else:
            print(f"    clarification asked : {result['clarification_question'][:100]}")
        print()

    print(f"Results: {passed} passed, {failed} failed out of {len(scenarios)} scenarios.\n")
```
Lines 3–60: Eight scenario dicts, each carrying the question, a `max_steps` budget, and expected values for `needs_clarification` and `quality_passed`. An `expect_quality_pass` of `None` means the expected outcome depends on live model output and is not asserted.

Lines 62–87: For each scenario, the result is checked against both expected values and printed with a `PASS`/`FAIL` label. The diagnostic fields are printed whether the scenario passes or fails, so the output is useful for debugging as well as verification.
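To run the suite, the compiled graph and the `make_state` helper from the earlier lessons are all that is needed. As a rough sketch of the wiring, assuming `build_graph` is the graph-builder defined previously and that `make_state` seeds the fields this suite reads:

```python
# Sketch only: the real make_state was defined in an earlier lesson.
# Field names here mirror what run_test_suite reads from the returned state.
def make_state(question: str, max_steps: int) -> dict:
    return {
        "question": question,
        "max_steps": max_steps,
        "needs_clarification": False,
        "clarification_question": "",
        "sources_used": [],
        "skipped_count": 0,
        "confidence_level": "",
        "quality_passed": False,
        "formatted_response": "",
    }

if __name__ == "__main__":
    app = build_graph()  # assumed graph-builder from the earlier lessons
    run_test_suite(app)
```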