
Final Testing and Extension Paths

Explore comprehensive final testing strategies and practical extension techniques for your LangGraph research assistant. Understand how to validate the system using scenario tests, add new knowledge domains without disrupting existing workflows, and implement memory features to improve conversational continuity. This lesson equips you to confidently verify and expand your AI agent's capabilities for reliable real-world performance.

The previous lesson replaced the final stubs and added the quality review gate. Every node now has a real implementation. Every routing function handles its two possible paths. Every state field has one owning node on any given execution path.

The goal for this lesson is not to add more features. It is to verify what we have, understand it completely, and practice extending it. A system you can test confidently and extend cleanly is worth shipping.

Full scenario test suite

The table below presents the full scenario test suite for the research assistant, mapping each input to its expected execution path and the key field or condition used for validation.

| Scenario | Input | Expected path | Expected field to check |
| --- | --- | --- | --- |
| Multi-domain clear question | "What is the pricing for the Pro plan and what are the API rate limits?" | Clarity → plan → search → synthesise → review → format | confidence_level = high or medium |
| Single-domain clear question | "How long do I have to request a refund for a digital product?" | Clarity → plan → search → synthesise → review → format | sources_used contains product_knowledge |
| SDK and support question | "Which SDKs are available and what level of support do Pro users get?" | Clarity → plan → search → synthesise → review → format | Two entries in sources_used |
| Topic not in knowledge base | "What are the terms for early contract termination?" | Clarity → plan → search → synthesise → review → fallback | quality_passed = False |
| Max steps cap | Any multi-part question with max_steps=1 | Clarity → plan → search (one only) → synthesise → review → format or fallback | skipped_count > 0 |
| Clarification: too short | "api" | Clarity → return clarification | needs_clarification = True |
| Clarification: vague opener | "Tell me everything" | Clarity → return clarification | clarification_question non-empty |
| Edge case: single-word pronoun | "How does it work?" | Clarity → return clarification | Heuristic 3 fires |

Running this suite in a single main function and printing the diagnostic fields after each invocation is enough to verify the system end to end. We do not need a test framework; the returned state is the test oracle.

def run_test_suite(app) -> None:
    """Invoke the compiled graph once per scenario and print a PASS/FAIL report."""
    print("=== Full scenario test suite ===\n")
    # expect_quality_pass=None means quality_passed is not asserted for that scenario.
    scenarios = [
        {
            "label": "Multi-domain: pricing + API",
            "question": "What is the pricing for the Pro plan and what are the API rate limits?",
            "max_steps": 3,
            "expect_clarification": False,
            "expect_quality_pass": True,
        },
        {
            "label": "Single domain: refund policy",
            "question": "How long do I have to request a refund for a digital product?",
            "max_steps": 3,
            "expect_clarification": False,
            "expect_quality_pass": True,
        },
        {
            "label": "Multi-domain: SDK + support tier",
            "question": "Which SDKs are available and what level of support do Pro users get?",
            "max_steps": 3,
            "expect_clarification": False,
            "expect_quality_pass": True,
        },
        {
            "label": "Topic not in knowledge base",
            "question": "What are the terms for early contract termination?",
            "max_steps": 3,
            "expect_clarification": False,
            "expect_quality_pass": False,
        },
        {
            "label": "Max steps cap (max_steps=1)",
            "question": "What is the pricing, API rate limit, and support SLA for Enterprise?",
            "max_steps": 1,
            "expect_clarification": False,
            "expect_quality_pass": None,  # depends on what one search returns
        },
        {
            "label": "Clarification: too short",
            "question": "api",
            "max_steps": 3,
            "expect_clarification": True,
            "expect_quality_pass": None,
        },
        {
            "label": "Clarification: vague opener",
            "question": "Tell me everything",
            "max_steps": 3,
            "expect_clarification": True,
            "expect_quality_pass": None,
        },
        {
            "label": "Clarification: pronoun without topic",
            "question": "How does it work?",
            "max_steps": 3,
            "expect_clarification": True,
            "expect_quality_pass": None,
        },
    ]
    passed = 0
    failed = 0
    for s in scenarios:
        # make_state builds the initial state dict for one question.
        result = app.invoke(make_state(s["question"], s["max_steps"]))
        nc = result["needs_clarification"]
        # quality_passed may be unset on the clarification path, so default to False.
        qp = result.get("quality_passed", False)
        ok_nc = (nc == s["expect_clarification"])
        ok_qp = (s["expect_quality_pass"] is None) or (qp == s["expect_quality_pass"])
        status = "PASS" if (ok_nc and ok_qp) else "FAIL"
        if status == "PASS":
            passed += 1
        else:
            failed += 1
        # Diagnostics are printed for every scenario, pass or fail.
        print(f"  [{status}] {s['label']}")
        print(f"      needs_clarification : {nc} (expected {s['expect_clarification']})")
        if not nc:
            print(f"      sources_used        : {result['sources_used']}")
            print(f"      skipped_count       : {result['skipped_count']}")
            print(f"      confidence_level    : {result['confidence_level']}")
            print(f"      quality_passed      : {qp} (expected {s['expect_quality_pass']})")
            print(f"      response preview    : {result['formatted_response'][:120]}")
        else:
            print(f"      clarification asked : {result['clarification_question'][:100]}")
        print()
    print(f"Results: {passed} passed, {failed} failed out of {len(scenarios)} scenarios.\n")
Full scenario test suite function
  • The scenarios list: Eight scenario dicts, each carrying expected values in expect_clarification and expect_quality_pass. An expect_quality_pass of None means the outcome is not asserted, either because it depends on live model output or because the clarification path ends before the quality review runs.

  • The for loop: For each scenario, the result is checked against both expected values and printed with a PASS/FAIL label. The diagnostic fields are printed whether the scenario passes or fails, so the output is useful for debugging as well as verification.
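
The suite above asserts only needs_clarification and quality_passed, while the scenario table also names a field to check for each row. A minimal sketch of how those per-scenario checks could be expressed as predicates over the returned state follows. The names FIELD_CHECKS and field_check_passes are hypothetical, and the value formats (lowercase confidence strings, sources_used as a list of domain names) are assumptions rather than something the lesson confirms. The "Heuristic 3 fires" row is left out because it cannot be read from the state alone.

```python
from typing import Any, Callable, Dict

State = Dict[str, Any]

# Hypothetical mapping from scenario name to the field check in the scenario table.
# Assumed formats: confidence_level is a lowercase string, sources_used is a list.
FIELD_CHECKS: Dict[str, Callable[[State], bool]] = {
    "Multi-domain clear question": lambda s: s.get("confidence_level") in {"high", "medium"},
    "Single-domain clear question": lambda s: "product_knowledge" in s.get("sources_used", []),
    "SDK and support question": lambda s: len(s.get("sources_used", [])) == 2,
    "Topic not in knowledge base": lambda s: s.get("quality_passed") is False,
    "Max steps cap": lambda s: s.get("skipped_count", 0) > 0,
    "Clarification: too short": lambda s: s.get("needs_clarification") is True,
    "Clarification: vague opener": lambda s: bool(s.get("clarification_question")),
}


def field_check_passes(scenario: str, state: State) -> bool:
    """Return True when the state satisfies the scenario's field check."""
    return FIELD_CHECKS[scenario](state)


# Hand-built states standing in for app.invoke(...) results.
capped = {"needs_clarification": False, "skipped_count": 2}
uncapped = {"needs_clarification": False, "skipped_count": 0}

print(field_check_passes("Max steps cap", capped))    # True
print(field_check_passes("Max steps cap", uncapped))  # False
```

Keeping the checks in a dict keyed by scenario name means a new scenario needs one new entry and no change to the loop that runs them.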

Node and state responsibility recap

The table below is the complete responsibility inventory for the finished system: every node, the single job it does, and every state field it writes.

| Node | Type | Single responsibility | State fields written |
| --- | --- | --- | --- |
| check_clarity | Rule-based + model | Decide whether the question is clear enough to search | needs_clarification, clarification_question |
| return_clarification | Rule-based | Package the clarification question as the formatted response | formatted_response |
| plan_searches | Model (LLM) | Break the question into domain-tagged sub-queries | search_plan, step_count |
| execute_searches | Tool node | Route each sub-query to the right domain tool and collect results | search_results, sources_used, skipped_count, source_count |
| synthesise_findings | Model (LLM) | Combine all results into a direct answer with confidence assessment | synthesis, confidence_level |
| review_synthesis | Rule-based | Check whether the synthesis meets minimum quality criteria | quality_passed, quality_note |
| format_response | Rule-based | Assemble source header, confidence badge, and synthesis body | formatted_response |
| format_fallback | Rule-based | Produce a safe recovery message when quality review fails | formatted_response |
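
The inventory above can also be checked mechanically. The sketch below transcribes the table into a dict and lists any field written by more than one node. NODE_WRITES, writers_by_field, and shared_fields are hypothetical names used for illustration; they are not part of the lesson's codebase.

```python
from collections import defaultdict
from typing import Dict, List

# Node -> state fields written, transcribed from the responsibility table.
NODE_WRITES: Dict[str, List[str]] = {
    "check_clarity": ["needs_clarification", "clarification_question"],
    "return_clarification": ["formatted_response"],
    "plan_searches": ["search_plan", "step_count"],
    "execute_searches": ["search_results", "sources_used", "skipped_count", "source_count"],
    "synthesise_findings": ["synthesis", "confidence_level"],
    "review_synthesis": ["quality_passed", "quality_note"],
    "format_response": ["formatted_response"],
    "format_fallback": ["formatted_response"],
}


def writers_by_field(node_writes: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Invert the node -> fields mapping into field -> writing nodes."""
    writers: Dict[str, List[str]] = defaultdict(list)
    for node, fields in node_writes.items():
        for field in fields:
            writers[field].append(node)
    return dict(writers)


def shared_fields(node_writes: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Return only the fields that more than one node writes."""
    return {f: nodes for f, nodes in writers_by_field(node_writes).items() if len(nodes) > 1}


print(shared_fields(NODE_WRITES))
# {'formatted_response': ['return_clarification', 'format_response', 'format_fallback']}
```

The only shared field is formatted_response, and its three writers are the terminal nodes of the three paths in the scenario table, so on any single run exactly one of them writes it. Rerunning a check like this after adding a node is a cheap way to notice when a new node starts writing a field that already has an owner.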

The two routing functions (route_clarity, route_review) ...