
Final Testing and Extension Paths

Explore final testing methods and extension strategies for your AI research assistant built with LangGraph. Learn how to verify the system through comprehensive scenario tests, add new knowledge domains like compliance, and implement memory to improve conversation continuity. Understand how to maintain a clean, reliable architecture ready for production use.

The previous lesson replaced the final stubs and added the quality review gate. Every node now has a real implementation. Every routing function handles its two possible paths. Every state field has one owning node.

The goal for this lesson is not to add more features. It is to verify what we have, understand it completely, and practice extending it. A system we can test confidently and extend cleanly is a system worth shipping.

Full scenario test suite

The table below presents the full scenario test suite for the research assistant, mapping each input to its expected execution path and the key field or condition used for validation.

| Scenario | Input | Expected path | Expected field to check |
| --- | --- | --- | --- |
| Multi-domain clear question | "What is the pricing for the Pro plan and what are the API rate limits?" | Clarity → plan → search → synthesise → review → format | confidence_level = high or medium |
| Single-domain clear question | "How long do I have to request a refund for a digital product?" | Clarity → plan → search → synthesise → review → format | sources_used contains product_knowledge |
| SDK and support question | "Which SDKs are available and what level of support do Pro users get?" | Clarity → plan → search → synthesise → review → format | Two entries in sources_used |
| Topic not in knowledge base | "What are the terms for early contract termination?" | Clarity → plan → search → synthesise → review → fallback | quality_passed = False |
| Max steps cap | Any multi-part question with max_steps=1 | Clarity → plan → search (one only) → synthesise → review → format or fallback | skipped_count > 0 |
| Clarification: too short | "api" | Clarity → return clarification | needs_clarification = True |
| Clarification: vague opener | "Tell me everything" | Clarity → return clarification | clarification_question non-empty |
| Edge case: single-word pronoun | "How does it work?" | Clarity → return clarification | Heuristic 3 fires |

Running this suite in a single main function and printing the diagnostic fields after each invocation is enough to verify the system end-to-end. We do not need a test framework; the returned state is the test oracle.
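
The suite relies on the make_state helper from the earlier lessons to seed the graph's initial state. Its exact shape depends on your state definition, but a minimal sketch, assuming only the fields the diagnostics below read, would look like this:

def make_state(question: str, max_steps: int) -> dict:
    # Minimal initial-state sketch. Field names mirror the diagnostics
    # printed by the test suite; adjust to match your own state schema.
    return {
        "question": question,
        "max_steps": max_steps,
        "needs_clarification": False,
        "clarification_question": "",
        "sources_used": [],
        "skipped_count": 0,
        "confidence_level": "",
        "quality_passed": False,
        "formatted_response": "",
    }

The suite itself then reads as follows.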

def run_test_suite(app) -> None:
    print("=== Full scenario test suite ===\n")
    scenarios = [
        {
            "label": "Multi-domain: pricing + API",
            "question": "What is the pricing for the Pro plan and what are the API rate limits?",
            "max_steps": 3,
            "expect_clarification": False,
            "expect_quality_pass": True,
        },
        {
            "label": "Single domain: refund policy",
            "question": "How long do I have to request a refund for a digital product?",
            "max_steps": 3,
            "expect_clarification": False,
            "expect_quality_pass": True,
        },
        {
            "label": "Multi-domain: SDK + support tier",
            "question": "Which SDKs are available and what level of support do Pro users get?",
            "max_steps": 3,
            "expect_clarification": False,
            "expect_quality_pass": True,
        },
        {
            "label": "Topic not in knowledge base",
            "question": "What are the terms for early contract termination?",
            "max_steps": 3,
            "expect_clarification": False,
            "expect_quality_pass": False,
        },
        {
            "label": "Max steps cap (max_steps=1)",
            "question": "What is the pricing, API rate limit, and support SLA for Enterprise?",
            "max_steps": 1,
            "expect_clarification": False,
            "expect_quality_pass": None,  # depends on what one search returns
        },
        {
            "label": "Clarification: too short",
            "question": "api",
            "max_steps": 3,
            "expect_clarification": True,
            "expect_quality_pass": None,
        },
        {
            "label": "Clarification: vague opener",
            "question": "Tell me everything",
            "max_steps": 3,
            "expect_clarification": True,
            "expect_quality_pass": None,
        },
        {
            "label": "Clarification: pronoun without topic",
            "question": "How does it work?",
            "max_steps": 3,
            "expect_clarification": True,
            "expect_quality_pass": None,
        },
    ]

    passed = 0
    failed = 0
    for s in scenarios:
        result = app.invoke(make_state(s["question"], s["max_steps"]))
        nc = result["needs_clarification"]
        qp = result.get("quality_passed", False)
        ok_nc = (nc == s["expect_clarification"])
        ok_qp = (s["expect_quality_pass"] is None) or (qp == s["expect_quality_pass"])
        status = "PASS" if (ok_nc and ok_qp) else "FAIL"
        if status == "PASS":
            passed += 1
        else:
            failed += 1
        print(f"  [{status}] {s['label']}")
        print(f"    needs_clarification : {nc} (expected {s['expect_clarification']})")
        if not nc:
            print(f"    sources_used        : {result['sources_used']}")
            print(f"    skipped_count       : {result['skipped_count']}")
            print(f"    confidence_level    : {result['confidence_level']}")
            print(f"    quality_passed      : {qp} (expected {s['expect_quality_pass']})")
            print(f"    response preview    : {result['formatted_response'][:120]}")
        else:
            print(f"    clarification asked : {result['clarification_question'][:100]}")
        print()

    print(f"Results: {passed} passed, {failed} failed out of {len(scenarios)} scenarios.\n")
Full scenario test suite function
  • The scenarios list: eight scenario dicts, each carrying an expected value for needs_clarification (expect_clarification) and quality_passed (expect_quality_pass). A value of None means the expected outcome depends on live model output and is not asserted.

  • The loop: for each scenario, the result is checked against both expected values and printed with a PASS / FAIL label. The diagnostic fields are printed whether the scenario passes or fails, so the output is as useful for debugging as it is for verification.
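
Wiring the suite into the program's entry point is a single call. A minimal sketch, assuming build_graph is the helper from the earlier lessons that returns the compiled app:

if __name__ == "__main__":
    # build_graph is assumed to return the compiled LangGraph app
    # (the StateGraph already passed through .compile()); substitute
    # whatever name your project uses.
    app = build_graph()
    run_test_suite(app)

If a scenario prints FAIL, the diagnostic fields beneath it show which state field diverged from the expectation, and because every state field has one owning node, that points directly at the node to inspect first.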