Partial Failures and Recovery

Explore methods to distinguish four types of partial failures in Claude AI pipelines and develop coordinator-safe result objects. Learn to implement routing logic that ensures appropriate recovery actions, avoiding common retry anti-patterns and improving system reliability in production environments.

A total failure is easy to handle: every field is missing, the tool call returned an error, and the document cannot be processed. A partial failure is harder. Some fields succeeded; others did not. The tool returned a result, but it was empty, outdated, or covered only part of the document. In a pipeline, partial failures can propagate silently if the result object does not carry enough information for the coordinator to distinguish them from success. A coordinator that receives {"status": "ok", "fields": {}} cannot tell whether the extraction found nothing because the document had no extractable content, because a tool call failed halfway through, or because the document is from last quarter and should have been refreshed before processing.
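To make the ambiguity concrete, here is a minimal Python sketch. The three function names are hypothetical labels for the situations described above, not part of any real pipeline. Each one stands in for an extraction step that hides its cause behind the same bare result.

```python
import json

# Hypothetical extraction steps. Each represents a different underlying
# situation, yet all three report the same bare result to the coordinator.

def extract_from_document_without_content():
    # The document genuinely had nothing to extract.
    return {"status": "ok", "fields": {}}

def extract_with_tool_call_failing_halfway():
    # A tool call failed partway through and the error was swallowed.
    return {"status": "ok", "fields": {}}

def extract_from_stale_document():
    # The document is from last quarter and was never refreshed.
    return {"status": "ok", "fields": {}}

results = [
    extract_from_document_without_content(),
    extract_with_tool_call_failing_halfway(),
    extract_from_stale_document(),
]

# Serialize each result the way it might be handed to a coordinator,
# then count how many distinct payloads the coordinator would see.
distinct_payloads = {json.dumps(r, sort_keys=True) for r in results}
print(len(distinct_payloads))  # 1
```

Because all three payloads are identical, no logic on the coordinator side can recover which situation occurred. The distinguishing information has to be added where the result is produced.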

This lesson covers how to classify partial failures precisely and how to build result objects that give the coordinator the information it needs to act. By the end of this lesson, we will be able to:

  • Distinguish the four partial failure types and explain why each requires a different response.

  • Write a coordinator-safe result object that carries a failure type, available data, and a recovery hint.

  • Implement coordinator logic that routes each failure type to the correct recovery action.

  • Explain why treating all partial failures as retryable is a design anti-pattern.
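As a preview of the second and third objectives, the following Python sketch shows one possible shape for such a result object and a coordinator routing function. Everything named here is an assumption made for illustration: the `FailureType` members are placeholders drawn from the examples in the introduction rather than the lesson's own taxonomy, and the action strings returned by `route` are hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Any, Dict, Optional


class FailureType(Enum):
    # Placeholder categories for illustration only.
    NONE = "none"                    # complete result
    NO_CONTENT = "no_content"        # document had nothing to extract
    TOOL_ERROR = "tool_error"        # a tool call failed partway through
    STALE_SOURCE = "stale_source"    # document should have been refreshed
    PARTIAL_COVERAGE = "partial_coverage"  # only part of the document was read


@dataclass
class ExtractionResult:
    failure_type: FailureType
    # Whatever data was successfully extracted, possibly empty.
    fields: Dict[str, Any] = field(default_factory=dict)
    # Free-text hint from the producing step about how to recover.
    recovery_hint: Optional[str] = None


def route(result: ExtractionResult) -> str:
    """Map each failure type to a (hypothetical) recovery action."""
    if result.failure_type is FailureType.NONE:
        return "accept"
    if result.failure_type is FailureType.NO_CONTENT:
        return "accept_empty"        # retrying would find nothing new
    if result.failure_type is FailureType.TOOL_ERROR:
        return "retry"               # the one case where a retry may help
    if result.failure_type is FailureType.STALE_SOURCE:
        return "refresh_source"      # retrying on stale input repeats the problem
    if result.failure_type is FailureType.PARTIAL_COVERAGE:
        return "process_remaining"   # keep the extracted fields, cover the rest
    raise ValueError(f"unhandled failure type: {result.failure_type}")


stale = ExtractionResult(
    failure_type=FailureType.STALE_SOURCE,
    recovery_hint="document is from last quarter; refresh before processing",
)
print(route(stale))  # refresh_source
```

In this sketch only one of the four non-success branches maps to a retry, which is the point of the last objective: a blanket retry policy would repeat work that cannot succeed for the other three.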

The four partial failure types

Partial failures look similar on the surface: in every case, the pipeline did not produce a complete result. But they arise from different causes and require different responses. ...