Orchestration and APIs
Explore how to connect and orchestrate AWS services such as API Gateway, Lambda, Step Functions, EventBridge, and SQS to build scalable and reliable generative AI applications using Amazon Bedrock. Understand API management, streaming options, event-driven automation, queuing for burst traffic, and quota monitoring essential for production AI systems.
Building generative AI applications for production requires more than selecting an appropriate foundation model and writing effective prompts. The previous lesson explored how Amazon SageMaker and Amazon Bedrock integration patterns support model access and custom ML workflows, but exposing those capabilities to users requires an orchestration layer. This lesson focuses on the services that connect client applications to Amazon Bedrock, routing requests, coordinating workflows, reacting to events, and helping prevent overload through throttling, queues, retries, and scaling controls. This layer works like traffic control for your AI application: without it, even a strong model can be difficult to use reliably in production.
Amazon API Gateway serves as the managed entry point that sits between client applications and Lambda functions that invoke Bedrock. Rather than exposing Lambda functions directly, API Gateway provides a managed API layer that handles authentication, validation, throttling, and caching before any request reaches your inference logic.
Several REST API design considerations matter when fronting Bedrock workloads. Request models and validators enforce JSON schema on incoming prompts, rejecting malformed payloads before they consume Lambda execution time.
Practical tip: Enable response caching for endpoints that serve predictable queries, such as product descriptions or policy summaries. A 5-minute TTL can dramatically reduce Bedrock invocation costs without noticeably degrading answer freshness.
Amazon Bedrock exposes multiple inference APIs, each of which maps naturally to different API Gateway integration patterns. The InvokeModel API provides direct synchronous model inference for standard request-response workloads and pairs well with REST proxy integrations. InvokeModelWithResponseStream enables token-by-token streaming for low-latency responses, making it suitable for real-time conversational interfaces. The Converse and ConverseStream APIs are designed for conversational interactions, with the client sending the conversation history with each request to maintain context across turns. ConverseStream extends this pattern by delivering tokens in a streaming fashion for chat-style user experiences.
For latency-sensitive workloads, the performanceConfig parameter allows applications to choose between standard and optimized inference modes at invocation time, providing a tuning lever for balancing responsiveness and cost without changing the underlying model selection.
The following diagram illustrates how REST and WebSocket patterns differ when used to front Bedrock:
WebSocket APIs for streaming
Standard REST request-response cycles force the client to wait until Bedrock generates the entire response before displaying anything. For conversational AI interfaces, this creates an unacceptable delay. Users expect to see tokens appear progressively, much like watching someone type a reply in a chat application.
WebSocket APIs in API Gateway maintain persistent bidirectional connections that enable token-by-token streaming. The connection life cycle follows a predictable sequence:
Connect route: Authenticates the user and registers the connectionId in a DynamoDB table, establishing the session record that subsequent messages reference.
Disconnect route: Cleans up the connection record from DynamoDB when the client disconnects, or the idle timeout expires.
Graceful disconnection handling matters because clients can drop unexpectedly. Configure the idle timeout to match your application’s expected interaction cadence, and ensure the $disconnect route reliably removes stale records.
Attention: WebSocket APIs do not support REST-style usage plans or API keys. Applications typically enforce tenant-level throttling through Lambda concurrency controls, custom authorization logic, or application-side rate limiting.
Streaming responses introduce a trade-off between user experience and architectural complexity. REST APIs are simpler to build, cache, and monitor, while WebSocket APIs require connection state management but deliver a significantly better conversational experience.
Step Functions for multi-service workflows
When a Bedrock-powered application involves more than a single inference call, you need an orchestration engine that coordinates multiple services reliably. AWS Step Functions provides exactly this capability through visual, serverless state machines.
Designing a document processing pipeline
Consider a concrete workflow triggered when a document arrives in S3. An AWS Step Functions state machine coordinates these stages: an AWS Lambda function extracts the document text and splits it into chunks, a Bedrock StartIngestionJob call initiates ingestion and indexing into the knowledge base, the workflow waits for ingestion completion, and a subsequent Bedrock InvokeAgent call analyzes the newly indexed content, a DynamoDB PutItem stores the analysis results, and an SNS Publish sends a notification to downstream consumers.
Choosing between “Standard” and “Express” workflows
Step Functions offers two workflow types, and the choice directly impacts your Bedrock architecture.
Standard workflows support long-running executions with exactly-once semantics and a full audit trail. They suit multi-step document processing pipelines where each execution may take minutes, and you need guaranteed delivery.
Express workflows handle high-volume, short-duration executions with at-most-once or at-least-once semantics. They are the right choice for high-throughput Bedrock invocation patterns where individual executions complete in seconds.
Error handling is built into the state machine definition itself. Retry blocks with exponential backoff handle transient ThrottlingException errors from Bedrock without any custom code. Catch blocks route persistent failures to cleanup or notification states. This declarative retry logic is a significant advantage over implementing backoff manually in Lambda.
The following table compares all the orchestration and messaging services covered in this lesson.