Monitoring and Logging

Explore how to build an observability layer for Amazon Bedrock applications by leveraging CloudWatch metrics, invocation logging, CloudTrail auditing, AWS X-Ray tracing, and AI quality monitoring. Understand key monitoring metrics, security considerations, and cost attribution techniques to operate scalable and secure generative AI systems in production environments.

The previous lesson covered orchestration patterns, including Amazon API Gateway routing, AWS Step Functions workflows, Amazon EventBridge event buses, Amazon SQS buffering, and quota monitoring, that provide the operational foundation for Amazon Bedrock applications. Running these workloads in production requires more than reliable request routing and workflow coordination. You need runtime visibility into the system, audit trails for user and service actions, request tracing across service boundaries, output-quality regression detection, and cost attribution for the teams, projects, or environments that generated the inference spend. This lesson builds the observability layer for monitoring, tracing, auditing, evaluation, and cost attribution.

Six capabilities form the Bedrock observability stack. Amazon CloudWatch provides runtime metrics, dashboards, and alarms. Bedrock invocation logging captures full request and response payloads to S3 or CloudWatch Logs. AWS CloudTrail records API calls for security auditing. AWS X-Ray traces requests end-to-end across multi-service architectures. Application-level AI quality monitoring uses LLM-as-a-judge pipelines with custom CloudWatch metrics. Cost attribution leverages resource tagging and AWS Cost Explorer. These layers are complementary. Operational metrics tell you something is wrong; logs tell you what happened; audit records tell you who made the call; traces tell you where it happened; quality monitoring tells you if the AI output degraded; and cost attribution tells you which team, project, or environment generated the spend.

A useful analogy is an aircraft cockpit. No single instrument gives the pilot enough information. The altimeter, engine gauges, navigation display, and fuel indicator each provide a different signal, and together they help the pilot understand what is happening. The same principle applies to observability for generative AI applications.

CloudWatch metrics and alarms

Bedrock automatically emits CloudWatch metrics under the AWS/Bedrock namespace for every model invocation, broken down by the ModelId dimension. These metrics require no configuration. They begin flowing the moment you call Converse, InvokeModel, or their streaming counterparts, ConverseStream and InvokeModelWithResponseStream.
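To make the namespace and dimension concrete, here is a minimal Python sketch of how these metrics could be queried with boto3's CloudWatch `get_metric_data` call. The helper names (`build_metric_queries`, `fetch_last_hour`), the query IDs, the default region, the five-minute period, and the choice of `Sum` and `p90` statistics are illustrative assumptions, not part of the lesson. The model ID passed in is whatever model your application invokes.

```python
from datetime import datetime, timedelta, timezone

NAMESPACE = "AWS/Bedrock"  # namespace Bedrock publishes its metrics under


def build_metric_queries(model_id, period_seconds=300):
    """Build GetMetricData queries for one model (hypothetical helper)."""

    def query(query_id, metric_name, stat):
        return {
            "Id": query_id,  # must start with a lowercase letter
            "MetricStat": {
                "Metric": {
                    "Namespace": NAMESPACE,
                    "MetricName": metric_name,
                    "Dimensions": [{"Name": "ModelId", "Value": model_id}],
                },
                "Period": period_seconds,
                "Stat": stat,
            },
            "ReturnData": True,
        }

    return [
        query("invocations", "Invocations", "Sum"),
        query("latency_p90", "InvocationLatency", "p90"),
    ]


def fetch_last_hour(model_id, region_name="us-east-1"):
    """Fetch the last hour of data points; needs boto3 and AWS credentials."""
    import boto3  # imported lazily so the query builder works without it

    cloudwatch = boto3.client("cloudwatch", region_name=region_name)
    end = datetime.now(timezone.utc)
    start = end - timedelta(hours=1)
    response = cloudwatch.get_metric_data(
        MetricDataQueries=build_metric_queries(model_id),
        StartTime=start,
        EndTime=end,
    )
    # Map each query Id to its (timestamp, value) pairs.
    return {
        result["Id"]: list(zip(result["Timestamps"], result["Values"]))
        for result in response["MetricDataResults"]
    }
```

Calling `fetch_last_hour` with the model ID your application uses returns one series per query. Keeping the query builder separate from the network call lets you reuse the same definitions for dashboards or alarms.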

The critical metrics include:

  • Invocations measures the total number of successful model inference requests processed by Amazon Bedrock.

  • InvocationLatency measures the total end-to-end latency from the initial request until the final response token is generated.

  • TimeToFirstToken measures streaming responsiveness by tracking how long it takes for the first response token to be returned after a request is submitted.

  • InvocationClientErrors tracks 4xx client-side errors caused by invalid requests, authentication failures, quota issues, or incorrect parameters.

  • InvocationServerErrors tracks 5xx server-side failures caused by issues within the Bedrock service infrastructure.

  • InputTokenCount measures the total number of input tokens sent to the foundation model across requests.

  • OutputTokenCount measures the total number of tokens generated by the foundation model in responses.

  • InvocationThrottles tracks requests that were ...