CLOUD LABS
Monitor GenAI Applications in Production Using CloudWatch
In this Cloud Lab, you will build a monitoring solution for a generative AI app using AWS Lambda, Amazon Bedrock, CloudWatch, and SNS, with custom metrics, latency and token tracking, and alerts for failures.
beginner
Certificate of Completion
Learning Objectives
Monitoring is critical for operating production-grade generative AI applications. Unlike traditional workloads, GenAI systems introduce new operational signals, such as token consumption, model latency, and inference errors, that directly affect cost, performance, and user experience. Without proper monitoring, it becomes difficult to detect failures, control token usage, or diagnose performance bottlenecks in real time.
In this Cloud Lab, you’ll learn how to implement production-ready monitoring for a generative AI application on AWS using Amazon CloudWatch. You’ll begin by creating an AWS Lambda function that invokes an Amazon Bedrock foundation model to generate responses. The function will publish custom CloudWatch metrics, such as invocations, errors, latency in milliseconds, input tokens, output tokens, and total tokens, using the PutMetricData API. These metrics provide deep operational visibility into both system health and model usage patterns.
Next, you’ll generate structured logs and test both successful and failure scenarios to simulate real-world production behavior. You’ll then build a CloudWatch dashboard to visualize application health and model usage trends over time. The dashboard will include widgets for invocations, error rates, latency trends, and token consumption metrics, giving you a centralized view of system performance and cost-related signals.
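As a rough sketch of what such a handler might look like, the snippet below invokes a Bedrock model and publishes custom metrics with PutMetricData via boto3. The GenAI/Monitoring namespace, the metric names, and the Claude model ID are illustrative assumptions, not the lab’s exact values.

```python
import json
import time
import boto3

# Illustrative clients; the region, namespace, and model ID are placeholders.
bedrock = boto3.client("bedrock-runtime")
cloudwatch = boto3.client("cloudwatch")

NAMESPACE = "GenAI/Monitoring"  # hypothetical custom namespace
MODEL_ID = "anthropic.claude-3-haiku-20240307-v1:0"  # example model ID

def lambda_handler(event, context):
    prompt = event.get("prompt", "Hello")
    start = time.time()
    try:
        response = bedrock.invoke_model(
            modelId=MODEL_ID,
            body=json.dumps({
                "anthropic_version": "bedrock-2023-05-31",
                "max_tokens": 256,
                "messages": [{"role": "user", "content": prompt}],
            }),
        )
        body = json.loads(response["body"].read())
        usage = body.get("usage", {})
        input_tokens = usage.get("input_tokens", 0)
        output_tokens = usage.get("output_tokens", 0)
        latency_ms = (time.time() - start) * 1000

        # Publish the custom metrics in a single PutMetricData call.
        cloudwatch.put_metric_data(
            Namespace=NAMESPACE,
            MetricData=[
                {"MetricName": "Invocations", "Value": 1, "Unit": "Count"},
                {"MetricName": "LatencyMs", "Value": latency_ms, "Unit": "Milliseconds"},
                {"MetricName": "InputTokens", "Value": input_tokens, "Unit": "Count"},
                {"MetricName": "OutputTokens", "Value": output_tokens, "Unit": "Count"},
                {"MetricName": "TotalTokens", "Value": input_tokens + output_tokens, "Unit": "Count"},
            ],
        )
        return {"statusCode": 200, "body": body["content"][0]["text"]}
    except Exception:
        # Record a failure so the error-rate alarm has a signal to act on.
        cloudwatch.put_metric_data(
            Namespace=NAMESPACE,
            MetricData=[{"MetricName": "Errors", "Value": 1, "Unit": "Count"}],
        )
        raise
```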
Finally, you’ll configure a CloudWatch alarm to detect inference failures and integrate it with an Amazon SNS topic to send email notifications when error thresholds are breached. By triggering controlled failures, you’ll observe how alerts are generated and how proactive monitoring supports reliable GenAI operations.
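The alarm and notification path could be wired up roughly as follows; the topic name, email address, threshold, and alarm name here are assumptions for illustration rather than the lab’s exact configuration.

```python
import boto3

sns = boto3.client("sns")
cloudwatch = boto3.client("cloudwatch")

# Hypothetical topic and subscription; the email must be confirmed before it receives alerts.
topic_arn = sns.create_topic(Name="genai-error-alerts")["TopicArn"]
sns.subscribe(TopicArn=topic_arn, Protocol="email", Endpoint="you@example.com")

# Alarm when more than 3 errors occur within a 5-minute window.
cloudwatch.put_metric_alarm(
    AlarmName="GenAI-InferenceErrors",
    Namespace="GenAI/Monitoring",
    MetricName="Errors",
    Statistic="Sum",
    Period=300,
    EvaluationPeriods=1,
    Threshold=3,
    ComparisonOperator="GreaterThanThreshold",
    TreatMissingData="notBreaching",
    AlarmActions=[topic_arn],
)
```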
After completing this Cloud Lab, you’ll have a strong understanding of how to design and implement monitoring for production generative AI workloads on AWS. You’ll know how to publish and analyze custom metrics, build monitoring dashboards, configure alerts, and track token usage to maintain performance, reliability, and cost control in GenAI applications.
The following is the high-level architecture diagram of the infrastructure you’ll create in this Cloud Lab:
What is monitoring in generative AI?
Monitoring in generative AI refers to the practice of tracking and analyzing system performance and usage in production. Unlike traditional applications, GenAI systems generate additional operational signals, such as token consumption, inference latency, and model errors, which directly impact user experience, cost, and reliability.
Effective monitoring allows teams to detect failures, identify performance bottlenecks, and optimize resource usage, ensuring that the AI application behaves as expected in real time.
Core monitoring components
Monitoring GenAI applications relies on several key signals:
Metrics: Quantitative measurements like model latency, invocation counts, errors, and token usage. Metrics provide a high-level view of system performance and trends.
Logs: Detailed records of each request and response, including input/output tokens, errors, and reasoning traces. Logs help troubleshoot issues and understand system behavior (see the structured-log sketch after this list).
Dashboards and visualizations: Centralized views of metrics and logs that make it easy to track performance, detect anomalies, and observe trends over time.
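As a concrete illustration of the logs signal above, here is a minimal sketch of a structured-log helper a Lambda function might call on each inference. The helper name and field names are assumptions for illustration, not the lab’s exact logging format; emitting one JSON object per invocation makes the fields queryable in CloudWatch Logs Insights.

```python
import json
import logging

logger = logging.getLogger()
logger.setLevel(logging.INFO)

def log_inference(request_id, prompt, input_tokens, output_tokens, latency_ms, error=None):
    # One JSON object per invocation, so fields can be filtered and aggregated later.
    logger.info(json.dumps({
        "request_id": request_id,
        "prompt_chars": len(prompt),
        "input_tokens": input_tokens,
        "output_tokens": output_tokens,
        "total_tokens": input_tokens + output_tokens,
        "latency_ms": round(latency_ms, 1),
        "status": "error" if error else "success",
        "error": str(error) if error else None,
    }))
```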
Why monitoring matters for GenAI
Monitoring is essential for production AI systems because it enables teams to:
Track performance: Metrics like latency and error rates reveal bottlenecks and allow proactive intervention before users are impacted.
Control costs: Token usage directly affects operational expenses. Monitoring input, output, and total tokens helps optimize consumption and reduce costs.
Detect failures: By tracking errors and abnormal patterns, monitoring alerts teams to issues such as failed invocations or slow responses.
Improve reliability: Structured logs and dashboards make it easier to diagnose problems and ensure the system behaves consistently, even under high load or unexpected conditions.
Felipe Matheus
Software Engineer
Adina Ong
Senior Engineering Manager
Clifford Fajardo
Senior Software Engineer
Thomas Chang
Software Engineer
Copyright ©2026 Educative, Inc. All rights reserved.