Taming AWS bills with a 3-layer observability blueprint

Taming AWS bills with a 3-layer observability blueprint

Learn how to layer CUR 2.0, QuickSight dashboards, Cost Explorer alerts, and generative AI to shift from reactive cost firefighting to proactive cloud-spend control. This step-by-step guide shows you how each layer surfaces anomalies quickly, keeps budgets predictable, and lets teams focus on shipping features instead of chasing spikes.
8 mins read
Aug 01, 2025
Share

AWS provides the speed and scale to build resilient architectures. But that power comes with a notoriously complex (and often alarmingly high) monthly bill.

When costs spike, the default response is often a frantic, manual deep dive into the AWS Billing console, trying to find the culprit — resource or code change. The result is a reactive fire drill that wastes time and leaves us with more questions than answers.

But what if we could move from reactive reporting to a proactive, intelligent observability stack?

This is precisely the type of problem the Modern AWS Cost Observability Stack is designed to solve. It’s a layered approach that combines foundational data sources with automated BI and cutting-edge AI for deep, actionable insights.

In this guide, we’ll this solution step-by-step, covering:

  • The data foundation: Leveraging the new and improved Cost and Usage Report (CUR) 2.0.

  • Level 1 analysis: Deploying automated BI dashboards with QuickSight.

  • Level 2 analysis: Using Cost Explorer and Anomaly Detection for proactive alerts.

  • Level 3 analysis: Performing advanced investigations with Amazon Q and Bedrock.

Happy learning!

Cost and Usage Report (CUR) 2.0#

The AWS Cost and Usage Report, or CUR, is the foundational data source for any serious cloud cost analysis. It is the most detailed and comprehensive billing file AWS provides, automatically delivered to an S3 bucket owned by the user. This report contains a granular, line-by-line breakdown of every single charge across the accounts, with data available down to the specific resource ID and the exact hour a cost was incurred, making it the single source of truth for deep financial investigation.

Sample Cost and Usage report (CUR)
Sample Cost and Usage report (CUR)

CUR 2.0 is the modern evolution of this report, specifically engineered to be more stable and developer-friendly. Its key improvement is a consistent schema, which means the report’s columns no longer change unexpectedly, which helps prevent data pipelines and queries from breaking. Furthermore, complex data like cost allocation tags are now nested into single, structured columns, which simplifies SQL queries significantly. By providing a more reliable and easier-to-query structure, CUR 2.0 is the new standard for building robust, automated cost observability systems.

Level 1 analysis: Deploying automated BI dashboards with QuickSight#

For teams that need comprehensive visibility without the overhead of building a data pipeline from scratch, AWS now offers a powerful, pre-built solution that can be deployed in minutes. This is the fastest path to transforming raw billing data into an interactive Business Intelligence (BI) dashboard with zero development effort.

The process starts with Data Exports, which are the modern, centralized way to manage the Cost and Usage Report (CUR) delivery. When we enable the dashboard, we create a data export job that will continuously deliver the CUR 2.0 files to a designated S3 bucket.

Visualizing CUR using Amazon QuickSight
Visualizing CUR using Amazon QuickSight

When we enable this feature, AWS runs a CloudFormation template in the background that automatically provisions all the necessary resources to create a data pipeline and dashboard. This includes:

  • An AWS Glue Crawler to automatically scan the CUR data in S3, infer the data schema (the columns and data types), and create a metadata table in the AWS Glue Data Catalog.

  • Amazon Athena Views that create preconfigured SQL views in Athena, simplifying the raw CUR data and making it easier to query for common use cases.

  • Amazon QuickSight Assets to provision a complete set of QuickSight resources, including a Dataset connected to the Athena views, an Analysis file containing all the pre-built charts, and the final, shareable Dashboard.

This approach provides immediate, rich BI visualizations for the most critical cost metrics, including cost trends, service breakdowns, and usage patterns, without requiring us to write a single line of SQL or manually design a dashboard. It’s an out-of-the-box solution that immediately delivers about 80% of the insights most organizations need.

Level 2 analysis: Interactive exploration and proactive alerting#

While a full-scale BI dashboard provides the ultimate flexibility, the day-to-day cost management often starts with AWS’s built-in, managed tools. This layer is designed for two key purposes:

  • Rapid ad hoc analysis to answer immediate questions.

  • Proactively automate alerting to catch problems before they escalate.

AWS Cost Explorer for ad hoc analysis#

AWS Cost Explorer is a visual, interactive tool for analyzing, understanding, and managing AWS cloud spending. It provides detailed insights into cost and usage patterns across services, accounts, tags, and time ranges.

While it has many features, its most powerful and recent capability for rapid analysis is the ML-powered cost comparison view.

Cost Comparison in AWS Cost Explorer#

The Cost Comparison feature, announced in May 2025, is an automated analysis engine built directly into Cost Explorer. It intelligently compares two different time periods, for example, this month vs. last month, and automatically identifies and quantifies the top drivers of any cost changes.

This feature eliminates the incredibly time-consuming and error-prone process of manual cost-delta analysis.

Cost comparison in AWS Cost Explorer
Cost comparison in AWS Cost Explorer

Previously, to understand a cost spike, an engineer would have to:

  • Export cost data for two separate time periods to CSV.

  • Load these large files into a spreadsheet or analysis tool.

  • Manually create pivot tables to compare spending by service, usage type, and account.

  • Drill down repeatedly to find the source of the variance.

The Cost Comparison feature automates this entire workflow. It uses machine learning to surface the most significant cost deltas and provides a detailed breakdown of the root causes, such as a change in usage volume (e.g., more data transfer), a change in rates, or the expiration of discounts like Reserved Instances or Savings Plans. This reduces an investigation that could take hours down to just a few clicks.

AWS Cost Anomaly Detection#

The next layer in our observability stack is to build a proactive, automated sentry that watches the costs for us. AWS Cost Anomaly Detection is a managed service designed for exactly this purpose.

Cost Anomaly Detection is more sophisticated than simple budget thresholds. It’s designed to understand the context of our spending and reduce alert fatigue.

This service uses machine learning to analyze our historical cost and usage data, establishing a model of what constitutes “normal” for our account. It automatically accounts for natural growth and seasonal patterns. It then continuously monitors our spending, and when it detects a statistically significant deviation from this baseline, it flags it as an anomaly. This is far more effective than a static budget alert, which might fire unnecessarily during a predictable monthly spike.

We have granular control over what the AWS Cost Anomaly Detection watches. Instead of just monitoring our entire account, we can create multiple Cost Monitors to scope detection to specific dimensions. This is critical for assigning ownership and routing alerts to the correct teams. We can create monitors based on:

  • AWS services: Create a monitor for Amazon EC2 to watch for unexpected compute costs.

  • Linked accounts: Isolate the spending of a specific development or production account.

  • Cost allocation tags or cost categories: This is the most powerful option for FinOps. We can create a monitor for a specific tag like project:educative or a cost category like Team:Platform-Engineering, ensuring that alerts are highly relevant and directed to the team responsible for that spend.

Pro tip: We can improve our automated governance strategy by integrating Cost Anomaly Detection with an Amazon SNS topic to trigger automated remediation systems or send notifications.

Level 3 analysis: AI-driven insights#

While the first two layers provide excellent visibility and alerting, this final layer represents the cutting edge of cost observability. Here, we move beyond manual investigation into a world of conversational and programmatic analysis using generative AI.

Amazon Q for conversational cost analysis#

The fastest way to start with AI-driven cost analysis is to use Amazon Q, which is now integrated directly into the AWS Billing and Cost Management console.

Amazon Q acts as an AI-powered assistant that provides a natural language interface to our cost and usage data. It essentially serves as a conversational layer on top of the data that powers Cost Explorer, allowing us to bypass the manual process of filtering and grouping data in the UI.

This democratizes deep cost analysis. Any team member, regardless of their familiarity with the Cost Explorer interface or FinOps principles, can ask complex questions and get immediate insights. It dramatically reduces the time to answer critical questions during a cost investigation.

For example, instead of navigating through multiple filters to find the source of an S3 cost spike, an engineer can simply ask Amazon Q.

Custom AI solutions with Amazon Bedrock#

For ultimate power and integration, we can build our own custom, AI-powered FinOps tools using Amazon Bedrock. This approach allows us to create programmatic solutions, like Slack bots or automated investigators, that are tailored to our organization’s specific needs.

Building a custom AI solution for processing natural language queries for Cost and Usage reports using Bedrock
Building a custom AI solution for processing natural language queries for Cost and Usage reports using Bedrock

This solution involves creating a serverless application that acts as a “reasoning engine” for our cost data. The workflow is as follows:

  1. A user poses a question in natural language, for example, via a custom web app or a Slack command. The question is sent to an AWS Lambda function, which acts as the interface between the frontend and backend of our application and handles the business logic.

  2. The Lambda function invokes an Amazon Bedrock model, such as Anthropic's Claude 3 Sonnet. It passes the user’s question along with a carefully crafted prompt that instructs the model to act as a FinOps expert and translate the question into a precise Amazon Athena SQL query.

  3. The Lambda function executes the AI-generated SQL query in Athena.

  4. Athena executes the query against our CUR data in the S3 bucket.

  5. The raw SQL response is returned to the Lambda function.

  6. The raw, tabular results from the Athena query are then passed back to the Bedrock model in a second API call. This time, the prompt asks the model to summarize the raw data into a clear, insightful, human-readable answer.

  7. The final, summarized answer is returned to the user.

Wrapping up#

This newsletter provides a clear, layered strategy to transform the visibility of cloud spending in a modern AWS cost observability stack. By building on a solid foundation of CUR 2.0, we can move from simply reporting on costs to actively controlling them.

The journey from reactive to proactive is incremental yet powerful. It frees developers from manual investigations, provides finance teams with predictable budgets, and gives leadership the confidence to innovate at scale. By adopting this modern, intelligent approach, we are not just managing our AWS bill; we are building a more resilient, efficient, and financially sound cloud operation.

For more on this topic, don't miss the following courses:


Written By:
Fahim ul Haq
Free Edition
Protect your applications using AWS WAF
Learn how AWS WAF moves application security from a performance trade-off to an enabler of business resilience.
11 mins read
May 1, 2026