AWS’s latest AI updates are a big deal for devs––here's why

From cutting-edge SageMaker upgrades that slash costs and automate training to Amazon Bedrock’s new AI models and optimizations, AWS re:Invent 2024 unveiled game-changing updates for developers.
13 mins read
Mar 07, 2025

AWS re:Invent 2024 introduced groundbreaking innovations in AI and machine learning—bringing faster, smarter, and more cost-effective solutions to developers and businesses alike.

In today's newsletter, we're breaking down the keynote announcements from Dr. Swami Sivasubramanian, VP of Data and AI at AWS, including:

  • Amazon SageMaker now provides budget-aware automated resource provisioning, more than 90% utilization of accelerated compute, and fully managed third-party apps.

  • Amazon Bedrock adds leading AWS Partner AI models, automated cost, latency, and accuracy optimizations, simpler automated data processing, and responsible AI features for multimodal models.

  • Amazon Q Developer now assists with ML workflows in Amazon SageMaker Canvas, and Amazon Q gains a scenario analysis capability in QuickSight.

Let's dive into the most important AI updates from re:Invent 2024—and what they mean for you.

The next generation of Amazon SageMaker#

Amazon SageMaker remains at the forefront of machine learning model development, offering new capabilities that improve training efficiency, resource allocation, and governance. AWS has introduced multiple enhancements to optimize cost and performance, making ML training more scalable and accessible.

Let’s look at some of the key enhancements to Amazon SageMaker:

Amazon SageMaker HyperPod flexible training plans#

If you're tired of guessing the best compute setup for model training while balancing cost and time, Amazon SageMaker HyperPod Flexible Training Plans take the guesswork out by automatically provisioning resources based on your budget and time constraints—ensuring you get the best performance without overspending.

Automated provisioning of Amazon SageMaker HyperPod cluster through flexible training plans

Here's how:

  • It creates an optimal training plan, reserves capacity based on your timeline and budget, and sets up the cluster and training jobs, saving weeks of manual setup.

  • It handles checkpointing and recovers from instance failures without any manual intervention.
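
To make the idea concrete, here is a minimal Python sketch of the kind of budget/deadline trade-off the service resolves for you. The instance names, prices, and durations below are hypothetical, and the real service also handles capacity reservation through the SageMaker training plans API:

```python
from dataclasses import dataclass

@dataclass
class CandidatePlan:
    instance_type: str   # hypothetical instance name
    hourly_cost: float   # hypothetical USD per hour
    est_hours: float     # estimated training duration in hours

def pick_plan(candidates, budget_usd, deadline_hours):
    """Return the cheapest candidate that finishes on time and within budget."""
    feasible = [
        p for p in candidates
        if p.est_hours <= deadline_hours
        and p.hourly_cost * p.est_hours <= budget_usd
    ]
    return min(feasible, key=lambda p: p.hourly_cost * p.est_hours, default=None)

candidates = [
    CandidatePlan("ml.p5.48xlarge", hourly_cost=98.0, est_hours=40),
    CandidatePlan("ml.p4d.24xlarge", hourly_cost=37.0, est_hours=120),
]
plan = pick_plan(candidates, budget_usd=5000, deadline_hours=100)
```

With the managed feature, you only describe your budget and timeline; SageMaker searches available capacity for you, so this comparison never has to be written by hand.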

Amazon SageMaker HyperPod task governance#

Managing compute resources across multiple teams is a challenge—idle instances rack up costs, and interruptions risk delays. While small teams can get by with spreadsheets, large-scale AI workloads demand a smarter approach.

Amazon SageMaker HyperPod task governance resolves these issues by dynamically allocating compute resources, ensuring:

  • More than 90% utilization of accelerated compute instances.

  • Maximized efficiency for model training, fine-tuning, and inference by automating the prioritization and management of GenAI tasks.

  • Automated task prioritization to keep high-priority jobs on schedule.

  • Real-time monitoring with a dashboard for better visibility and faster decision-making.
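
The scheduling behavior can be illustrated with a toy allocator: tasks are granted accelerators strictly in priority order, so a low-priority experiment waits when capacity runs out. The task names and GPU counts here are made up for illustration; the actual service manages this through HyperPod task governance policies:

```python
def allocate(tasks, total_gpus):
    """Greedily grant GPUs to tasks in priority order (1 = highest)."""
    granted = {}
    free = total_gpus
    for task in sorted(tasks, key=lambda t: t["priority"]):
        gpus = min(task["gpus"], free)
        if gpus:
            granted[task["name"]] = gpus
            free -= gpus
    return granted, free

tasks = [
    {"name": "prod-inference", "priority": 1, "gpus": 8},
    {"name": "fine-tune",      "priority": 2, "gpus": 16},
    {"name": "experiment",     "priority": 3, "gpus": 8},
]
granted, idle = allocate(tasks, total_gpus=24)
# prod-inference and fine-tune are fully served; the experiment waits
```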

These innovations in Amazon SageMaker HyperPod let users jump straight into development without worrying about the cluster’s cost, configuration, and management.

Deployment of AWS Partner apps in Amazon SageMaker#

Integrating third-party AI tools like Comet, Fiddler, and Deepchecks into your MLOps workflow is valuable—but managing infrastructure, scaling, and security can be a headache.

AWS now makes it easier by offering fully managed deployment of AWS Partner AI apps directly within Amazon SageMaker, so you can:

  • Accelerate model development with popular partner AI applications.

  • Eliminate infrastructure headaches, with no provisioning or scaling required.

  • Ensure data security by keeping everything within your SageMaker environment.

Amazon Bedrock: Building blocks for GenAI applications#

Generative AI continues to gain traction across industries. AWS is refining Amazon Bedrock to help organizations choose the best model, optimize for cost and latency, customize data, and ensure responsible AI implementation.

By enhancing model accessibility and introducing advanced performance optimizations, Bedrock is becoming a go-to solution for AI-powered applications.

New features in Amazon Bedrock:

  • New models: Luma Ray 2, poolside, and Stable Diffusion 3.5, plus the new Bedrock Marketplace.

  • Cost, latency, and accuracy optimizations: prompt caching and intelligent prompt routing.

  • Data processing features: support for GraphRAG, Data Automation, structured data retrieval, and the Kendra GenAI Index.

  • Responsible AI features: multimodal toxicity detection.

The latest updates in Amazon Bedrock focus on streamlining model selection and data processing and enabling businesses to deploy AI solutions that align with their specific needs while minimizing expenses and response times.

Additionally, AWS has integrated new safety mechanisms to enhance trust and compliance, reinforcing Bedrock’s role as a secure and efficient AI development platform.

How Bedrock helps choose the best model for your use case#

AWS has expanded its AI capabilities by integrating cutting-edge models and services from leading AI companies like Luma AI, Stability AI, and poolside, making it easier to find the right model for your specific needs.

With the launch of Bedrock Marketplace, developers now have a centralized hub to discover and deploy specialized AI models—simplifying AI development on AWS.

Here's how Bedrock helps match the right model to your use case:

  • Streamline software development: poolside offers two models: malibu, which tackles complex software engineering challenges such as code generation, test writing, refactoring, and documentation; and point, which excels at low-latency code completion, using advanced context awareness to accurately predict developers’ needs. By automating repetitive tasks and providing intelligent suggestions, poolside reduces development time and improves software reliability. This is particularly valuable for large enterprises facing challenges in modern software engineering, such as debugging and refactoring complex codebases.

  • Enhance creative content generation: Stability AI’s Stable Diffusion 3.5 model is ideal for generating high-quality images for media, design, and marketing applications. Its enhanced fine-tuning capabilities using AWS Trainium and Inferentia chips deliver improved quality in AI-generated content.

  • Create realistic video content: Luma AI’s Luma Ray 2 model specializes in generating realistic visuals with natural, coherent motion. It can produce 5 to 9-second video clips at 540p and 720p resolutions, making it perfect for applications like short-form video marketing, product demos, or social media content. It empowers businesses to create engaging video content without expensive production setups.

By offering a broader selection of generative AI models, AWS empowers businesses and developers to pick the best tool for the job—without extra complexity.

Access to even more models (and benefits)#

While the models introduced above give top-notch performance for specific use cases, developers can also use Bedrock Marketplace to explore hundreds of other models, such as DeepSeek.

Plus, Bedrock Marketplace can also:

  • Streamline development workflows with a unified console experience.

  • Deploy models on managed endpoints with custom scaling policies.

  • Leverage Amazon Bedrock APIs, tools, and security.

Optimizations for cost and latency#

While Bedrock provides access to cutting-edge foundation models, choosing between low latency and low cost can still be a challenge. AWS already offers model distillation and latency-optimized inference (which provides access to the latest AI hardware and software optimizations for various models), but there's always room for improvement.

Let’s look at the features you can use to cut the latency and cost of repetitive prompts and to route prompts to the right model.

Prompt caching#

Let’s say you use an LLM to generate responses to user queries, and you have a long list of instructions on how each reply should be written.

Every query is different, but each call must also include that same long list of instructions. When repetitive prompts with similar long context are sent to a model, reprocessing that context becomes computationally expensive.

The longer the prompt, the more tokens the foundation model has to process, and the higher the processing time and cost will be, especially when long prompts are repeated often.

Prompt caching

Amazon Bedrock now supports prompt caching to resolve this issue: it caches repetitive context in prompts across multiple API calls, reducing latency by up to 85% and cost by up to 90%.
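
A rough back-of-the-envelope calculation shows why caching a long shared prefix matters. The token price and the 90% cache discount below are assumptions for illustration, not published Bedrock rates:

```python
def request_cost(cached_tokens, new_tokens, price_per_1k, cache_discount=0.9):
    """Cost of one request when cached prefix tokens are billed at a discount.

    cache_discount=0.9 is an assumed 90% price reduction for cached tokens.
    """
    cached = cached_tokens / 1000 * price_per_1k * (1 - cache_discount)
    fresh = new_tokens / 1000 * price_per_1k
    return cached + fresh

# A 5,000-token instruction prefix reused on every call, plus a 200-token question
without_cache = request_cost(0, 5200, price_per_1k=0.003)
with_cache = request_cost(5000, 200, price_per_1k=0.003)
```

Under these assumptions the shared prefix dominates the bill, so caching it cuts the per-request cost by well over 80%.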

Intelligent prompt routing#

Deploying multiple models for varying prompt complexity introduces cost and latency optimization challenges. Smaller models offer a good balance of accuracy and latency for simple queries, while larger models are necessary for complex ones. However, routing requests to the appropriate model requires complex custom code, and this becomes even harder as request patterns change or new models become available, necessitating constant re-coding and maintenance.

Intelligent prompt routing

Intelligent prompt routing solves the model selection problem. By defining your preferred models and cost/latency thresholds, this feature automatically routes prompts to the optimal foundation model within a model family via a single endpoint. Advanced prompt matching ensures quality while minimizing costs, reducing development expenses by up to 30% without sacrificing accuracy.
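
Conceptually, a prompt router is just a policy that maps each request to a model. Here is a deliberately simple heuristic version (the model names, threshold, and markers are all hypothetical); Bedrock's intelligent prompt routing replaces this kind of hand-rolled logic with managed, quality-aware routing behind a single endpoint:

```python
def route(prompt, cheap_model="small-model", strong_model="large-model",
          length_threshold=200):
    """Toy router: send short, simple prompts to the cheaper model."""
    complex_markers = ("explain why", "step by step", "compare", "derive")
    if len(prompt) > length_threshold or any(
        marker in prompt.lower() for marker in complex_markers
    ):
        return strong_model
    return cheap_model
```

The hard part, which the managed feature handles, is predicting response quality per model rather than relying on crude signals like prompt length.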

Preparing data for Generative AI#

AWS recognizes that effective generative AI applications hinge on high-quality, readily accessible data. They’re committed to simplifying and optimizing the often complex data preparation process. This commitment is evident in the suite of features and services designed to streamline data ingestion, transformation, and retrieval for generative AI workflows.

Kendra GenAI Index#

Amazon Kendra is a search service that uses machine learning to help users find information within an organization. The service delivers a unified search experience across structured and unstructured content, leveraging natural language processing for precise answers.

Kendra ranks search results based on content attributes, user behavior, and relevance while also allowing the creation and management of vector embeddings to enhance retrieval through semantic understanding.

Customers demand a vector index that can enable them to choose the best embedding models, optimize vector dimensions, and fine-tune retrieval accuracy, all while integrating smoothly with their knowledge bases and GenAI workflows. To address this, AWS introduced the Kendra GenAI Index.

Amazon Kendra GenAI Index provides retrieval abilities from more than 40 enterprise data sources

The Kendra GenAI Index takes retrieval further, offering a fully managed solution designed specifically for RAG workflows and Bedrock. It integrates seamlessly with Bedrock Knowledge Bases and connects to enterprise sources like SharePoint, OneDrive, and Salesforce, enabling the reuse of indexed content across multiple use cases, including Amazon Q Business apps. With this, organizations can optimize embeddings, improve vector search performance, and enhance retrieval accuracy without the complexity of managing the underlying infrastructure.

Support for structured data retrieval#

RAG systems often need to interact with structured data residing in databases and data warehouses. Translating natural language queries into SQL (NL2SQL) is notoriously difficult. It involves customizing schema embeddings, running query analysis, performing data sampling and correction, and managing security concerns.

Generation of SQL queries with Amazon Bedrock

Bedrock’s support for structured data retrieval simplifies this process with a fully managed RAG solution. It lets you natively query your structured data (from SageMaker Lakehouse, Redshift, S3 tables, etc.) using natural language. Bedrock automatically generates the necessary SQL queries, adapts to your schema and data, learns from query patterns, and offers customization options for enhanced accuracy. This eliminates the complex engineering required for a manual NL2SQL implementation.
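
To see what the managed service saves you from, here is a deliberately naive hand-written NL2SQL mapping using Python's built-in sqlite3. It only answers one exact question phrasing, which is precisely why generalizing NL2SQL by hand across real schemas and arbitrary wordings is so hard:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, region TEXT)")
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [(1, "EU"), (2, "US"), (3, "EU")])

# A hand-written question-to-SQL table: brittle by design, to show why a
# managed layer that adapts to schemas and phrasings is valuable.
QUESTION_TO_SQL = {
    "how many orders per region":
        "SELECT region, COUNT(*) FROM orders GROUP BY region",
}

rows = conn.execute(QUESTION_TO_SQL["how many orders per region"]).fetchall()
result = dict(rows)  # {'EU': 2, 'US': 1}
```

Rephrase the question even slightly and the lookup fails, whereas the managed feature generates the SQL from the natural-language intent.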

Amazon Bedrock Knowledge Bases support for GraphRAG#

Often, the information needed for a comprehensive response isn’t contained within a single document. RAG systems need to navigate relationships across multiple data sources. Knowledge graphs excel at this, representing connections between different pieces of information. However, building and maintaining knowledge graphs requires specialized expertise.

GraphRAG support for Amazon Bedrock Knowledge Bases

Bedrock simplified this by introducing GraphRAG support for Amazon Bedrock Knowledge Bases. It automatically creates graphs using Amazon Neptune, linking relationships between your data sources. This empowers customers to build more comprehensive GenAI applications without needing deep graph database expertise.

Amazon Bedrock data automation#

Extracting information from unstructured multimodal data is a complex undertaking. It involves a typical ETL (extract, transform, load) process.

Bedrock Data Automation simplifies this by automatically transforming unstructured multimodal data into structured data, ready to power your GenAI applications without writing any code. The service:

  • Extracts, transforms, and generates structured data from multimodal content.

  • Generates customized outputs based on the specified business rules.

  • Provides a streamlined, fully managed, single API experience.

Amazon Bedrock multimodal data ETL

This significantly reduces the effort required to prepare unstructured data for GenAI. By providing these powerful tools and services, AWS is lowering the barrier to entry for building data-driven generative AI applications.

And by simplifying the complexities of data preparation and retrieval, these services let developers focus on innovation.

AWS’s commitment to responsible AI#

AWS places paramount importance on responsible AI development, recognizing that building trust in Generative AI requires proactive measures to mitigate potential risks.

This commitment is evident through the ongoing investments in tools and features designed to ensure AI systems’ ethical and safe deployment. AWS is actively addressing the unique challenges posed by generative AI, particularly in harmful content.

Amazon Bedrock Guardrails multimodal toxicity detection#

Amazon Bedrock Guardrails protects generative AI applications from producing harmful text content, such as violence and insults, and automated reasoning checks were recently introduced to strengthen these safeguards. However, as generative AI expands to encompass multimodal content, the need for similar protections becomes even more critical. Recognizing this, AWS has announced multimodal toxicity detection within Amazon Bedrock Guardrails.

Multimodal toxicity detection in Amazon Bedrock

This new capability extends the protective shield of Guardrails to image content, providing configurable safeguards that enhance the security of multimodal generative AI applications.

  • It enables consistent policy control across text and image generation, ensuring a unified approach to responsible AI.

  • It is available for all foundation models in Amazon Bedrock that support image generation.
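
In practice, Guardrails can be applied through the Bedrock Runtime ApplyGuardrail API, passing text and image blocks together. The helper below only assembles the request; the guardrail ID and image bytes are placeholders, and the exact field shapes should be verified against the current Bedrock Runtime documentation:

```python
def build_guardrail_request(guardrail_id, version, text=None,
                            image_bytes=None, image_format="png"):
    """Assemble an ApplyGuardrail-style request with mixed text/image content.

    Field names follow the Bedrock Runtime API at the time of writing;
    check the current docs before relying on them.
    """
    content = []
    if text is not None:
        content.append({"text": {"text": text}})
    if image_bytes is not None:
        content.append({"image": {"format": image_format,
                                  "source": {"bytes": image_bytes}}})
    return {"guardrailIdentifier": guardrail_id,
            "guardrailVersion": version,
            "source": "OUTPUT",
            "content": content}

req = build_guardrail_request("my-guardrail-id", "1",
                              text="generated caption",
                              image_bytes=b"<png bytes>")
# then: boto3.client("bedrock-runtime").apply_guardrail(**req)
```

The same guardrail configuration applies to both modalities, which is what gives you the consistent policy control described above.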

This proactive approach demonstrates AWS’s dedication to building trust and fostering responsible innovation in the rapidly evolving landscape of generative AI.

Amazon Q: A step toward agentic automation#

AWS believes the future of automation is agentic. They envision a world where intelligent agents seamlessly integrate into workflows, augmenting human capabilities and accelerating productivity.

While not a fully autonomous agent, Amazon Q represents AWS’s foray into intelligent automation. Powered by Amazon Bedrock’s advanced models, Amazon Q is designed to assist with software development and business analysis tasks. It is a step toward that agentic future, automating repetitive tasks, providing intelligent recommendations, and enabling users to focus on higher-value work.

This vision drives significant investment in developing and deploying powerful AI agents, exemplified by the advancements in Amazon Q agents.

Amazon Q Developer in Amazon SageMaker Canvas#

AWS recognizes the transformative potential of AI agents, particularly in software development. The Amazon Q Developer agent, specifically designed for software development tasks, has performed remarkably well: it holds the top spot on the SWE-bench Verified leaderboard with a 52.8% problem-solving rate.

Beyond code generation, Q Developer could be used to revolutionize how ML models are built.

While tools like SageMaker Canvas have simplified the process, building ML models requires significant expertise in feature extraction, engineering, algorithm selection, training, and hyperparameter tuning. AWS further democratizes ML development by integrating Amazon Q Developer into SageMaker Canvas.

This integration provides a step-by-step guide, breaking down complex ML tasks into manageable steps. Q Developer assists with data preparation, problem definition, model building, evaluation, and deployment, making ML accessible to a wider audience, even those with limited ML experience.

Scenarios analysis capability of Amazon Q in QuickSight#

Analyzing complex business scenarios often requires writing manual SQL queries, processing data in spreadsheets, or building custom dashboards. But what if you could offload that complexity to an AI agent?

Amazon Q in QuickSight enables developers and analysts to perform deep, scenario-based data analysis using natural language. Instead of manually pulling data and building custom logic, Q automatically:

  • Finds relevant datasets across your data sources.

  • Suggests and executes analyses (e.g., forecasting, anomaly detection).

  • Optimizes query execution to speed up complex calculations.

For example, before launching DynamoDB, AWS needed to model free-tier usage scenarios. Traditionally, this would require custom SQL, scripts, and manual number crunching. With Amazon Q in QuickSight, teams can now automate this process—reducing analysis time by up to 10x.

For developers, this means less manual data wrangling, fewer ad-hoc queries, and more focus on building applications that drive insights.

AI at scale, built for developers#

Dr. Swami Sivasubramanian’s keynote at AWS re:Invent 2024 highlighted how AWS is making AI more accessible, scalable, and efficient—without compromising power.

From smarter SageMaker training to seamless Bedrock integrations and AI-driven insights in Amazon Q, these advancements aren’t just about building better tools—they’re about removing complexity so developers can focus on building, optimizing, and innovating.

Whether you're training models, integrating AI into applications, or optimizing for cost and performance, AWS is providing the building blocks to scale AI faster than ever.

What's next?

If these innovations inspire you and you want to start your journey into cloud computing and AI, AWS Certified Cloud Practitioner (CLF-C02) is the perfect first step.

Master AWS Certified Cloud Practitioner CLF-C02 Exam

AWS is one of the leading cloud service providers, offering various services to design secure, compliant, and cost-effective cloud solutions. This course will empower you to deeply understand AWS’s core services and practical applications. You’ll start by learning about the fundamentals of cloud computing. Next, you’ll learn about core AWS services like networking, storage, compute, and databases. You’ll also learn about AWS’s different analytics tools and machine learning services. From there, you’ll explore various AWS services for your organization’s pricing, budgeting, and billing optimization. You’ll learn about different tools for monitoring and auditing the cloud infrastructure to ensure security, optimize performance, and maintain compliance. Finally, you’ll get hands-on experience in various cloud services using Cloud Labs. After completing this course, you will be confident in becoming an AWS Certified Cloud Practitioner and pursuing entry-level roles in the industry.

20hrs
Beginner
31 Cloud Labs
27 Exercises

Written By:
Fahim ul Haq