AWS Generative AI Landscape

Explore the AWS generative AI landscape by understanding Bedrock's role as a managed service for foundation models. Learn how to navigate its core capabilities such as model invocation, RAG, agents, and fine-tuning. Gain insight into access models like on-demand and provisioned throughput, and distinguish Bedrock from SageMaker to design optimal AI solutions using multiple foundation model providers.

We'll cover the following...

The three-layer AWS generative AI stack
Bedrock core capabilities
On-demand vs. provisioned throughput
- On-demand access
- Provisioned throughput
Bedrock vs. SageMaker
Supported model providers on Bedrock
Conclusion

Now that you understand generative AI architectures, large language models, and the risks that come with them, the next lesson turns to Amazon Bedrock, the AWS-managed service for building production generative AI applications with foundation models.

Amazon Bedrock is a fully managed service that provides API access to high-performing foundation models from Amazon and leading AI companies through a single, unified interface. The word “managed” carries specific weight here. AWS handles model hosting, infrastructure scaling, security patching, and compliance certifications on your behalf, so developers can focus on application logic rather than infrastructure code.

Consider the alternative. Without Bedrock, integrating a foundation model would require provisioning EC2 GPU instances, configuring container runtimes, deploying model artifacts, building autoscaling policies, and managing endpoint health. Bedrock abstracts all of that into a single API call. Think of it like the difference between managing your own mail server and using a managed email service. The outcome is the same, but the operational burden is dramatically different.

This lesson covers five areas that form the architectural orientation for everything that follows. You will understand the three-layer AWS generative AI stack, survey Bedrock’s core capabilities, compare on-demand and provisioned throughput access models, distinguish Bedrock from SageMaker, and review the major foundation model providers available through the service.

The three-layer AWS generative AI stack

AWS organizes its generative AI offerings into three distinct layers. Understanding where each layer begins and ends is essential for making correct architectural decisions, because choosing the wrong layer means either over-engineering a simple problem or under-powering a complex one.

The following breakdown describes each layer and the teams it serves.

Infrastructure layer: This bottom layer includes AWS custom AI chips, Trainium for cost-efficient model training, and Inferentia for high-throughput inference, alongside GPU-based EC2 instances such as P5 and P4d. Teams that need full control over training loops, custom model architectures, or specialized serving configurations operate here.
Model layer: Amazon Bedrock sits at this layer, providing managed access to pre-built foundation models without requiring any infrastructure management. This is the sweet spot for most application builders who want to consume foundation model capabilities rather than build them from scratch.
Application layer: Higher-level AWS services and features that consume Bedrock live here, including Amazon Q and purpose-built integrations embedded in other AWS services. These are ready-made solutions for common use cases.

Choosing the right layer depends on your team’s expertise, customization needs, and tolerance for operational overhead. Most developers and solutions architects will operate at the model layer or application layer.

The following diagram illustrates how these three layers relate to each other:

With this layered mental model in place, the next step is understanding what Bedrock actually offers at the model layer.

Bedrock core capabilities

Bedrock is not a single feature. It is a platform with several integrated capabilities that map to different stages of the generative AI application life cycle. The following survey establishes a mental map that the rest of this course will fill in detail. Here are some key features that are offered by Bedrock:

Model invocation: Developers call foundation models through a unified API for text generation, summarization, image generation, and embeddings. One API structure works across all supported providers.
Knowledge Bases (RAG): Bedrock provides a managed retrieval-augmented generation (RAG)A technique that retrieves relevant documents from external data sources and injects them into the model's prompt context, grounding responses in domain-specific information rather than relying solely on the model's training data. implementation that connects models to external data sources such as Amazon S3 buckets and web crawlers. This directly addresses hallucination risk by grounding responses in your actual data.
Agents: These allow models to orchestrate multi-step tasks by calling APIs and tools autonomously, enabling workflows that go beyond single-turn question answering.
Guardrails: Configurable content filters, denied topic lists, and PII redaction policies enforce responsible AI at the API level. This capability directly mitigates the risks of prompt injection, bias, and harmful content generation covered previously.
Fine-tuning: When prompting and RAG are insufficient, Bedrock allows you to customize some foundation models with proprietary training data.
Model evaluation: Benchmark and compare models on accuracy, robustness, and toxicity before deploying to production, enabling data-driven model selection.

Practical tip: The priority order for customization should be prompting first, then RAG, then agents, and finally fine-tuning. Each step adds complexity and cost, so exhaust simpler approaches before escalating.

The following visualization maps these capabilities and their key components.

Now that you’ve seen the main features, the next decision is choosing how to access these capabilities based on cost, latency, throughput, and workload patterns.

On-demand vs. provisioned throughput

Bedrock offers two distinct access models, and choosing the wrong one can result in either unnecessary cost or unacceptable latency in production.

On-demand access

With on-demand access, you pay per input and output token with no upfront commitment. AWS manages scaling automatically behind the scenes. This model is ideal for variable or unpredictable workloads, development and prototyping phases, and cost-sensitive experimentation where you want to avoid any financial commitment.

Provisioned throughput

Provisioned throughputA capacity reservation model in Bedrock where you purchase dedicated model units for a committed term (1 month or 6 months), guaranteeing consistent inference latency and higher request throughput regardless of platform-wide demand. serves production workloads with steady or predictable traffic. It is the right choice for latency-sensitive applications like real-time customer chat and for workloads where cost predictability matters more than flexibility. Provisioned throughput is also required for deploying certain fine-tuned models.

Attention: Jumping to provisioned throughput during early development locks you into costs before you understand your actual traffic patterns. Start with on-demand, load-test your application, and then commit.

The key decision factors are traffic pattern (bursty vs. steady), latency requirements, and budget model. The following table summarizes the trade-offs:

With pricing and access models understood, the next common source of confusion is when to use Bedrock vs. SageMaker.

Bedrock vs. SageMaker

These two services serve fundamentally different purposes, and conflating them leads to architectural mistakes.

Amazon Bedrock is designed for consuming and lightly customizing pre-built foundation models. The builder writes application code, not training code. The typical request flow involves user input processed through a prompt template, routed to a Bedrock foundation model, optionally augmented through Knowledge Base retrieval, filtered through Guardrails, and returned to the client.

Amazon SageMaker is designed for teams that need to train custom models from scratch, run advanced fine-tuning with full control over hyperparameters and infrastructure, or deploy non-foundation-model ML workloads such as XGBoost for tabular prediction or custom computer vision models.

The decision framework is straightforward. If the task is “use a foundation model through API with optional RAG or guardrails,” choose Bedrock. If the task is “train a proprietary model on custom data with full infrastructure control,” choose SageMaker. These services are complementary, not competing. A production architecture might use SageMaker Pipelines for data preprocessing and Bedrock for inference.

Note: Bedrock fine-tuning is intentionally limited in scope compared to SageMaker. If you need control over learning rates, training epochs, or custom loss functions, SageMaker is the appropriate tool.

The following quiz tests your understanding of the access models, service boundaries, and capability mapping covered so far.

With the service boundaries clear, the final piece of the orientation is understanding which foundation models are actually available through Bedrock.

Supported model providers on Bedrock

Bedrock provides access to foundation models from multiple providers, each with distinct strengths. The unified API means switching between providers requires minimal code changes. The call structure remains consistent, which enables rapid experimentation. Here are some of the model families supported by Bedrock:

Amazon Titan and Nova: Amazon’s own models cover text generation, embeddings, and multimodal capabilities. They are tightly integrated with other AWS services and competitively priced, making them a natural starting point for AWS-native architectures.
Anthropic Claude: Known for strong reasoning, long context windows (up to 200K tokens), and safety-focused design. Claude is a top choice for complex enterprise tasks that require nuanced understanding of lengthy documents.
Meta Llama: Open-weight modelsFoundation models whose trained parameters are publicly released, allowing inspection, modification, and self-hosting, unlike proprietary models where weights remain closed. from Meta offer flexibility and strong performance across text tasks, with Bedrock providing managed hosting so you avoid the operational burden of self-deployment.
Mistral: Efficient, high-performance models with strong multilingual capabilities and competitive cost-to-performance ratios, making them attractive for latency-sensitive or budget-conscious workloads.
Stability AI: Specializes in image and visual content generation through diffusion modelsA class of generative models that learn to create data (typically images) by iteratively removing noise from a random input, guided by text or image prompts., serving use cases in creative content and design.

The growing ecosystem means new providers and models are added regularly. Bedrock’s Model Evaluation feature lets you benchmark candidates against each other before committing to one in production.

Conclusion

Amazon Bedrock is a fully managed service that handles the underlying model infrastructure so developers can focus on building applications. In the AWS generative AI stack, Bedrock sits in the foundation model layer, between lower-level compute infrastructure and higher-level AI application services. Its core capabilities include model invocation, Knowledge Bases for Amazon Bedrock, Agents for Amazon Bedrock, Guardrails for Amazon Bedrock, fine-tuning, and model evaluation. These capabilities define the main areas this course explores in detail. On-demand access suits prototyping and variable workloads, while provisioned throughput serves production systems needing consistent latency. Bedrock is for consuming foundation models; SageMaker is for training custom models with full infrastructure control. Multiple providers are accessible through a single unified API, enabling rapid experimentation without architectural rework. Now that you understand where Bedrock fits, the next lesson examines the foundation models available in Amazon Bedrock and how to evaluate and select models based on workload requirements such as latency, cost, modality, context window, and output quality.

Feature	On-Demand	Provisioned Throughput
Pricing Model	Pay-per-token	Reserved model units (hourly rate)
Commitment	None	1-month or 6-month term
Scaling	Automatic, managed by AWS	Fixed reserved capacity
Latency	Variable under load	Consistent and predictable
Best For	Prototyping, variable traffic, experimentation	Production, steady traffic, latency-sensitive apps
Fine-Tuned Models	Not always supported	Required for custom model deployment

1.Introduction

2.Prompt Engineering and Model Selection

Cloud Lab

Cloud Lab

3.Customizing Models and Knowledge Retrieval

Cloud Lab

Cloud Lab

4.Building AI Agents with Amazon Bedrock

Cloud Lab

Cloud Lab

Cloud Lab

Cloud Lab

Cloud Lab

5.Integrating Bedrock with the AWS Ecosystem

Cloud Lab

Cloud Lab

Cloud Lab

6.Amazon Bedrock AgentCore and Production Agent Pipelines

Cloud Lab

7.Security and Responsible AI in Bedrock

Cloud Lab

Cloud Lab

8.Conclusion

AWS Generative AI Landscape

The three-layer AWS generative AI stack

Bedrock core capabilities

On-demand vs. provisioned throughput

On-demand access

Provisioned throughput

On-Demand vs. Provisioned Throughput Comparison

Bedrock vs. SageMaker

Supported model providers on Bedrock

Conclusion