Foundation Models in Bedrock

Explore foundation models offered by Amazon Bedrock across text generation, embedding, image generation, and multimodal categories. Understand their differences, key trade-offs, and how to select the right model using a structured framework. Learn best practices for production use including model versioning and stability to build reliable generative AI applications.

We'll cover the following...

Text generation models
Embedding models and semantic search
- How embeddings work in Bedrock
  - Distance metrics for similarity
Image and multimodal models
Model selection framework
Model versioning and stability
Conclusion

Amazon Bedrock gives you access to a catalog of foundation models from multiple providers, all accessible through a single API. Your main architecture decision is less about hosting or training a model yourself and more about selecting the model that fits your workload’s latency, cost, modality, context window, and output-quality requirements. This lesson breaks down the foundation model families available in Bedrock, explains how they differ across critical dimensions, and equips you with a structured framework for making selection decisions that hold up in production.

A foundation modelA large-scale neural network pre-trained on broad, diverse datasets (text, code, images) that can be adapted to many downstream tasks without being retrained from scratch. differs fundamentally from a task-specific model. Traditional machine learning required training a separate model for each task, such as sentiment analysis, translation, or summarization. Foundation models collapse these into a single pre-trained system that you steer toward specific tasks through prompting, retrieval-augmented generation (RAG), or fine-tuning. Bedrock hosts these models behind a unified API, so your application code stays largely the same regardless of which provider you choose.

The models available in Bedrock fall into four categories that this lesson covers in depth.

Text generation models: These produce natural language output for tasks like summarization, classification, code generation, and conversational AI.
Embedding models: These convert text into numerical vectors that enable semantic search and retrieval pipelines.
Image generation models: These create visual content from text prompts or modify existing images.
Multimodal models: These accept mixed inputs such as images and text together, enabling visual question answering and document understanding.

Practical tip: Always exhaust prompt engineering and RAG before considering fine-tuning. This principle directly shapes which model capabilities matter most during selection, because strong instruction-following and large context windows often eliminate the need for customization.

The following sections examine each category and then bring them together in a decision framework you can apply to real projects.

Text generation models

Text generation is the most common use case in Bedrock, and the platform offers several model families with distinct trade-offs in cost, speed, and reasoning depth.

Amazon Nova family

Amazon Nova provides three tiers designed for different workload profiles. Nova Micro is the lightest option, optimized for the lowest cost per token and fastest response times. It handles simple classification, short-form generation, and structured extraction well, but lacks the reasoning depth needed for complex multi-step tasks. Nova Lite occupies the middle ground, balancing cost and capability for general-purpose workloads like customer support automation and content drafting. Nova Pro delivers the strongest reasoning and instruction-following in the Nova lineup, making it suitable for complex enterprise workflows that require nuanced output.

Anthropic Claude family

The Claude family from Anthropic is one of the most widely adopted model families in Bedrock. Haiku is optimized for speed and low cost, making it ideal for high-volume tasks where latency matters more than deep reasoning. Sonnet balances reasoning quality with response speed and serves as the default choice for most production applications. Opus provides the highest capability for complex multi-step reasoning and long-document analysis, with context windows reaching up to 200K tokens. That context window size means Opus can process entire books or lengthy legal contracts in a single request.

Meta Llama and Mistral

Meta Llama models are open-weight models hosted on Bedrock, offering strong general-purpose text performance. Teams that value transparency and the ability to inspect model weights find Llama particularly attractive. Mistral models deliver efficient performance with strong multilingual support and competitive pricing, making them a solid choice for latency-sensitive or budget-conscious deployments across multiple languages.

Bedrock’s unified API means switching between any of these providers requires minimal code changes. You can run the same prompt against Claude Sonnet and Nova Pro, compare outputs, and make a data-driven selection.

Note: Token pricing varies significantly across these models. For high-volume workloads processing millions of tokens daily, even a small per-token price difference compounds into meaningful cost differences over a billing cycle.

The following table summarizes the key trade-offs across text generation model families.

Model Family Comparison

Model Family	Model Tiers	Context Window	Relative Cost	Key Strengths	Best Use Cases
Amazon Nova	Micro, Lite, Pro	4K–128K tokens	Low	Cost efficiency and speed	Classification and general tasks
Anthropic Claude	Haiku, Sonnet, Opus	Up to 200K–1M tokens	Low to High	Reasoning depth and instruction following	Complex analysis and production chat
Meta Llama	—	Up to 128K tokens	Medium	Open weights and transparency	Teams needing inspectable models
Mistral	—	Up to 32K tokens	Low to Medium	Multilingual support and efficiency	Multilingual and latency-sensitive workloads

With text generation models covered, the next critical model category enables the retrieval systems that feed context into these generators.

Embedding models and semantic search

Embedding models serve a fundamentally different purpose than text generation models. Instead of producing language, they convert text into dense numerical vectors, which are fixed-dimensional arrays of floating-point numbers that capture semantic meaning. This enables similarity-based retrieval rather than keyword matching.

How embeddings work in Bedrock

Bedrock offers two primary embedding model families. Amazon Titan Embeddings supports multiple dimensionalities and integrates tightly with Bedrock Knowledge Bases, making it the default choice for most RAG pipelines built natively on AWS. Cohere Embed delivers strong multilingual embedding quality and flexible input types, which benefits teams working with documents in multiple languages. The following diagram illustrates how RAG works.

Understanding embeddings is essential for building retrieval systems, but Bedrock’s capabilities extend beyond text into visual content.

The choice of vector dimensionalityThe number of dimensions (floating-point values) in an embedding vector, such as 256, 512, or 1024. Higher dimensions capture more semantic nuance but increase storage and compute cost. is a direct trade-off between retrieval precision and infrastructure cost.

Distance metrics for similarity

When comparing vectors to find semantically similar content, three distance metrics are commonly used.

Cosine similarity: Measures the angle between two vectors, ignoring magnitude. This is the most common metric for text similarity because it focuses purely on directional alignment in the vector space.
Dot product: Measures magnitude-weighted similarity, which can be useful when vector length carries meaningful information.
Euclidean distance: Measures the straight-line distance between two points in vector space. This metric is less common for text but used in certain retrieval configurations.

Image and multimodal models

Bedrock includes several models that operate on visual content, either generating images from text or understanding images provided as input.

For image generation, three options stand out. Amazon Titan Image Generator produces images from text prompts and includes built-in watermarking for responsible AI, making it suitable for product mockups and marketing visuals. Stability AI (Stable Diffusion) offers diffusion-based generation with fine control over style and composition, favored in creative and design workflows. Amazon Nova Canvas is Amazon’s newer image generation model with enhanced prompt adherence and output quality. For video, Amazon Nova Reel enables short-form video content creation from text descriptions.

A critical distinction exists between models that generate images as output and models that understand images as input. Certain models, including Claude Sonnet, Nova Lite, and Nova Pro, accept images alongside text in their input. This multimodal input capability enables visual question answering, document understanding from scanned images, and chart interpretation.

Product image generation: E-commerce teams use Titan Image Generator or Nova Canvas to create product visuals at scale.
Visual QA for claims processing: Insurance workflows use multimodal input models to analyze photos of damage alongside structured questions.
Document understanding: Invoice extraction and form processing benefit from models that can read scanned documents and answer questions about their content.

The following quiz tests your understanding of model categories and their appropriate use cases.

With the model families established, the next step is applying a systematic approach to choosing between them.

Model selection framework

Selecting a foundation model should not be an ad hoc decision. A structured framework prevents teams from defaulting to the most expensive model or choosing based on brand familiarity alone.

Seven dimensions drive model selection in Bedrock:

Context window: This determines how much text the model can process in a single request. Claude Opus and Sonnet support up to 1 million tokens, which is essential for document summarization and long-form analysis. Smaller models may be limited to 4K through 32K tokens.
Latency: Time-to-first-token and total generation time vary significantly. Haiku and Nova Micro optimize for speed, while Opus and Pro prioritize output quality at the cost of slower responses.
Cost per token: Input and output token pricing differs across models. For high-volume workloads, a 2x price difference compounds into substantial monthly costs.
Reasoning capability: Complex multi-step tasks demand stronger models like Opus or Nova Pro. Simple extraction or classification tasks run efficiently on lighter models, and using a heavy model for simple tasks wastes budget.
Multimodal support: If the use case involves images, video, or mixed inputs, the eligible model pool narrows significantly to models like Claude Sonnet, Nova Lite, and Nova Pro.
Fine-tuning availability: Not all Bedrock models support fine-tuning. If customization beyond prompting and RAG is anticipated, verify that your chosen model supports it. Titan and Llama variants currently offer fine-tuning options.
Language support: Multilingual workloads favor Mistral or Cohere, which are specifically optimized for cross-language performance.

The practical approach is to start with the cheapest model that meets your minimum capability requirements. Use Bedrock’s model evaluation features to benchmark candidates against your specific prompts and data. Scale up to a more capable model only if output quality is demonstrably insufficient.

Practical tip: Regardless of which model you select, parameters like temperature, top_p, and max tokens are essential for tailoring behavior. A well-tuned prompt with appropriate inference parameters on a mid-tier model often outperforms a poorly configured prompt on a premium model.

The following visualization maps each selection dimension to specific model recommendations.

With a framework for choosing models in place, the final consideration is how to keep your selection stable in production.

Model versioning and stability

Every model in Bedrock carries a specific model IDA unique identifier that includes the provider, model name, and version string, such as anthropic.claude-3-sonnet-20240229-v1:0, ensuring you invoke an exact model version.. Bedrock also provides model aliases that point to a default or latest version, which can change when the provider releases an update.

This creates a critical production risk. If your application references a model alias rather than a pinned version ID, a provider update could change model behavior, output format, or quality without any code change on your side. Responses that passed your quality checks last week might degrade or shift in tone after an unannounced version rotation.

The best practices for managing this risk are straightforward.

Pin to specific model version IDs in production to ensure reproducibility and avoid breaking changes. Your infrastructure as code (IaC) templates and application configuration should reference the full version string.
Use model aliases only during development and experimentation where convenience outweighs stability requirements.
Test new versions explicitly against your evaluation suite before updating the pinned version in production. Treat model version changes with the same rigor as dependency upgrades.
Integrate model version tracking into your CI/CD pipeline so that version changes are deliberate, tested, and auditable through your standard deployment process.

Some model versions will be deprecated over time as providers release successors. Teams should monitor AWS announcements and plan migration windows proactively rather than reacting to forced deprecations.

Attention: A model alias update can silently change your application’s behavior in production. Always treat model version pinning as a non-negotiable production hygiene practice, similar to pinning library versions in your dependency files.

Conclusion

Amazon Bedrock provides access to multiple foundation model families spanning text generation, embeddings, image generation, and multimodal capabilities, all through a unified API that abstracts away infrastructure management. Text generation models range from lightweight and cost-efficient options like Nova Micro and Haiku to powerful reasoning engines like Claude Opus and Nova Pro. Embedding models such as Amazon Titan Embeddings and Cohere Embed are the backbone of semantic search and RAG pipelines, where consistency between indexing and query-time models is non-negotiable. A structured selection framework covering context window, latency, cost, reasoning, multimodal support, fine-tuning, and language support turns model selection from guesswork into an engineering decision. Finally, pinning to specific model version IDs in production protects your application from unplanned behavioral changes and keeps your outputs reproducible.

1.Introduction

2.Prompt Engineering and Model Selection

Cloud Lab

Cloud Lab

3.Customizing Models and Knowledge Retrieval

Cloud Lab

Cloud Lab

4.Building AI Agents with Amazon Bedrock

Cloud Lab

Cloud Lab

Cloud Lab

Cloud Lab

Cloud Lab

5.Integrating Bedrock with the AWS Ecosystem

Cloud Lab

Cloud Lab

Cloud Lab

6.Amazon Bedrock AgentCore and Production Agent Pipelines

Cloud Lab

7.Security and Responsible AI in Bedrock

Cloud Lab

Cloud Lab

8.Conclusion

Foundation Models in Bedrock

Text generation models

Amazon Nova family

Anthropic Claude family

Meta Llama and Mistral

Model Family Comparison

Embedding models and semantic search

How embeddings work in Bedrock

Distance metrics for similarity

Image and multimodal models

Model selection framework

Model versioning and stability

Conclusion