Search⌘ K
AI Features

Foundation Models in Bedrock

Explore foundation models offered by Amazon Bedrock across text generation, embedding, image generation, and multimodal categories. Understand their differences, key trade-offs, and how to select the right model using a structured framework. Learn best practices for production use including model versioning and stability to build reliable generative AI applications.

Amazon Bedrock gives you access to a catalog of foundation models from multiple providers, all accessible through a single API. Your main architecture decision is less about hosting or training a model yourself and more about selecting the model that fits your workload’s latency, cost, modality, context window, and output-quality requirements. This lesson breaks down the foundation model families available in Bedrock, explains how they differ across critical dimensions, and equips you with a structured framework for making selection decisions that hold up in production.

A foundation modelA large-scale neural network pre-trained on broad, diverse datasets (text, code, images) that can be adapted to many downstream tasks without being retrained from scratch. differs fundamentally from a task-specific model. Traditional machine learning required training a separate model for each task, such as sentiment analysis, translation, or summarization. Foundation models collapse these into a single pre-trained system that you steer toward specific tasks through prompting, retrieval-augmented generation (RAG), or fine-tuning. Bedrock hosts these models behind a unified API, so your application code stays largely the same regardless of which provider you choose.

The models available in Bedrock fall into four categories that this lesson covers in depth.

  • Text generation models: These produce natural language output for tasks like summarization, classification, code generation, and conversational AI.

  • Embedding models: These convert text into numerical vectors that enable semantic search and retrieval pipelines.

  • Image generation models: These create visual content from text prompts or modify existing images.

  • Multimodal models: These accept mixed inputs such as images and text together, enabling visual question answering and document understanding.

Practical tip: Always exhaust prompt engineering and RAG before considering fine-tuning. This principle directly shapes which model capabilities matter most during selection, because strong instruction-following and large context windows often eliminate the need for customization.

The following sections examine each category and then bring them together in a decision framework you can apply to real projects.

Text generation models

Text generation is the most common use case in Bedrock, and the platform offers several model families with distinct trade-offs in cost, speed, and reasoning depth.

Amazon Nova family

Amazon Nova provides three tiers designed for different workload profiles. Nova Micro is the lightest option, optimized for the lowest cost per token and fastest response times. It handles simple classification, short-form generation, and structured extraction well, but lacks the reasoning depth needed for complex multi-step tasks. Nova Lite occupies the middle ground, balancing cost and capability for general-purpose workloads like customer support automation and content drafting. Nova Pro delivers the strongest reasoning and instruction-following in the Nova lineup, making it suitable for complex enterprise workflows that require nuanced output.

Anthropic Claude family

The Claude family from Anthropic is one of the most widely adopted model families in Bedrock. Haiku is optimized for speed and low cost, making it ideal for high-volume tasks where latency matters more than deep reasoning. Sonnet balances reasoning quality with response speed and serves as the default choice for most production applications. Opus provides the highest capability for complex multi-step reasoning and long-document analysis, with context windows reaching up to 200K tokens. That context window size means Opus can process entire books or lengthy legal contracts in a single request.

Meta Llama and Mistral

Meta Llama models are open-weight models hosted on Bedrock, offering strong general-purpose text performance. Teams that value transparency and the ability to inspect model weights find Llama particularly attractive. Mistral models deliver efficient performance with strong multilingual support and competitive pricing, making them a solid choice for latency-sensitive or budget-conscious deployments across multiple languages.

Bedrock’s unified API means switching between any of these providers requires minimal code changes. You can run the same prompt against Claude Sonnet and Nova Pro, compare outputs, and make a data-driven selection.

Note: Token pricing varies significantly across these models. For high-volume workloads processing millions of tokens daily, even a small per-token price difference compounds into meaningful cost differences over a billing cycle.

The following table summarizes the key trade-offs across text generation model families.

Model Family Comparison

Model Family

Model Tiers

Context Window

Relative Cost

Key Strengths

Best Use Cases

Amazon Nova

Micro, Lite, Pro

4K–128K tokens

Low

Cost efficiency and speed

Classification and general tasks

Anthropic Claude

Haiku, Sonnet, Opus

Up to 200K–1M tokens

Low to High

Reasoning depth and instruction following

Complex analysis and production chat

Meta Llama

Up to 128K tokens

Medium

Open weights and transparency

Teams needing inspectable models

Mistral

Up to 32K tokens

Low to Medium

Multilingual support and efficiency

Multilingual and latency-sensitive workloads

With text generation models covered, the next critical model category enables the retrieval systems that feed context into these generators.

Embedding models and semantic search

Embedding models serve a fundamentally different purpose than text generation models. Instead of producing language, they convert text into dense numerical vectors, which are fixed-dimensional arrays of floating-point numbers that capture semantic meaning. This enables similarity-based retrieval rather than keyword matching.

How embeddings work in Bedrock

Bedrock offers two primary embedding model families. Amazon Titan Embeddings supports multiple dimensionalities and integrates tightly with Bedrock Knowledge Bases, making it the default choice for most RAG pipelines built natively on AWS. Cohere Embed delivers strong multilingual embedding quality and flexible input types, which benefits teams working with documents in multiple languages. The following diagram illustrates how RAG works.

How RAG works
How RAG works

Understanding embeddings is essential for building retrieval systems, but Bedrock’s capabilities extend beyond text into visual content.

The choice of vector dimensionalityThe number of dimensions (floating-point values) in an embedding vector, such as 256, 512, or 1024. Higher dimensions capture more semantic nuance but increase storage and compute cost. is a direct trade-off between retrieval precision and infrastructure cost.

Distance metrics for similarity

When comparing vectors to find semantically similar content, three distance metrics are commonly used.

  • Cosine similarity: Measures the angle between two vectors, ignoring magnitude. This is the most common metric for text similarity because it focuses purely on directional alignment in the vector space.

  • Dot product: Measures magnitude-weighted similarity, which can be useful when vector length carries meaningful information.

  • Euclidean distance: Measures the straight-line distance between two points in vector space. This metric is less common for text but used in certain retrieval configurations.

Image and multimodal models

Bedrock includes several models that operate on visual content, either generating images from text or understanding images provided as input.

For image generation, three options stand out. Amazon Titan Image Generator produces images from text prompts and includes built-in watermarking for responsible AI, making it suitable for product mockups and marketing visuals. Stability AI (Stable Diffusion) offers diffusion-based generation with fine control over style and composition, favored in creative and design workflows. Amazon Nova Canvas is Amazon’s newer image generation model with enhanced prompt adherence and output quality. For video, Amazon Nova Reel enables short-form video content creation from text descriptions.

A critical distinction exists between models that generate images as output and models that understand images as input. Certain models, including Claude Sonnet, Nova Lite, and Nova Pro, accept images alongside text in their input. This multimodal input capability enables visual question answering, document understanding from scanned images, and chart interpretation.

  • Product image generation: E-commerce teams use Titan Image Generator or Nova Canvas to create product visuals at scale.

  • Visual QA for claims processing: Insurance workflows use multimodal input models to analyze photos of damage alongside structured questions.

  • Document understanding: Invoice extraction and form processing benefit from models that can read scanned documents and answer questions about their content.

The following quiz tests your understanding of model categories and their appropriate use cases.

Lesson Quiz

1.

A team is building semantic search over 500,000 documents. Which model type should they use?

A.

Text generation model

B.

Embedding model (Amazon Titan Embeddings)

C.

Image generation model

D.

Multimodal model


1 / 3

With the model families established, the next step is applying a systematic approach to choosing between them.

Model selection framework

Selecting a foundation model should not be an ad hoc decision. A structured framework prevents teams from defaulting to the most expensive model or choosing based on brand familiarity alone.

Seven dimensions drive model selection in Bedrock:

  • Context window: This determines how much text the model can process in a single request. Claude Opus and Sonnet support up to 1 million tokens, which is essential for document summarization and long-form analysis. Smaller models may be limited to 4K through 32K tokens.

  • Latency: Time-to-first-token and total generation time vary significantly. Haiku and Nova Micro optimize for speed, while Opus and Pro prioritize output quality at the cost of slower responses.

  • Cost per token: Input and output token pricing differs across models. For high-volume workloads, a 2x price difference compounds into substantial monthly costs.

  • Reasoning capability: Complex multi-step tasks demand stronger models like Opus or Nova Pro. Simple extraction or classification tasks run efficiently on lighter models, and using a heavy model for simple tasks wastes budget.

  • Multimodal support: If the use case involves images, video, or mixed inputs, the eligible model pool narrows significantly to models like Claude Sonnet, Nova Lite, and Nova Pro.

  • Fine-tuning availability: Not all Bedrock models support fine-tuning. If customization beyond prompting and RAG is anticipated, verify that your chosen model supports it. Titan and Llama variants currently offer fine-tuning options.

  • Language support: Multilingual workloads favor Mistral or Cohere, which are specifically optimized for cross-language performance.

The practical approach is to start with the cheapest model that meets your minimum capability requirements. Use Bedrock’s model evaluation features to benchmark candidates against your specific prompts and data. Scale up to a more capable model only if output quality is demonstrably insufficient.

Practical tip: Regardless of which model you select, parameters like temperature, top_p, and max tokens are essential for tailoring behavior. A well-tuned prompt with appropriate inference parameters on a mid-tier model often outperforms a poorly configured prompt on a premium model.

The following visualization maps each selection dimension to specific model recommendations.

Framework for mapping workload requirements to specific model families across seven measurable dimensions

With a framework for choosing models in place, the final consideration is how to keep your selection stable in production.

Model versioning and stability

Every model in Bedrock carries a specific model IDA unique identifier that includes the provider, model name, and version string, such as anthropic.claude-3-sonnet-20240229-v1:0, ensuring you invoke an exact model version.. Bedrock also provides model aliases that point to a default or latest version, which can change when the provider releases an update.

This creates a critical production risk. If your application references a model alias rather than a pinned version ID, a provider update could change model behavior, output format, or quality without any code change on your side. Responses that passed your quality checks last week might degrade or shift in tone after an unannounced version rotation.

The best practices for managing this risk are straightforward.

  • Pin to specific model version IDs in production to ensure reproducibility and avoid breaking changes. Your infrastructure as code (IaC) templates and application configuration should reference the full version string.

  • Use model aliases only during development and experimentation where convenience outweighs stability requirements.

  • Test new versions explicitly against your evaluation suite before updating the pinned version in production. Treat model version changes with the same rigor as dependency upgrades.

  • Integrate model version tracking into your CI/CD pipeline so that version changes are deliberate, tested, and auditable through your standard deployment process.

Some model versions will be deprecated over time as providers release successors. Teams should monitor AWS announcements and plan migration windows proactively rather than reacting to forced deprecations.

Attention: A model alias update can silently change your application’s behavior in production. Always treat model version pinning as a non-negotiable production hygiene practice, similar to pinning library versions in your dependency files.

Conclusion

Amazon Bedrock provides access to multiple foundation model families spanning text generation, embeddings, image generation, and multimodal capabilities, all through a unified API that abstracts away infrastructure management. Text generation models range from lightweight and cost-efficient options like Nova Micro and Haiku to powerful reasoning engines like Claude Opus and Nova Pro. Embedding models such as Amazon Titan Embeddings and Cohere Embed are the backbone of semantic search and RAG pipelines, where consistency between indexing and query-time models is non-negotiable. A structured selection framework covering context window, latency, cost, reasoning, multimodal support, fine-tuning, and language support turns model selection from guesswork into an engineering decision. Finally, pinning to specific model version IDs in production protects your application from unplanned behavioral changes and keeps your outputs reproducible.