Imagine hiring the world’s most brilliant consultant: an expert with decades of experience, but then providing them with no files, company history, or context about your business. You wouldn’t expect them to solve your most complex problems. They understand general knowledge but don’t know your products or customers.
Deploying generative AI without a data strategy is the same idea. The model can generate answers, but without access to your proprietary data, its responses are generic and often inaccurate. The AI cannot access company-specific knowledge, manuals, or documentation. Additionally, it misses out on short-term interactions, like ongoing conversations or session data, which are not stored.
The model’s insights lack business relevance because it cannot use your data.
The real benefit of AI comes when the organization’s unique, proprietary data gives the model the context to generate useful responses. A well-architected data strategy is necessary for insightful and accurate AI outcomes.
This newsletter explores the architectural patterns and AWS services that form the pillars of a modern data strategy. It explains how to build a foundation that transforms generic models into context-aware applications that precisely understand your business.
To generate a truly accurate and relevant response, a large language model (LLM) needs more than the user’s question: it needs a carefully constructed, engineered prompt built from multiple layers of context. This process of providing rich, just-in-time information to the model is known as in-context learning.
An engineered prompt consists of three types of context:
Behavioral context (prompt template): This tells the model how to behave. It includes instructions on its persona, such as “you are a helpful customer support agent.” It also includes the desired output format, such as “Answer in three sentences or less.”
Situational context: This provides the model with short-term memory and user awareness. It includes the user’s conversation history, details about their account, and other real-time information that shapes the immediate interaction.
Semantic context (knowledge base): This is the model’s deep, long-term memory. It’s the vast repository of your organization’s specific knowledge, including technical documents, product manuals, legal policies, or past support tickets, that the model can draw upon to answer complex questions. This knowledge is typically stored and searched for in a vector database.
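The three layers above can be sketched as a simple prompt-assembly function. This is an illustrative minimal example, not a production prompt template; the persona text, conversation snippet, and policy document are made-up placeholders.

```python
# Hypothetical sketch of assembling an engineered prompt from the three
# context layers. All text values here are illustrative placeholders.

def build_prompt(behavioral: str, situational: str, semantic: str, question: str) -> str:
    """Combine the three context layers and the user's question into one prompt."""
    return (
        f"{behavioral}\n\n"                         # behavioral: persona + output format
        f"Conversation so far:\n{situational}\n\n"  # situational: short-term memory
        f"Relevant knowledge:\n{semantic}\n\n"      # semantic: retrieved documents
        f"User question: {question}"
    )

prompt = build_prompt(
    behavioral="You are a helpful customer support agent. Answer in three sentences or less.",
    situational="User: My order is late.\nAgent: I'm sorry to hear that.",
    semantic="Policy doc: Orders delayed more than 5 days qualify for a refund.",
    question="Can I get a refund?",
)
print(prompt)
```

In a real RAG application, each argument would be fetched from a different data store at inference time, which is exactly the orchestration discussed next.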
So, how do we architect a system to deliver this rich context?
There are three primary strategies for infusing your custom data into a GenAI application, each with its own complexity and use case:
Context engineering with RAG: The most common and often most effective approach. It involves retrieving relevant semantic context from a knowledge base and feeding it to a standard foundation model at inference time.
Training a model from scratch: This is the most resource-intensive method, where you train a model from the ground up on your own massive, proprietary dataset. It is reserved for highly specialized use cases.
Fine-tuning a foundation model: This involves further training a pretrained model on a smaller, curated dataset to adapt its style, tone, or understanding of a specific domain’s jargon.
For most enterprises, the journey begins with RAG and may evolve to include fine-tuning for more specialized tasks.
A simple user query triggers a complex data orchestration process behind the scenes. A well-architected RAG application fetches and integrates data from multiple specialized data stores to construct the engineered prompt in real-time.
The behavioral context, stored in prompt templates, can be efficiently managed as objects in Amazon S3 or in a simple database.
For the dynamic situational context, we can choose among the options best suited to our needs.
Instant session recall (key-value): Amazon DynamoDB stores and retrieves the immediate history of a user’s current conversation. It is a key-value NoSQL database that provides extremely fast, single-digit millisecond performance for simple lookups with automatic scaling to handle high request volumes. In a GenAI application, you use the user’s session ID as the primary key. The value would be the recent conversation transcript. When a user sends a new message, the application can instantly retrieve the last few turns of the conversation from DynamoDB to understand the immediate context.
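The session-recall pattern can be sketched in plain Python. Here a dict stands in for the DynamoDB table so the logic is runnable anywhere; the real calls (`get_item`/`put_item` via boto3) are noted in comments, and the item shape and attribute names are illustrative assumptions.

```python
# Sketch of the DynamoDB session-recall pattern. A dict stands in for the
# table; real code would use boto3's Table.get_item / Table.put_item.
# The key and attribute names are illustrative assumptions.

MAX_TURNS = 6  # keep only the last few turns as situational context

table = {}  # stand-in for: boto3.resource("dynamodb").Table("sessions")

def append_turn(session_id: str, role: str, text: str) -> None:
    # Real code: a put_item/update_item call keyed on session_id.
    item = table.setdefault(session_id, {"session_id": session_id, "turns": []})
    item["turns"].append({"role": role, "text": text})
    item["turns"] = item["turns"][-MAX_TURNS:]  # trim to recent context

def recent_context(session_id: str) -> str:
    # Real code: table.get_item(Key={"session_id": session_id})
    item = table.get(session_id, {"turns": []})
    return "\n".join(f"{t['role']}: {t['text']}" for t in item["turns"])

append_turn("sess-123", "user", "My order is late.")
append_turn("sess-123", "agent", "Let me check that for you.")
print(recent_context("sess-123"))
```

Trimming the transcript to the last few turns keeps items small and lookups fast, which is what makes the key-value model such a good fit here.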
Complex conversational data (document): Amazon DocumentDB stores rich, multi-faceted information about a user’s conversation history and preferences. It is ideal when storing more than just a simple transcript. It uses a flexible JSON-like document model, allowing you to create a single, complex document per user. This document could contain their entire chat history and nested information like past support ticket IDs, product preferences they’ve mentioned, or language settings, all in one place.
In-session data (in-memory): Amazon MemoryDB for Redis provides ultra-low latency access to data that is needed instantly and repeatedly during a single session. It’s perfect for caching data that defines the immediate state of the user’s interaction, such as their shopping cart contents, authentication status, or temporary feature flags. This ensures the application feels incredibly responsive.
Relational data: Amazon Aurora can pull structured, transactional data from your core business systems to enrich the AI’s understanding. When a user asks: What’s the status of my recent order?, the application needs to query your system of record. Amazon Aurora, being a high-performance relational database, is perfect for this. It can be queried by the user’s ID to fetch reliable, structured data like order history, shipping status, or account details, providing the AI with factual business context.
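The relational lookup can be illustrated with a few lines of SQL. In this runnable sketch, SQLite (from the standard library) stands in for Amazon Aurora, and the table and column names are illustrative assumptions; against Aurora you would run the same query through a MySQL or PostgreSQL driver.

```python
# Sketch of the relational lookup that enriches the prompt with order status.
# SQLite stands in for Amazon Aurora; schema and data are made up.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (user_id TEXT, order_id TEXT, status TEXT)")
conn.execute("INSERT INTO orders VALUES ('u-42', 'ord-1001', 'shipped')")

def order_status(user_id: str):
    # Query the system of record by user ID for structured, factual context.
    return conn.execute(
        "SELECT order_id, status FROM orders WHERE user_id = ?", (user_id,)
    ).fetchall()

print(order_status("u-42"))  # structured facts to feed into the prompt
```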
Interconnected relationships (graph): Amazon Neptune understands complex, interconnected relationships between users and other data points. For truly advanced personalization or recommendations, you need to understand relationships. A graph database like Amazon Neptune excels at this. For example, to answer: Recommend a product, you could query Neptune to find products frequently bought with the user’s past purchases, items liked by friends in their social network, or their relationship to a specific loyalty program, providing a deep, 360-degree view that goes far beyond simple transaction history.
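The kind of traversal a graph database performs can be illustrated with a toy co-purchase graph. This is a conceptual sketch only: against Neptune you would issue a Gremlin or openCypher query rather than walking an in-memory dict, and the purchase data is invented.

```python
# Toy illustration of the one-hop traversal a graph query would perform.
# In practice this would be a Gremlin/openCypher query against Neptune.
from collections import Counter

# edges: product -> products frequently bought together (made-up data)
bought_together = {
    "laptop": ["mouse", "laptop-bag", "mouse"],
    "mouse": ["mouse-pad"],
}

def recommend(past_purchases, top_n=2):
    # Walk one hop from each past purchase and rank neighbors by frequency.
    counts = Counter()
    for product in past_purchases:
        counts.update(bought_together.get(product, []))
    return [p for p, _ in counts.most_common(top_n)]

print(recommend(["laptop"]))
```

The value of a graph store is that multi-hop versions of this traversal (friends' likes, loyalty-program links) stay fast even when the relationships span millions of nodes.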
The semantic context is the core of the RAG architecture and requires a knowledge base. Amazon Bedrock Knowledge Bases offers the simplest, most integrated solution, automating the entire pipeline of creating vector embeddings from your documents in S3 and managing them in vector stores like Amazon RDS for PostgreSQL (with pgvector) or Amazon OpenSearch Service.
The magic behind modern semantic search is vector embeddings. In simple terms, they are numerical representations of meaning. An embedding model converts text into a list of numbers (a vector), where texts with similar meanings are mathematically closer. This allows us to search for meaning, a far more powerful tool than searching for keywords. Where to store and query these vectors is a critical architectural decision.
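"Mathematically closer" usually means cosine similarity between vectors. The sketch below uses tiny 3-dimensional vectors with made-up values purely for illustration; real embedding models produce vectors with hundreds or thousands of dimensions.

```python
# Minimal illustration of "texts with similar meanings are mathematically
# closer." The 3-dimensional vectors are invented for illustration.
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

docs = {
    "refund policy": [0.9, 0.1, 0.0],
    "shipping times": [0.1, 0.9, 0.1],
}
query = [0.8, 0.2, 0.1]  # pretend embedding of "how do I get my money back?"

best = max(docs, key=lambda d: cosine_similarity(query, docs[d]))
print(best)  # the semantically closest document
```

A vector database does exactly this comparison, but with approximate nearest-neighbor indexes so it stays fast across millions of embeddings.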
For organizations seeking a powerful, ML-based enterprise search without managing vectors, Amazon Kendra is an excellent alternative that indexes documents and uses natural language understanding to find answers directly. To optimize this entire flow, a caching layer using Amazon ElastiCache can reduce latency and database load.
The following table compares these options against the core concerns of an engineering team when choosing a vector store.
| Feature | Amazon RDS for PostgreSQL (pgvector) | Amazon OpenSearch Serverless | Amazon S3 Vectors (via Knowledge Bases / integrations) |
| --- | --- | --- | --- |
| Familiarity | Ideal for teams already using PostgreSQL and comfortable with SQL. | Perfect for users with experience in Elasticsearch, Lucene, or NoSQL databases. | Very simple for teams already storing documents in S3; no database expertise required. |
| Ease of implementation | Simple to add the pgvector extension to an existing or new RDS instance. | The serverless offering removes the operational overhead of managing a cluster and provides built-in vector search APIs. | Embeddings are stored alongside documents in S3 and indexed automatically by Bedrock Knowledge Bases or connectors. |
| Scalability | Scales vertically and horizontally with read replicas, but is less elastic than OpenSearch. | Designed for horizontal scalability; ideal for massive-scale applications with billions of embeddings. | Virtually unlimited object storage, cost-effective for huge corpora. Retrieval scalability depends on the indexing layer, not S3 itself. |
| Performance | Strong QPS and recall with proper indexing. | Typically higher QPS at scale, with tunable recall. | Raw S3 is not optimized for vector search; performance depends on the external index (e.g., Bedrock Knowledge Bases or custom pipelines). |
| Flexibility | Allows JOINs between vector data and relational business data. | Excels at hybrid search (keyword + vector) for superior relevance. | Extremely flexible: store raw docs, embeddings, and metadata together; works with multiple vector DBs or AI services as backends. |
Fine-tuning builds on top of a pretrained foundation model by adapting it to your organization’s specific needs, such as tone, style, or industry-specific terminology.
We fine-tune a foundation model using the semantic, behavioral, and situational context. Fine-tuning allows us to adapt a large, pretrained model to a specific use case without needing massive amounts of compute or data. For example, in customer support, fine-tuning with domain-specific chat transcripts improves accuracy and relevance in responses.
For fine-tuning, the architecture is capable of handling large-scale data processing and training. The typical workflow looks like this:
Centralize data: Gather raw data, such as historical customer interactions or domain documents, into an Amazon S3 data lake or a data warehouse like Amazon Redshift.
Process and label: Use AWS Glue to clean, normalize, and transform this data into a high-quality labeled dataset.
Convert to JSONL format: Store the processed dataset back in S3 in JSON Lines (JSONL) format, where each line represents a prompt–completion pair.
Fine-tune the model: Launch a fine-tuning job in Amazon Bedrock or a custom training job in Amazon SageMaker, pointing it to the curated dataset stored in S3.
This approach emphasizes quality over quantity. A smaller but well-curated dataset often yields better results than massive but noisy data.
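The JSONL conversion step above can be sketched in a few lines. The `{"prompt": ..., "completion": ...}` field names follow a common convention but are an assumption here; check the exact schema your fine-tuning service expects before producing the real dataset.

```python
# Sketch of writing a curated dataset as JSON Lines: one prompt-completion
# pair per line. Field names are an assumed convention; verify them against
# your fine-tuning service's documented schema.
import io
import json

pairs = [
    {"prompt": "Customer: My router keeps rebooting.", "completion": "Try updating the firmware ..."},
    {"prompt": "Customer: How do I reset my password?", "completion": "Open Settings > Account ..."},
]

buffer = io.StringIO()  # in practice, a file written to and uploaded to S3
for pair in pairs:
    buffer.write(json.dumps(pair) + "\n")  # one JSON object per line

jsonl = buffer.getvalue()
print(jsonl)
```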
Pretraining a model from scratch involves training on large volumes of raw, unlabeled data (text, images, or multimodal) to learn fundamental language or representation patterns. Unlike fine-tuning, this requires significant compute power and optimized data pipelines.
Model training is a compute-bound process where powerful GPUs need data fed to them as fast as possible. While S3 is the perfect data lake for cost and scale, you need a high-performance file system for maximum training throughput. Amazon FSx for Lustre is designed for this. It can be linked directly to your S3 bucket, presenting your training datasets as a high-performance file system that your training instances can read at the throughput GPUs demand.
As you develop more sophisticated generative AI applications, preventing data silos is essential for ensuring your models are both scalable and secure.
Fragmented data silos create security risks through inconsistent security policies and an expanded attack surface. A centralized governance strategy gives you a unified view of data access, allowing you to implement consistent policies across your applications.
This is where AWS Lake Formation helps. It provides granular, column-level security for your raw data in S3 to consistently safeguard sensitive information used in model training or retrieval-augmented generation (RAG).
Layered on top, Amazon DataZone creates a unified data catalog. This improves the accuracy of your GenAI responses by enabling discovery of trusted datasets, and it enhances security by providing clear visibility into data usage, helping prevent the creation of redundant and potentially unsecured data copies.
Your data strategy must evolve to meet the demands of generative AI.
Be comprehensive: Your data stores must accommodate all data types, including structured, unstructured, raw, and vector embeddings. Remember that raw data is a valuable asset that can be reprocessed for future models.
Be integrated: Instead of disparate, siloed data sources, integrate your data into a central data lakehouse. Use ETL and Zero-ETL processes to make this data seamlessly available for any analytics or ML task, including generative AI.
Ultimately, your data strategy is your generative AI strategy. By building a robust, scalable, and well-governed data foundation on AWS, you move beyond the hype and begin creating truly intelligent applications.
Here are a few resources that can get you started with generative AI applications on AWS:
Building Generative AI Workflows with Amazon Bedrock: In this Cloud Lab, you’ll use Amazon Bedrock Flows to build intelligent generative AI workflows, manage prompts, and deploy AI-driven agents for dynamic user interactions.
Building a RAG Chatbot Using LangChain and Amazon Bedrock: In this Cloud Lab, you’ll learn to create a RAG chatbot using Bedrock Knowledge Bases and base models. You’ll also explore utilizing these resources to build a LangChain chatbot.
Building Multiple Agents Using CrewAI and Bedrock: In this Cloud Lab, you’ll build CrewAI agents with Amazon Bedrock to future-proof your skills by creating Knowledge Bases, using foundational models, and integrating vector stores.