Turning documents into decisions with GraphRAG and Amazon Bedrock for smarter AI
Retrieval-augmented generation (RAG) transforms how organizations use large language models (LLMs). By combining LLMs with external data sources, RAG enables more accurate and contextually relevant answers than models alone can provide. Traditional RAG is helpful for simple queries: it relies on vector search over a vector database to retrieve semantically similar text chunks, which works well when the needed information lives within a single document or a few closely related documents. However, it struggles when queries require connecting facts spread across multiple sources or understanding complex relationships between entities and concepts.
This newsletter introduces GraphRAG, a graph-based approach designed to overcome these limitations.
Traditional RAG has significantly improved the handling of many standard queries, especially simpler questions where pulling relevant text chunks and conditioning a model on them works well. In a standard RAG pipeline, incoming queries are first converted into dense vector representations (embeddings). These query embeddings are then compared against embeddings of pre-indexed text chunks stored in a vector database. Using similarity search, the system retrieves the most relevant text segments. Because these segments typically come from a single document or a small set of closely related documents, the model can efficiently ground its responses in highly relevant context before generating an answer.
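As a rough illustration, the retrieval step of this pipeline can be sketched in a few lines of Python. The three-dimensional "embeddings" below are hand-made stand-ins for real embedding-model output, and the in-memory list stands in for a vector database:

```python
import math

# Toy in-memory "vector database": pre-indexed chunks with embeddings.
# Real embeddings have hundreds of dimensions; these are illustrative only.
CHUNK_INDEX = [
    ("Paris is the capital of France.", [0.9, 0.1, 0.0]),
    ("The Eiffel Tower is located in Paris.", [0.8, 0.2, 0.1]),
    ("Python is a popular programming language.", [0.0, 0.1, 0.9]),
]

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def retrieve(query_embedding, top_k=2):
    """Rank all indexed chunks by similarity to the query embedding."""
    scored = sorted(
        CHUNK_INDEX,
        key=lambda item: cosine_similarity(query_embedding, item[1]),
        reverse=True,
    )
    return [text for text, _ in scored[:top_k]]

# A query embedding close to the "Paris" chunks retrieves them first.
print(retrieve([0.85, 0.15, 0.05]))
```

Note that retrieval quality here depends entirely on the chunks being semantically close to the query in embedding space, which is exactly the assumption that breaks down for multi-hop questions.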
Imagine a leading global financial institution struggling to extract deep insights from its massive proprietary research library. While traditional retrieval methods could handle simple queries, the results for complex questions that required connecting information scattered across multiple documents were often shallow answers lacking the context and synthesis needed for high-stakes decision-making.
This challenge isn’t unique. For enterprises with vast, complex datasets, traditional RAG often leads to “context fragmentation,” where key information is scattered across different parts of a document or even from several documents. Standard RAG struggles to provide a coherent answer in these situations. For example, a query requiring the synthesis of information from different parts of a document or multiple documents to answer a nuanced question may fail because the system’s probabilistic, vector-based approach struggles to perform multi-step or “multi-hop” reasoning. The clues for a solution are often scattered, and the relationships between them are hidden, making it difficult for the system to connect the dots and follow a logical chain of thought.
The image above shows how information is stored in a vector space. On the left, related pieces of information are close together, so traditional RAG can easily find and use them to answer a question. On the right, the information needed is scattered in different directions, making it harder for RAG’s one-time similarity search to collect all the relevant pieces. This scattering, called context fragmentation, is why traditional RAG struggles with aggregate reasoning, lacking the innate ability to synthesize information from disjointed text chunks to form a coherent, holistic understanding of a topic.
GraphRAG integrates knowledge graphs directly into the RAG process. Think of a knowledge graph as a map of our data: entities (people, places, concepts) and the relationships between them. Instead of just finding relevant text chunks, GraphRAG represents information as a network of interconnected entities, called nodes, and their relationships, called edges. This allows GraphRAG to go beyond simple keyword or semantic similarity searches. By mapping information as a network, it can understand complex relationships and perform “multi-hop reasoning,” which involves connecting information across multiple sources to answer a question. This significantly improves over traditional RAG methods that often struggle with scattered information.
Initially, GraphRAG performs a vector similarity search, just like traditional RAG. However, it extends beyond this by traversing connected nodes to uncover related information. This represents a fundamental shift, adding a deterministic, relationship-aware traversal layer on top of vector retrieval. As a result, GraphRAG can follow relationship paths and reason across them rather than relying solely on probabilistic matches.
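The "traversal on top of vector retrieval" idea can be sketched as a breadth-first expansion from the chunks a vector search returned. The hand-built graph below is a stand-in for a real graph store, and the node names are hypothetical:

```python
from collections import deque

# Adjacency list standing in for a knowledge graph: chunk and entity nodes
# linked by edges. In production this would live in a graph database.
GRAPH = {
    "chunk:Q3-report": ["entity:AcmeCorp"],
    "entity:AcmeCorp": ["chunk:Q3-report", "entity:BoltCo"],
    "entity:BoltCo": ["entity:AcmeCorp", "chunk:BoltCo-recall"],
    "chunk:BoltCo-recall": ["entity:BoltCo"],
}

def expand(seed_nodes, max_hops=3):
    """Collect every node reachable from the seeds within max_hops edges."""
    seen = set(seed_nodes)
    frontier = deque((node, 0) for node in seed_nodes)
    while frontier:
        node, depth = frontier.popleft()
        if depth == max_hops:
            continue
        for neighbor in GRAPH.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                frontier.append((neighbor, depth + 1))
    return seen

# Vector search alone finds the Q3 report; three hops of traversal also
# surface the supplier's recall notice -- the "multi-hop" connection.
print(sorted(expand(["chunk:Q3-report"])))
```

The recall notice shares little vocabulary with the original query, so similarity search alone would likely miss it; the edge path is what connects the dots.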
The technical distinction between GraphRAG and standard RAG translates into significant business value, particularly for high-stakes applications. The practical advantages include:
Explainability: One of the most critical benefits is enhanced explainability. The graph-based structure provides a clear lineage of how the system arrived at a given answer, allowing users to trace the connections and understand the provenance of the information. This ability to “show your work” is incredibly important in industries with strict rules and regulations. It’s not enough in finance to just say a transaction is fraudulent. We need to be able to show why. The graph can reveal the connections and patterns that led the system to flag it as high-risk, essential for audits and compliance. Similarly, for legal decisions, the ability to trace the information back to specific laws, court precedents, and documents is non-negotiable. It ensures the decision is grounded in facts and legal principles, not just an opaque AI guess.
Better accuracy: A graph provides a structured, verifiable framework for the AI’s answers. Instead of just guessing based on probability, the system can follow clear, defined relationships between pieces of information. Imagine a traditional AI is asked about the capital of France. It might get the right answer from its training data, but also get it wrong if it has seen conflicting information. With a graph-based system, the AI doesn’t have to guess. It follows a direct path: “Paris is a city” → “Paris is in France” → “Paris is the seat of government of France.” This creates a grounded and verifiable answer that is far less likely to be wrong or made-up (a phenomenon known as hallucination).
Cost savings: GraphRAG can summarize or synthesize information using fewer tokens by identifying key entities and relationships. Studies show it can cut token use by up to 97% for top-level summaries, delivering major cost and speed benefits for LLMs. This works because GraphRAG doesn’t send all the raw text or every relevant chunk into the prompt. Instead, it leverages graph structure and summaries to select only the most important information, removing redundancy and irrelevant material. As a result, fewer tokens are processed, which reduces computation, lowers API or model costs, and improves response times.
Stronger multi-step reasoning: In benchmark tests, GraphRAG has demonstrated superior performance on complex tasks, showing higher comprehensiveness and diversity in generated answers and, in some cases, up to 20% better performance in multi-hop question-answering tasks compared to standard RAG systems.
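The path-following idea behind the "Better accuracy" point above can be made concrete with a tiny sketch: facts are stored as explicit (subject, relation, object) triples, and the answer is read off the graph rather than inferred probabilistically. The triples and relation names are illustrative, not a real schema:

```python
# A minimal triple store for the Paris example: each fact is an explicit edge.
TRIPLES = [
    ("Paris", "is_a", "city"),
    ("Paris", "located_in", "France"),
    ("Paris", "seat_of_government_of", "France"),
]

def query(subject, relation):
    """Return every object linked to `subject` by `relation`."""
    return [o for s, r, o in TRIPLES if s == subject and r == relation]

# The answer is traceable to a stored edge, not a probabilistic guess.
print(query("Paris", "seat_of_government_of"))  # ['France']
```

Because every answer maps back to a stored edge, the same lookup that produced the answer also serves as its provenance trail, which is the explainability property described above.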
The following table provides a comparative analysis of the two approaches for quick reference on the key differences.
| | Traditional RAG | GraphRAG |
| --- | --- | --- |
| Retrieval Mechanism | Dense vector search | Graph traversal and vector search |
| Core Principle | Grounding responses in external text | Understanding relationships and context |
| Ideal Query Type | Simple, single-hop, fact-based queries | Complex, multi-hop, multi-document queries |
| Use Case | Customer support chatbots, document summarization | Financial market analysis, legal research, medical diagnostics, supply chain optimization, knowledge management |
| Performance Strengths | Lower latency, high throughput on simple tasks | Higher accuracy and comprehensiveness on complex tasks, reduced hallucinations |
| Explainability | Limited; depends on retrieved text | High; verifiable via graph structure |
Building a GraphRAG system from scratch can seem complicated, but AWS offers tools that make it much more manageable.
Open-source GraphRAG Toolkit: A Python framework that simplifies creating knowledge graphs from unstructured data by automating graph building, supporting multiple graph and vector stores (such as Amazon Neptune Database, Neptune Analytics, or Amazon OpenSearch Serverless), and integrating with foundation models hosted in Amazon Bedrock.
Amazon Bedrock: A fully managed service providing foundation models, infrastructure, and scaling. It enhances this approach with built-in capabilities like Knowledge Bases with GraphRAG, which automatically ingests documents (e.g., from Amazon S3), extracts entities and relationships, creates graph and vector stores, manages retrieval and graph traversal, and supports prompt augmentation and source attribution. Amazon Bedrock Knowledge Bases acts as the control plane, orchestrating the RAG process and automating the graph creation and traversal steps. This service eliminates the need for organizations to build and manage complex, custom integrations between data sources, embedding models, and vector stores.
Amazon Neptune: A fully managed graph database service optimized for high-performance querying of relationships and patterns in your data. It supports multiple graph models (property graph and RDF) and integrates seamlessly with other AWS services, making it a natural fit for storing and traversing the knowledge graphs that power RAG pipelines and recommendation engines.
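To give a feel for the developer experience, here is a hedged sketch of querying a Bedrock knowledge base with boto3's `retrieve_and_generate` API. The knowledge base ID and model ARN are placeholders you would substitute with your own, and the request payload is built separately so it can be inspected without AWS credentials:

```python
# Request payload for Bedrock's RetrieveAndGenerate API. The ID and ARN
# below are placeholders, not real resource identifiers.
request = {
    "input": {"text": "What are headwinds to capex growth?"},
    "retrieveAndGenerateConfiguration": {
        "type": "KNOWLEDGE_BASE",
        "knowledgeBaseConfiguration": {
            "knowledgeBaseId": "YOUR_KB_ID",   # placeholder
            "modelArn": "YOUR_MODEL_ARN",      # placeholder
        },
    },
}

# With AWS credentials configured, the call would look like:
#   import boto3
#   client = boto3.client("bedrock-agent-runtime")
#   response = client.retrieve_and_generate(**request)
#   print(response["output"]["text"])
print(request["input"]["text"])
```

The service handles embedding, retrieval, graph traversal (when GraphRAG is enabled on the knowledge base), and prompt augmentation behind this single call, which is what "control plane" means in practice.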
Transforming unstructured data into a functional knowledge graph is a sophisticated, multi-step pipeline that Amazon Bedrock automates. The workflow begins with unstructured documents, such as PDFs, stored in an Amazon S3 bucket. The ingestion pipeline starts by splitting these documents into manageable text chunks using customizable methods, ranging from basic fixed-size chunking to advanced LLM-based mechanisms tailored to content structure.
Following this, an LLM-based ExtractChunkEntity step is executed. This process identifies and extracts key entities from each chunk of text. The extracted information (including the chunk text, its embedding, the document ID, and the newly identified entities) is then sent to Amazon Neptune Analytics. During this insertion process, Neptune creates three distinct types of nodes:
Chunk nodes: Representing individual text fragments.
Document nodes: Linking chunks back to their source files.
Entity nodes: Representing key extracted concepts.
Using the bulk load API, Neptune automatically creates the interconnected nodes and edges that link the chunks to their source documents and the extracted entities. This automated process creates a comprehensive knowledge graph that provides a structured, relational view of the underlying data. This architectural approach moves beyond simple similarity-based retrieval toward an intelligent system that understands and leverages the structural relationships within the data.
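The node and edge structure described above can be sketched with plain Python data structures. In a real pipeline, an LLM performs the entity extraction and the result is bulk-loaded into Neptune Analytics; here the extraction output is hard-coded and the IDs, names, and edge labels are illustrative only:

```python
# Three node types from the ingestion pipeline: document, chunk, entity.
nodes = [
    {"id": "doc-1",   "type": "document", "name": "annual-report.pdf"},
    {"id": "chunk-1", "type": "chunk",    "text": "Acme raised capex guidance..."},
    {"id": "ent-1",   "type": "entity",   "name": "Acme"},
    {"id": "ent-2",   "type": "entity",   "name": "capex"},
]

# Edges link chunks to their source document and to extracted entities.
edges = [
    ("chunk-1", "PART_OF",  "doc-1"),   # chunk -> source document
    ("chunk-1", "MENTIONS", "ent-1"),   # chunk -> extracted entity
    ("chunk-1", "MENTIONS", "ent-2"),
]

def entities_in_document(doc_id):
    """Follow chunk->document, then chunk->entity edges for one document."""
    chunks = {s for s, r, t in edges if r == "PART_OF" and t == doc_id}
    return sorted({t for s, r, t in edges if r == "MENTIONS" and s in chunks})

print(entities_in_document("doc-1"))  # ['ent-1', 'ent-2']
```

Even this toy query is a two-hop traversal (document to chunks, chunks to entities), which is the kind of relational question a flat chunk index cannot answer directly.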
Amazon Neptune sits at the heart of this workflow as AWS’s fully managed graph database service, optimized for high-performance querying of relationships and patterns. Organizations can implement scalable, production-grade knowledge graphs beyond basic keyword or semantic search by combining Bedrock’s orchestration and LLM capabilities with Neptune’s graph storage and traversal power.
An iterative optimization approach is recommended to fine-tune a GraphRAG system for optimal performance. This means going beyond default configurations to implement advanced retrieval strategies that maximize precision, relevance, and efficiency.
A critical first step is data preparation. Using advanced chunking techniques ensures documents are broken into meaningful units for retrieval. For example:
Semantic chunking is ideal for documents without clear contextual boundaries.
Hierarchical chunking works well for complex or nested documents.
Note: Although these strategies can significantly improve precision, they may incur additional costs. For full control, organizations can deploy a custom AWS Lambda function to manage the chunking process.
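To contrast the strategies above, here are two toy chunkers: fixed-size splitting and splitting on paragraph boundaries. The paragraph splitter is only a crude stand-in for semantic chunking, which in practice uses embeddings to find topic boundaries:

```python
def fixed_size_chunks(text, size=40):
    """Basic fixed-size chunking: split every `size` characters."""
    return [text[i:i + size] for i in range(0, len(text), size)]

def paragraph_chunks(text):
    """Split on blank lines -- a crude proxy for semantic boundaries."""
    return [p.strip() for p in text.split("\n\n") if p.strip()]

doc = "GraphRAG links entities across documents.\n\nNeptune stores the graph."
print(fixed_size_chunks(doc, size=40))  # cuts mid-sentence
print(paragraph_chunks(doc))            # respects topic boundaries
```

Fixed-size chunking can cut a sentence in half and split a fact across two chunks, which is precisely why boundary-aware strategies tend to retrieve more coherent context.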
Beyond chunking, additional best practices for GraphRAG optimization include:
Metadata filtering: Utilize document metadata (e.g., year, author, genre) to refine search results and improve response relevance. This is especially valuable for time-sensitive or domain-specific queries.
Hybrid search: Combine dense vector search with sparse keyword search to leverage the strengths of both approaches and deliver more comprehensive results.
Reranking models: Apply reranker models to reorder retrieved contexts based on query relevance, significantly boosting precision and accuracy.
Query expansion: Break down complex queries into more targeted sub-queries (query decomposition). This allows Amazon Bedrock to run multiple, focused searches across the knowledge base, producing richer, more complete responses.
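The query decomposition idea from the list above can be sketched as follows. Real systems typically ask an LLM to do the splitting; the rule-based split on "and" below is illustrative only:

```python
def decompose(query):
    """Split on ' and ' as a crude stand-in for LLM-based decomposition."""
    parts = [p.strip().rstrip("?") for p in query.split(" and ")]
    return [p + "?" for p in parts if p]

question = (
    "What are the headwinds to capex growth and "
    "how do tariffs affect suppliers?"
)
# Each sub-query can be retrieved independently, then the results merged.
for sub in decompose(question):
    print(sub)
```

Each focused sub-query retrieves a tighter set of chunks than the compound question would, and the final answer is generated over the merged results.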
Finally, while vector stores provide powerful capabilities, they can be expensive and challenging to manage. Amazon S3 Vectors offers a cost-effective alternative, reducing vector storage costs by up to 90%. It provides durable, scalable storage suited to large, long-term workloads that are queried less frequently and can tolerate somewhat higher latency than millisecond-level vector databases deliver.
The superior capabilities of GraphRAG are not merely theoretical; they deliver tangible business outcomes across various industries. The following case studies illustrate how organizations leverage this technology to solve high-stakes, high-value problems.
A leading global financial institution, seeking to enhance insight extraction from its vast repository of proprietary research, partnered with AWS to build a proof of concept using Amazon Bedrock Knowledge Bases and Amazon Neptune Analytics. The institution aimed to answer complex questions requiring multi-hop reasoning, such as "What are some headwinds and tailwinds to capex growth in the next few years?" While traditional retrieval methods delivered straightforward responses that often lacked deeper context, GraphRAG provided more nuanced and comprehensive answers by tracing relationships between economic indicators, policy changes, and industry impacts. This ability to synthesize cross-document information proved crucial for accelerating data-driven business decisions.
An international auto company faced a persistent challenge in ensuring that insights were accurate and interconnected across its massive dataset, which supported thousands of use cases. The company prototyped a graph that mapped relationships between key data points like vehicle performance, supply chain logistics, and customer feedback. With Amazon Bedrock automatically constructing the graph from ingested documents, the company could more efficiently surface relevant insights and identify patterns in manufacturing quality and supply chain resilience. This prevented the organization from relying on disconnected query results, making data analysis more effective and scalable.
The applications of GraphRAG extend beyond finance and manufacturing. In health care, it is used for medical diagnostics by connecting symptoms, diseases, and treatments to suggest personalized care options. An example is Precina Health, which leveraged GraphRAG to systematize diabetes management, leading to a 1% monthly drop in Hemoglobin A1C (HbA1C) across patients by tracing complex cause-and-effect relationships.
Similarly, in knowledge management, NASA built a "People Knowledge Graph" to navigate its deep institutional knowledge and identify in-house experts for complex workforce queries, a task that a vector-only RAG system would struggle to accomplish accurately. GraphRAG also redefines case preparation in legal tech by navigating complex legal precedents, statutes, and case law with depth and precision, providing a significant advantage over flat keyword searches.
GraphRAG shifts from simple keyword-based retrieval to an advanced, context-aware approach by combining knowledge graphs with LLMs. This enables multi-hop reasoning, verifiable and explainable answers, and improved accuracy on complex, multi-source queries.
Organizations can accelerate adoption by leveraging AWS managed services like Amazon Bedrock Knowledge Bases and Amazon Neptune Analytics. These services offer a scalable, cost-effective, fully managed solution without the burden of building and maintaining graph databases.
Enterprises seeking to move beyond basic Q&A to deeper insights should start with a high-value, complex use case, build a proof-of-concept on AWS, and then scale strategically using the advanced techniques discussed. This approach positions organizations for a more intelligent, comprehensive, and trustworthy future in enterprise AI.
Explore more Amazon Bedrock possibilities with the following Cloud Labs: