What Matters in the Age of AI: Text and Vector Search

Discover how to use Amazon DocumentDB's integrated text and vector search capabilities to perform keyword-based and meaning-based document retrieval. Learn the differences between text indexes and vector indexes, including HNSW and IVFFlat algorithms, and understand how these enable AI use cases such as semantic search, personalization, and retrieval-augmented generation within a single database cluster without external search engines.

We'll cover the following...

Native text search in DocumentDB
Vector search on DocumentDB 5.0
- Prerequisites and cluster compatibility
- The vector search workflow
HNSW vs. IVFFlat index types
- How HNSW works
- How IVFFlat works
AI-era use cases over operational data
Conclusion

In the previous lesson, you explored how DocumentDB’s global cluster model distributes reads and enables cross-region recovery. This lesson shifts focus inward to retrieval capabilities that run directly inside a single DocumentDB cluster. Modern applications increasingly need two distinct search patterns over the operational data they already store: keyword-based retrieval, which finds documents containing specific terms, and meaning-based retrieval, which finds documents whose content is semantically closest to a user's intent. Historically, teams exported data from DocumentDB to a dedicated search engine such as Amazon OpenSearch Service to satisfy these patterns, which added ETL pipeline complexity, synchronization lag, and operational overhead. DocumentDB now offers native text search through text indexes queried with the $text operator and its $search argument, and native vector search through the vectorSearch operator inside the $search aggregation stage, allowing developers to query documents by keyword relevance or semantic similarity without leaving the database.
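As a rough illustration of the two retrieval patterns, the sketch below builds the query documents in Python without connecting to a cluster. It assumes the query syntax described in the Amazon DocumentDB documentation, where keyword search is a $text filter with a $search argument and vector search is a vectorSearch document inside a $search stage. The helper names, the `embedding` field, and the `products` collection are hypothetical and chosen only for this example.

```python
from typing import Any, Dict, List


def text_search_filter(terms: str) -> Dict[str, Any]:
    """Build a keyword filter for a collection that already has a text index."""
    return {"$text": {"$search": terms}}


def vector_search_pipeline(
    query_vector: List[float],
    path: str,
    k: int,
    similarity: str = "cosine",
) -> List[Dict[str, Any]]:
    """Build a one-stage aggregation pipeline for a vector similarity query.

    Assumed shape: a vectorSearch document nested inside a $search stage.
    """
    return [
        {
            "$search": {
                "vectorSearch": {
                    "vector": query_vector,    # embedding of the user's query
                    "path": path,              # field that stores document embeddings
                    "similarity": similarity,  # distance metric used by the index
                    "k": k,                    # number of neighbors to return
                }
            }
        }
    ]


# Hypothetical usage with a pymongo collection object named `products`:
#   products.find(text_search_filter("wireless headphones"))
#   products.aggregate(vector_search_pipeline(embedding, "embedding", k=5))
keyword_filter = text_search_filter("wireless headphones")
pipeline = vector_search_pipeline([0.1, 0.2, 0.3], path="embedding", k=5)
```

Keeping the query documents as plain dictionaries makes them easy to inspect and unit test before they are sent to a real cluster.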

Before diving into mechanics, here are the key terms this lesson covers. Text indexes tokenize and stem string field values so that text queries can match documents by keyword relevance. Vector indexes organize numerical embedding arrays so that vector queries can retrieve documents by semantic similarity. The two supported vector index algorithms are HNSW (Hierarchical Navigable Small World) and IVFFlat (Inverted File with Flat quantization). HNSW is a graph-based approximate nearest-neighbor algorithm that builds a multilayer structure connecting vectors to their approximate neighbors for fast, high-recall retrieval. IVFFlat is a partition-based approximate nearest-neighbor algorithm that divides the vector space into clusters and searches only a subset of them at query time, trading recall for faster builds and lower memory use. Recall is the proportion of true nearest neighbors that the algorithm actually returns, and RAG (Retrieval-Augmented Generation) is a pattern in which retrieved documents are fed as context to a large language model to produce grounded answers.
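To make the definition of recall concrete, here is a minimal sketch that compares an approximate result list against the exact nearest neighbors found by brute force. The vectors, the query, and the approximate result are made-up illustration data, and the helper names are hypothetical. It does not model how HNSW or IVFFlat actually search.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def exact_top_k(query: Sequence[float], vectors: List[Sequence[float]], k: int) -> List[int]:
    """Brute-force search: score every vector and keep the k most similar ids."""
    ranked = sorted(
        range(len(vectors)),
        key=lambda i: cosine_similarity(query, vectors[i]),
        reverse=True,
    )
    return ranked[:k]


def recall_at_k(approx_ids: Sequence[int], exact_ids: Sequence[int]) -> float:
    """Fraction of the true nearest neighbors that the approximate search returned."""
    exact = set(exact_ids)
    return len(exact & set(approx_ids)) / len(exact)


# Made-up data: four 2-dimensional "embeddings" and one query vector.
vectors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.7, 0.7]]
query = [1.0, 0.05]

exact_ids = exact_top_k(query, vectors, k=2)  # true neighbors by brute force
approx_ids = [0, 3]                           # pretend output of an approximate index
recall = recall_at_k(approx_ids, exact_ids)   # one of two true neighbors found: 0.5
```

An approximate index that returned both true neighbors would score 1.0 here. Index tuning is largely about how much recall to give up in exchange for speed and memory.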

Attention: For exam scenarios, DocumentDB search is the correct answer when the requirement is integrated retrieval inside the operational document store. OpenSearch remains preferred when the scenario calls for broad search analytics, log search, or a dedicated search platform with dashboards and aggregations.

Understanding where each service fits is a recurring exam theme, so keep this boundary in mind as you work through the sections below.

Native text search in DocumentDB

Full-text search in DocumentDB begins with creating a text index on one or more string fields in a ...
