What Matters in the Age of AI: Text and Vector Search
Explore how Amazon DocumentDB provides native text and vector search on a single cluster, so applications can retrieve documents by keyword or by meaning without an external search system. Understand how to create text and vector indexes and how the two vector index algorithms, HNSW and IVFFlat, balance recall, latency, and resource use. Learn practical AI-era use cases such as semantic search, personalization, and retrieval-augmented generation within DocumentDB to simplify architecture and improve relevance.
In the previous lesson, you explored how DocumentDB's global cluster model distributes reads and enables cross-Region recovery. This lesson shifts focus inward to retrieval capabilities that run directly inside a single DocumentDB cluster. Modern applications increasingly need two distinct search patterns over the operational data they already store: keyword-based retrieval, which finds documents containing specific terms, and meaning-based retrieval, which finds documents whose content is semantically closest to a user's intent. Historically, teams exported data from DocumentDB to a dedicated search engine such as Amazon OpenSearch Service to satisfy these patterns, which added ETL pipeline complexity, synchronization lag, and operational overhead. DocumentDB now offers native text search through the $text query operator (with its $search field) and native vector search through the vectorSearch operator of the $search aggregation stage, allowing developers to query documents by keyword relevance or semantic similarity without leaving the database.
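To make the difference between the two retrieval patterns concrete, here is a minimal, database-free sketch in plain Python. It is not how DocumentDB implements search: the corpus, the three-dimensional embeddings, and the function names are all hypothetical, and the "semantic" ranking is an exact brute-force scan rather than an indexed lookup.

```python
import math

# Hypothetical toy corpus: each document carries text plus a made-up
# 3-dimensional embedding. Real embeddings come from an embedding model
# and usually have hundreds or thousands of dimensions.
DOCS = [
    {"_id": 1, "text": "waterproof hiking boots", "embedding": [0.9, 0.1, 0.0]},
    {"_id": 2, "text": "trail running shoes", "embedding": [0.8, 0.3, 0.1]},
    {"_id": 3, "text": "stainless steel cookware", "embedding": [0.0, 0.2, 0.9]},
]


def keyword_search(docs, query):
    """Keyword-based retrieval: ids of documents containing every query term."""
    terms = set(query.lower().split())
    return [d["_id"] for d in docs if terms <= set(d["text"].lower().split())]


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def semantic_search(docs, query_embedding, k):
    """Meaning-based retrieval: ids of the k documents closest to the query vector."""
    ranked = sorted(
        docs,
        key=lambda d: cosine_similarity(d["embedding"], query_embedding),
        reverse=True,
    )
    return [d["_id"] for d in ranked[:k]]


# "footwear" appears in no document, so keyword retrieval finds nothing.
no_keyword_hits = keyword_search(DOCS, "footwear")

# A query vector assumed to represent "footwear" still lands near the
# boots and shoes documents, because similarity is computed on embeddings.
nearest = semantic_search(DOCS, [1.0, 0.0, 0.0], k=2)
```

The keyword query fails on a term the documents never use, while the vector query still ranks the two footwear documents first. That gap is the reason applications often need both patterns over the same data.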
Before diving into mechanics, here are the key terms this lesson covers. Text indexes tokenize and stem string field values so that $text queries can match documents by keyword relevance. Vector indexes organize numerical embedding arrays so that vectorSearch queries can retrieve documents by semantic similarity. The two supported vector index algorithms are HNSW and IVFFlat.
Attention: For exam scenarios, DocumentDB search is the correct answer when the requirement is integrated retrieval inside the operational document store. OpenSearch remains preferred when the scenario calls for broad search analytics, log search, or a dedicated search platform with dashboards and aggregations.
Understanding where each service fits is a recurring exam theme, so keep this boundary in mind as you work through the sections below.
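As a preview of the key terms introduced above, the following sketch builds the index options and query documents as plain Python dictionaries. It assumes the shapes described in the Amazon DocumentDB developer guide: a $text query with a $search field for text search, a $search aggregation stage containing vectorSearch for vector search, and a vectorOptions document on the vector index. Verify these against the current documentation before relying on them. The helper names, the field names description and embedding, and every parameter value are hypothetical and chosen only for illustration.

```python
# Assumed, illustrative tuning values; they are not recommendations.
HNSW_PARAMS = {"m": 16, "efConstruction": 64}   # graph-based index
IVFFLAT_PARAMS = {"lists": 100}                  # cluster-based index


def vector_options(algorithm, dimensions, similarity="cosine"):
    """Build a vectorOptions document for an 'hnsw' or 'ivfflat' index."""
    if algorithm == "hnsw":
        extra = dict(HNSW_PARAMS)
    elif algorithm == "ivfflat":
        extra = dict(IVFFLAT_PARAMS)
    else:
        raise ValueError(f"unsupported vector index algorithm: {algorithm}")
    return {
        "type": algorithm,
        "dimensions": dimensions,
        "similarity": similarity,
        **extra,
    }


def text_query(terms):
    """Build a keyword query filter for a collection with a text index."""
    return {"$text": {"$search": terms}}


def vector_pipeline(path, query_vector, k, similarity="cosine", **tuning):
    """Build an aggregation pipeline that returns the k nearest documents.

    Extra keyword arguments (for example efSearch for HNSW or probes for
    IVFFlat) are passed through as search-time tuning parameters.
    """
    return [
        {
            "$search": {
                "vectorSearch": {
                    "vector": list(query_vector),
                    "path": path,
                    "similarity": similarity,
                    "k": k,
                    **tuning,
                }
            }
        }
    ]


# Possible usage with pymongo against a real cluster (not executed here;
# the connection URI, database, and collection names are placeholders):
#
#   from pymongo import MongoClient
#   coll = MongoClient(uri)["shop"]["products"]
#   coll.create_index([("description", "text")])
#   coll.create_index([("embedding", "vector")], name="embedding_hnsw",
#                     vectorOptions=vector_options("hnsw", 3))
#   coll.find(text_query("hiking boots"))
#   coll.aggregate(vector_pipeline("embedding", [1.0, 0.0, 0.0], k=2, efSearch=40))
```

Keeping the builders separate from the driver calls makes the shapes easy to inspect and unit test without a running cluster. The same pipeline builder serves both algorithms, since only the search-time tuning parameter differs.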
Native text search in DocumentDB
Full-text search in DocumentDB begins with creating a text index on one or more string fields in a ...