Data Modeling and Query Design

Explore how to design effective data models and queries in Amazon DocumentDB by choosing between embedding and referencing, selecting appropriate indexes, using aggregation pipelines, and understanding MongoDB compatibility. This lesson helps optimize collections for read-heavy or write-heavy workloads while balancing performance and operational costs.

With the cluster infrastructure in place (compute separated from storage, automatic failover across Availability Zones, and storage that scales automatically), the next set of decisions shifts from infrastructure design to data design. In Amazon DocumentDB, document structure, indexing strategy, aggregation pipeline design, and awareness of MongoDB compatibility directly influence query performance, write efficiency, scalability, and operational cost.

This lesson explores four key design areas:

  • Embedding vs. referencing for modeling relationships between documents.

  • Index strategy using single-field, compound, multikey, and text indexes.

  • Aggregation pipelines for server-side data processing and analytics.

  • MongoDB compatibility awareness to ensure that features and operators are supported by the cluster version.

By understanding how these design choices interact, you can optimize collections for both read-heavy and write-heavy workloads while avoiding common performance pitfalls.

Embedding vs. referencing

One of the first schema design decisions in DocumentDB is determining whether related data should be stored within the same document (embedding) or separated into multiple documents linked through references (referencing).

Choosing between these approaches depends on access patterns, update frequency, and document growth characteristics rather than traditional relational normalization rules.
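To make the two shapes concrete, here is a minimal sketch using plain Python dictionaries in place of real collections. The order, customer, and item fields are hypothetical and chosen only for illustration. No database driver is involved, so the lookups in the referenced version stand in for the extra queries a real application would issue.

```python
# Hypothetical data: all field names and IDs are illustrative assumptions.

# Embedding: the order document carries its customer and line items inline,
# so a single read returns everything needed to render the order.
embedded_order = {
    "_id": "order-1001",
    "customer": {"name": "Ada", "email": "ada@example.com"},
    "items": [
        {"sku": "A-1", "qty": 2, "price": 9.5},
        {"sku": "B-7", "qty": 1, "price": 20.0},
    ],
}

# Referencing: the order stores only identifiers; the related documents
# live in separate collections (modeled here as dicts keyed by _id).
customers = {
    "cust-42": {"_id": "cust-42", "name": "Ada", "email": "ada@example.com"},
}
items = {
    "item-1": {"_id": "item-1", "sku": "A-1", "qty": 2, "price": 9.5},
    "item-2": {"_id": "item-2", "sku": "B-7", "qty": 1, "price": 20.0},
}
referenced_order = {
    "_id": "order-1001",
    "customer_id": "cust-42",
    "item_ids": ["item-1", "item-2"],
}


def total_embedded(order):
    """Compute the order total from data already inside the document."""
    return sum(line["qty"] * line["price"] for line in order["items"])


def total_referenced(order, items_by_id):
    """Compute the order total by resolving each item reference first."""
    lines = [items_by_id[item_id] for item_id in order["item_ids"]]
    return sum(line["qty"] * line["price"] for line in lines)


def customer_email_referenced(order, customers_by_id):
    """Resolve the customer reference; in a database this is a second read."""
    return customers_by_id[order["customer_id"]]["email"]
```

Both shapes hold the same information. The embedded version answers the question from one document, while the referenced version needs one lookup per related document but keeps customer and item data in a single place, so an update there does not have to touch every order.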

When to use embedding

Embedding is appropriate when:

  • Related data is frequently read together.

  • Arrays remain bounded and do not grow indefinitely.

  • Atomic single-document updates are required.

  • Read performance is more important than minimizing storage duplication.

Because all ...