Non-Relational and Key-Value Stores

Non-relational data stores, particularly Amazon DynamoDB, serve high-velocity workloads by prioritizing horizontal scalability and low latency over rigid schema enforcement. Key design principles include selecting high-cardinality partition keys to avoid hot partitions, using composite sort keys for flexible querying, and implementing Global Secondary Indexes (GSIs) for alternative access patterns. DynamoDB offers two capacity modes, on-demand and provisioned, so users can balance cost and performance based on workload predictability. Additionally, Time to Live (TTL) automates data expiration, simplifying data lifecycle management without consuming write capacity.

In the previous lesson, relational patterns for Amazon RDS emphasized ACID compliance, referential integrity, and lock management, all of which are built on structured, normalized schemas. However, many high-velocity workloads such as IoT telemetry, session management, and real-time personalization demand a fundamentally different data model, one that trades rigid schema enforcement for horizontal scalability and single-digit-millisecond latency.

Amazon DynamoDB is an AWS-managed NoSQL key-value and document store, purpose-built for this class of workload. Unlike RDS, DynamoDB eliminates instance class selection, storage tier provisioning, and lock management by abstracting infrastructure entirely. This lesson covers three objectives critical for the AWS Certified Data Engineer – Associate exam:

  • Designing schemas driven by DynamoDB access patterns.

  • Selecting the right capacity mode for cost and performance.

  • Configuring TTL to expire stale data automatically.

Designing partitions and sort keys

Every DynamoDB table requires a partition key (PK) and can optionally define a sort key (SK); together they form the primary key. DynamoDB hashes the partition key value to determine which physical partition stores an item, while the sort key orders items that share the same partition key, enabling range queries. The single most critical design decision you will make is choosing a high-cardinality partition key that distributes traffic evenly across partitions.
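To make the effect of key cardinality concrete, the following is a minimal sketch that simulates hash-based placement in plain Python. It does not call DynamoDB and does not reproduce its internal hash function; the MD5-based `partition_for` helper, the fixed `NUM_PARTITIONS` value, and the sample key names are all assumptions chosen for illustration.

```python
import hashlib
from collections import Counter

# Hypothetical partition count. DynamoDB manages partitions internally,
# so this number is an assumption made only for the simulation.
NUM_PARTITIONS = 8


def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a partition key value to a partition index by hashing it.

    MD5 is a stand-in here; it is not DynamoDB's actual hash function.
    """
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


def distribution(keys):
    """Count how many items land on each simulated partition."""
    return Counter(partition_for(k) for k in keys)


# High-cardinality key: 10,000 distinct customer identifiers.
customer_keys = [f"customer-{i}" for i in range(10_000)]

# Low-cardinality key: only three distinct status values across 10,000 items.
statuses = ("ACTIVE", "INACTIVE", "PENDING")
status_keys = [statuses[i % 3] for i in range(10_000)]

high = distribution(customer_keys)
low = distribution(status_keys)

print("customerId -> partitions used:", len(high), "largest share:", max(high.values()))
print("status     -> partitions used:", len(low), "largest share:", max(low.values()))
```

In this simulation the customer identifiers spread across every simulated partition, while the three status values can occupy at most three of them, so each of those partitions absorbs roughly a third of all traffic or more.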

Consider two contrasting examples:

  • Using customerId or orderId as the PK yields thousands or millions of distinct values, spreading reads and writes across many partitions.

  • Using status (with values like ACTIVE, INACTIVE, PENDING) or date alone funnels disproportionate traffic into a handful of partitions, creating what is known as a ...