Non-Relational and Key-Value Stores

Amazon DynamoDB, a non-relational key-value and document store, serves high-velocity workloads by prioritizing horizontal scalability and low latency over rigid schema enforcement. Key design principles include selecting high-cardinality partition keys to avoid hot partitions, using composite sort keys for flexible querying, and creating Global Secondary Indexes (GSIs) for alternative access patterns. DynamoDB offers two capacity modes, on-demand and provisioned, so you can balance cost and performance based on workload predictability. Time to Live (TTL) automates data expiration without consuming write throughput on the table, which simplifies data life cycle management.

We'll cover the following...

Designing partitions and sort keys
- Composite sort keys and per-partition limits
Global secondary indexes and NoSQL modeling
- Single-table design and access-pattern-first modeling
Capacity modes and cost optimization
- Monitoring throughput and detecting hot partitions
Managing data life cycle with TTL
Conclusion

In the previous lesson, relational patterns for Amazon RDS emphasized ACID compliance, referential integrity, and lock management, all of which are built on structured, normalized schemas. However, many high-velocity workloads such as IoT telemetry, session management, and real-time personalization demand a fundamentally different data model, one that trades rigid schema enforcement for horizontal scalability and single-digit-millisecond latency.

Amazon DynamoDB is an AWS-managed NoSQL key-value and document store, purpose-built for this class of workload. Unlike RDS, DynamoDB eliminates instance class selection, storage tier provisioning, and lock management by abstracting infrastructure entirely. This lesson covers three objectives critical for the AWS Certified Data Engineer – Associate exam:

- Designing schemas driven by DynamoDB access patterns.
- Selecting the right capacity mode for cost and performance.
- Configuring TTL to expire stale data automatically.

Designing partitions and sort keys

Every DynamoDB table requires a partition key (PK) and can optionally define a sort key (SK); together they form the primary key. DynamoDB hashes the partition key value to decide which physical partition stores an item, while the sort key orders the items that share a partition key value, which enables range queries. The most important design decision you will make is choosing a high-cardinality partition key that distributes traffic evenly across partitions.
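The hashing and ordering behavior described above can be sketched with a small toy model. This is not DynamoDB's actual implementation: the MD5 hash, the fixed count of eight partitions, and the names partition_for, traffic_by_partition, and query_range are all assumptions made for illustration. The sketch only shows why many distinct partition key values spread requests out, why a handful of values cannot, and how a sort key supports a range query within one partition key.

```python
import hashlib
from collections import Counter

# Hypothetical partition count; DynamoDB manages the real number internally.
NUM_PARTITIONS = 8


def partition_for(partition_key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a partition key to a partition index using MD5 as a stand-in hash."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


def traffic_by_partition(partition_keys):
    """Count how many requests land on each simulated partition."""
    return Counter(partition_for(pk) for pk in partition_keys)


def query_range(items, partition_key, sk_start, sk_end):
    """Return one partition key's items with sk_start <= SK <= sk_end, in SK order."""
    matches = [
        item
        for item in items
        if item["PK"] == partition_key and sk_start <= item["SK"] <= sk_end
    ]
    return sorted(matches, key=lambda item: item["SK"])


TOTAL_REQUESTS = 10_000

# High-cardinality key: every request uses a different customer ID.
customer_keys = [f"customer-{i}" for i in range(TOTAL_REQUESTS)]

# Low-cardinality key: the same number of requests share three status values.
statuses = ["ACTIVE", "INACTIVE", "PENDING"]
status_keys = [statuses[i % 3] for i in range(TOTAL_REQUESTS)]

customer_load = traffic_by_partition(customer_keys)
status_load = traffic_by_partition(status_keys)

print("partitions used (customerId):", len(customer_load))
print("partitions used (status):", len(status_load))
print("busiest share (customerId):", max(customer_load.values()) / TOTAL_REQUESTS)
print("busiest share (status):", max(status_load.values()) / TOTAL_REQUESTS)

# Sort keys here are ISO dates, so string comparison matches date order.
orders = [
    {"PK": "customer-42", "SK": "2024-03-15"},
    {"PK": "customer-42", "SK": "2024-01-10"},
    {"PK": "customer-42", "SK": "2024-02-01"},
    {"PK": "customer-7", "SK": "2024-01-20"},
]
early_orders = query_range(orders, "customer-42", "2024-01-01", "2024-02-29")
print([item["SK"] for item in early_orders])
```

In this model the status key can never touch more than three partitions regardless of how many exist, so at least a third of all requests hit a single partition, while the customer key spreads load roughly evenly. The range query reads only items that share one partition key value, which mirrors why sort-key conditions are always scoped to a single partition key.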

Consider two contrasting examples:

- Using customerId or orderId as the PK yields thousands or millions of distinct values, spreading reads and writes across many partitions.
- Using status (with values like ACTIVE, INACTIVE, PENDING) or date alone funnels disproportionate traffic into a handful of partitions, creating what is known as a ...
