Database Model Fundamentals

Explore the fundamentals of eight AWS database models including relational, document, key-value, wide-column, in-memory, graph, time-series, and data warehousing. Understand how each model fits specific workload patterns and how to design resilient, scalable architectures using AWS managed services like Aurora, DynamoDB, ElastiCache, and Redshift. This lesson helps you select appropriate database services based on workload shape, consistency, durability, and availability needs, enabling you to apply polyglot persistence in production cloud applications.

We'll cover the following...

Eight database models explained
Mapping workloads to database models
Distributed-systems qualities for selection
- Consistency, durability, and availability
  - RPO, RTO, and multi-region design
Designing polyglot persistence
Conclusion

Modern production systems rarely run on a single database. A typical e-commerce platform stores orders in a relational database, product catalogs in a document store, user sessions in a key-value cache, and operational telemetry in a time-series database. Each subsystem has a fundamentally different access pattern, consistency requirement, and scale profile, and forcing all of them into one database engine creates compromises in latency, cost, or resilience. This architectural approach, where each subsystem gets the database model that fits its workload shape, is called polyglot persistence.

This lesson covers eight major database and analytics patterns: relational, document, key-value, wide-column, in-memory, graph, time-series, and data warehousing. It maps each pattern to the AWS managed services that support it, including Aurora and RDS for relational workloads, DocumentDB for document workloads, DynamoDB for key-value access, Keyspaces for wide-column workloads, ElastiCache and MemoryDB for in-memory use cases, Neptune for graph workloads, Timestream for time-series data, and Redshift for data warehousing.

Eight database models explained

Each database model exists because a specific data shape and access pattern demands it. Understanding the internal mechanics of each model is what separates a correct exam answer from a plausible-sounding distractor.

Relational and document models

Relational databases enforce a fixed schema of tables, rows, and foreign keys. They support ACID transactions and complex joins, making them the default for OLTP (Online Transaction Processing).a workload pattern characterized by high-frequency, short-duration reads and writes that require strong transactional guarantees. Financial transactions, order management, and inventory systems all depend on this model. Amazon supports eight different database engines for managed relational database deployments.

Amazon Aurora is compatible with MySQL and PostgreSQL. Aurora supports up to 256 TB of storage and up to 15 read replicas. Aurora replicates data as six copies across three Availability Zones, which is how it achieves both durability and fast failover.

Document databases store schema-flexible JSON or BSON documents that can be nested to any depth. Each record can have a different shape, which makes this model ideal for product catalogs, content management systems, and user profiles where the data structure evolves frequently. DocumentDB provides MongoDB compatibility with automatic replication across three AZs, while DynamoDB also supports document access patterns through its native JSON handling.

Attention: DocumentDB is document-compatible, but it is not a universal answer for every NoSQL solution. Graph, time-series, and pure key-value workloads each have purpose-built services that outperform a document store for those patterns.

Key-value, wide-column, and in-memory models

Key-value databases represent the simplest model, functioning as a distributed hash map at scale. Access is strictly by primary key, delivering single-digit-millisecond latency. DynamoDB is the flagship AWS service here, and partition key design is the critical optimization lever that determines whether throughput is distributed evenly or creates hot partitions.
Wide-column databases extend the key-value concept by allowing rows to contain dynamic columns grouped into column families. This structure suits IoT telemetry and large-scale event logging, where each row may carry a different set of attributes. Amazon Keyspaces provides Cassandra compatibility as a fully managed service.
In-memory databases keep data entirely in RAM for sub-millisecond reads. ElastiCache supports Redis and Memcached for caching layers, leaderboards, and session stores. MemoryDB for Redis adds durability through a transaction log when the workload needs both speed and persistence. ElastiCache is primarily a cache, not a durable primary database. Although Redis replicas and persistence features improve recovery, MemoryDB is generally recommended for durable, primary, in-memory storage.

Graph, time series, and data warehouse models

Graph databases model data as nodes and edges with properties, optimized for traversing relationships. Social networks, fraud-detection rings, and recommendation engines all rely on multi-hop traversals that would require expensive recursive joins in a relational system. Neptune supports both Gremlin and SPARQL query languages and scales reads through up to 15 read replicas.

Time series databases are optimized for append-heavy, timestamped data with built-in aggregation and automatic retention policies. DevOps metrics, IoT sensor streams, and application telemetry all fit this model. Timestream separates storage into a memory tier for recent high-frequency queries and a magnetic tier for historical data, enabling cost optimization without manual data life cycle management.

Data warehouse databases are optimized for analytical processing across massive datasets rather than high-frequency transactional writes. In practice, a data warehouse is typically a relational database optimized for OLAP through columnar storage and massively parallel processing. Business intelligence dashboards, financial reporting, and petabyte-scale analytical queries typically require columnar storage, massively parallel processing, and optimized aggregation performance. Amazon Redshift uses columnar storage and distributed query execution to process large-scale analytical workloads efficiently, while features such as Redshift Spectrum allow querying data directly in Amazon S3 without fully loading it into the warehouse.

The following table consolidates the distinguishing characteristics of all eight models into a single reference:

Database Model Comparison: Schema, Query, Scaling, and AWS Service Overview

Database Model	Schema Flexibility	Primary Query Pattern	Scaling Style	Typical Latency	Consistency Model	Best Workload Fit	AWS Service
Relational	Fixed schema	SQL joins and aggregations	Vertical + read replicas	Low ms	Strong (ACID)	OLTP / complex joins	Aurora, RDS
Document	Flexible (JSON)	Query by any field	Horizontal	Low ms	Tunable	CMS, catalogs	DocumentDB, DynamoDB
Key-Value	Schema-free	Lookup by key	Horizontal	Single-digit ms	Eventual or strong	Sessions, shopping carts	DynamoDB
Wide-Column	Flexible columns	Row key + column family	Horizontal	Low ms	Tunable	IoT data, event logs	Keyspaces
In-Memory	Varies	Key-based access	Horizontal	Sub-ms	Eventual	Caching, leaderboards	ElastiCache, MemoryDB
Graph	Nodes and edges	Graph traversals	Vertical + read replicas	Low ms	Strong	Relationships, fraud detection	Neptune
Time Series	Timestamped data	Time-range queries	Horizontal	Low ms	Eventual	Metrics, telemetry	ATimestream
Data Warehouse (Columnar Relational)	Structured schema with predefined tables and columns	Large-scale analytical queries, aggregations, OLAP, reporting, and BI workloads	Horizontal scaling through distributed nodes and massively parallel processing (MPP)	Seconds to minutes for complex analytical queries	Strong consistency for committed data	Business intelligence, reporting, analytics, ETL, and petabyte-scale data warehousing	Redshift

With the eight models clearly defined, the next step is mapping real workload patterns to the models that serve them best.

Mapping workloads to database models

Knowing the models is necessary but not sufficient. Production decisions require matching a business workloads shape to the model that handles it without forcing compromises.

OLTP workloads involve high-frequency reads and writes with ACID guarantees. Relational databases such as Aurora and RDS are the default. DynamoDB fits when the access pattern is strictly key-based, and joins are unnecessary, trading relational flexibility for horizontal scale.
Analytics workloads run complex aggregations over large datasets. Redshift serves data warehousing, and OpenSearch, a specialized search engine, handles full-text search and log analytics. Neither is an OLTP engine, and confusing them as such is a common exam distractor.
a caching strategy where the application checks the cache first, and on a miss, reads from the primary database and writes the result back into the cache for subsequent requests.Caching workloads reduce primary database load, lowering both latency and cost. ElastiCache facilitates this via the cache-aside pattern (managed by TTL and eviction policies), where the application follows a strict sequence: it checks the cache first, retrieves data from the primary database on a cache miss, and immediately writes that data back to the cache for future use.
Event-driven workloads react to state changes with eventual-consistency tolerance. DynamoDB Streams can feed Lambda functions for downstream processing, or Kinesis can route events into Timestream for time-aware storage.
Streaming workloads involve continuous high-throughput ingestion of time-ordered data. Kinesis Data Streams handles ingestion, while Timestream provides storage and time-range query capabilities.

Practical tip: The biggest mistake is selecting the database by familiarity instead of workload shape. Using Redshift as an OLTP database or treating ElastiCache as durable primary storage are the kinds of mistakes that cost you dearly.

Distributed-systems qualities for selection

When selecting a database service, we must evaluate how the system behaves during failures, replication delays, and scaling events. AWS databases make different trade-offs between consistency, durability, and availability depending on the workload requirements.

The following characteristics heavily influence architectural decisions in distributed systems:

Consistency, durability, and availability

These three attributes are frequently conflated on exams, but they describe different guarantees.

Consistency determines whether a read immediately reflects the most recent write. Aurora provides strong consistency on the writer endpoint, while reads from replicas may lag slightly. DynamoDB is eventually consistent by default, but it supports both eventual and strongly consistent reads on a per-request basis, giving the application control. ElastiCache is eventually consistent by nature because cache invalidation is asynchronous.

Durability measures whether committed data survives failures. Aurora replicates six copies across three AZs and can tolerate the loss of an entire AZ without data loss. DynamoDB replicates across three AZs automatically. ElastiCache data can be lost entirely on node failure, which is why MemoryDB exists for workloads that need in-memory speed with a durable transaction log.

Availability describes whether the service continues accepting requests during failures. Multi-AZ deployments provide automatic failover for Aurora and RDS. DynamoDB is inherently multi-AZ with no single point of failure. Global Tables and Aurora Global Database extend availability across Regions for workloads that cannot tolerate a regional outage.

Note: Durability and availability are not the same as low latency or strong consistency. A database can be highly durable and available while still serving eventually consistent reads at moderate latency. Exam questions often test whether you can distinguish these attributes.

RPO, RTO, and multi-region design

Recovery objectives translate business tolerance for downtime and data loss into concrete infrastructure decisions.the maximum acceptable duration of downtime after a failure, which maps to failover speed and recovery automation.

Aurora failover completes in under 30 seconds with multi-AZ replicas, achieving near-zero RPOThe maximum acceptable amount of data loss measured in time, which maps directly to backup frequency and point-in-time recovery granularity. through synchronous replication and sub-minute RTOThe maximum acceptable duration of downtime after a failure, which maps to failover speed and recovery automation.. DynamoDB is always available by design, with Global Tables enabling active-active writes across multiple Regions. ElastiCache failover speed depends on whether a replica exists in another AZ. Without one, recovery means a cold start with an empty cache.

For multi-region architectures, Aurora Global Database replicates to up to five secondary Regions with under one second of replication lag, but only the primary Region accepts writes. DynamoDB Global Tables support active-active multi-region writes, making them the correct choice when the workload requires write availability in every Region simultaneously.

The following decision tree synthesizes model selection and quality attributes into a single visual flow that mirrors how an architect evaluates a new workload:

With both model characteristics and quality attributes covered, the final step is applying them together in a realistic polyglot architecture.

Designing polyglot persistence

Consider the database layer for a ride-sharing platform. Each subsystem has a different access pattern, and choosing a single database for all of them would force unacceptable trade-offs.

User authentication and profiles use DynamoDB for key-value lookups by user ID, delivering single-digit-millisecond latency with horizontal scaling that handles millions of concurrent users.
Ride matching and routing use Neptune for graph traversals that find the nearest available drivers through relationship-based queries, a pattern that would require expensive recursive joins in a relational system.
Trip history and billing use Aurora PostgreSQL for ACID transactions on financial records, with complex joins supporting invoicing, dispute resolution, and regulatory reporting.
Real-time driver location uses ElastiCache with Redis geospatial commands for sub-millisecond reads on ephemeral data that is constantly refreshed as drivers move.
Vehicle telemetry uses Timestream for millions of timestamped sensor readings per second, with automatic data aging from the memory tier to the magnetic tier as readings become historical.
Search and discovery uses OpenSearch, a specialized search engine, for full-text search across ride history, driver reviews, and location-based queries that do not fit a transactional model.

This is not over-engineering. Each subsystem operates at a different point on the consistency, latency, and durability spectrum. Forcing all six into Aurora would degrade caching performance, complicate telemetry ingestion, and make graph traversals prohibitively slow.

Practical tip: When a scenario describes multiple subsystems with different access patterns, the answer almost always involves more than one database service. Look for the workload shape of each subsystem independently before selecting.

These scenarios reinforce that quality attributes are the deciding signals that refine an initial model choice into a production-ready architecture.

Conclusion

Every database model exists because a specific access pattern demands it. Relational databases handle ACID joins. DynamoDB handles massive-scale key-value access. ElastiCache accelerates reads with sub-millisecond caching. Neptune traverses relationships. Timestream manages time-series telemetry. Analytics engines like Redshift and OpenSearch, a specialized search engine, serve aggregation and search rather than transactional workloads. Modern cloud architectures embrace polyglot persistence to serve each subsystem optimally instead of forcing compromises.

1.Introduction

2.Common Foundation for All AWS Database Study

Cloud Lab

3.Amazon RDS

Cloud Lab

Cloud Lab

4.Amazon Aurora

Cloud Lab

5.Amazon DocumentDB

Cloud Lab

Cloud Lab

6.Amazon DynamoDB

Cloud Lab

Cloud Lab

7.Amazon ElastiCache

Cloud Lab

8.Amazon KeySpaces

Cloud Lab

9.Amazon MemoryDB

Cloud Lab

10.Amazon Neptune

Cloud Lab

11.Amazon Timestream

Cloud Lab

12.Conclusion

Database Model Fundamentals

Eight database models explained

Relational and document models

Key-value, wide-column, and in-memory models

Graph, time series, and data warehouse models

Database Model Comparison: Schema, Query, Scaling, and AWS Service Overview

Mapping workloads to database models

Distributed-systems qualities for selection

Consistency, durability, and availability

RPO, RTO, and multi-region design

Designing polyglot persistence

Conclusion