Database Model Fundamentals
Explore the fundamentals of eight major AWS database models including relational, document, key-value, and graph databases. Understand how to match AWS managed services like Aurora, DynamoDB, and Neptune to specific workload patterns for optimal performance. Learn the principles of polyglot persistence and distributed system qualities such as consistency, durability, and availability to design production-ready architectures.
Modern production systems rarely run on a single database. A typical e-commerce platform stores orders in a relational database, product catalogs in a document store, user sessions in a key-value cache, and operational telemetry in a time-series database. Each subsystem has a fundamentally different access pattern, consistency requirement, and scale profile, and forcing all of them into one database engine creates compromises in latency, cost, or resilience. This architectural approach, where each subsystem gets the database model that fits its workload shape, is called polyglot persistence.
This lesson covers eight major database and analytics patterns: relational, document, key-value, wide-column, in-memory, graph, time-series, and data warehousing. It maps each pattern to the AWS managed services that support it, including Aurora and RDS for relational workloads, DocumentDB for document workloads, DynamoDB for key-value access, Keyspaces for wide-column workloads, ElastiCache and MemoryDB for in-memory use cases, Neptune for graph workloads, Timestream for time-series data, and Redshift for data warehousing.
Eight database models explained
Each database model exists because a specific data shape and access pattern demands it. Understanding the internal mechanics of each model is what separates a correct exam answer from a plausible-sounding distractor.
Relational and document models
Relational databases enforce a fixed schema of tables, rows, and foreign keys. They support ACID transactions and complex joins, making them the default for
Amazon Aurora is compatible with MySQL and PostgreSQL. Aurora supports up to 256 TB of storage and up to 15 read replicas. Aurora replicates data as six copies across three Availability Zones, which is how it achieves both durability and fast failover.
Document databases store schema-flexible JSON or BSON documents that can be nested to any depth. Each record can have a different shape, which makes this model ideal for product catalogs, content management systems, and user profiles where the data structure evolves frequently. DocumentDB provides MongoDB compatibility with automatic replication across three AZs, while DynamoDB also supports document access patterns through its native JSON handling.
Attention: DocumentDB is document-compatible, but it is not a universal answer for every NoSQL solution. Graph, time-series, and pure key-value workloads each have purpose-built services that outperform a document store for those patterns.
Key-value, wide-column, and in-memory models
Key-value databases represent the simplest model, functioning as a distributed hash map at scale. Access is strictly by primary key, delivering single-digit-millisecond latency. DynamoDB is the flagship AWS service here, and partition key design is the critical optimization lever that determines whether throughput is distributed evenly or creates hot partitions.
Wide-column databases extend the key-value concept by allowing rows to contain dynamic columns grouped into column families. This structure suits IoT telemetry and large-scale event logging, where each row may carry a different set of attributes. Amazon Keyspaces provides Cassandra compatibility as a fully managed service.
In-memory databases keep data entirely in RAM for sub-millisecond reads. ElastiCache supports Redis and Memcached for caching layers, leaderboards, and session stores. MemoryDB for Redis adds durability through a transaction log when the workload needs both speed and persistence. ElastiCache is primarily a cache, not a durable primary database. Although Redis replicas and persistence features improve recovery, MemoryDB is generally recommended for durable, primary, in-memory storage.
Graph, time series, and data warehouse models
Graph databases model data as nodes and edges with properties, optimized for traversing relationships. Social networks, fraud-detection rings, and recommendation engines all rely on multi-hop traversals that would require expensive recursive joins in a relational system. Neptune supports both Gremlin and SPARQL query languages and scales reads through up to 15 read replicas.
Time series databases are optimized for append-heavy, timestamped data with built-in aggregation and automatic retention policies. DevOps metrics, IoT sensor streams, and application telemetry all fit this model. Timestream separates storage into a memory tier for recent high-frequency queries and a magnetic tier for historical data, enabling cost optimization without manual data life cycle management.
Data warehouse databases are optimized for analytical processing across massive datasets rather than high-frequency transactional writes. In practice, a data warehouse is typically a relational database optimized for OLAP through columnar storage and massively parallel processing. Business intelligence dashboards, financial reporting, and petabyte-scale analytical queries typically require columnar storage, massively parallel processing, and optimized aggregation performance. Amazon Redshift uses columnar storage and distributed query execution to process large-scale analytical workloads efficiently, while features such as Redshift Spectrum allow querying data directly in Amazon S3 without fully loading it into the warehouse.
The following table consolidates the distinguishing characteristics of all eight models into a single reference:
Database Model Comparison: Schema, Query, Scaling, and AWS Service Overview
Database Model | Schema Flexibility | Primary Query Pattern | Scaling Style | Typical Latency | Consistency Model | Best Workload Fit | AWS Service |
Relational | Fixed schema | SQL joins and aggregations | Vertical + read replicas | Low ms | Strong (ACID) | OLTP / complex joins | Aurora, RDS |
Document | Flexible (JSON) | Query by any field | Horizontal | Low ms | Tunable | CMS, catalogs | DocumentDB, DynamoDB |
Key-Value | Schema-free | Lookup by key | Horizontal | Single-digit ms | Eventual or strong | Sessions, shopping carts | DynamoDB |
Wide-Column | Flexible columns | Row key + column family | Horizontal | Low ms | Tunable | IoT data, event logs | Keyspaces |
In-Memory | Varies | Key-based access | Horizontal | Sub-ms | Eventual | Caching, leaderboards | ElastiCache, MemoryDB |
Graph | Nodes and edges | Graph traversals | Vertical + read replicas | Low ms | Strong | Relationships, fraud detection | Neptune |
Time Series | Timestamped data | Time-range queries | Horizontal | Low ms | Eventual | Metrics, telemetry | ATimestream |
Data Warehouse (Columnar Relational) | Structured schema with predefined tables and columns | Large-scale analytical queries, aggregations, OLAP, reporting, and BI workloads | Horizontal scaling through distributed nodes and massively parallel processing (MPP) | Seconds to minutes for complex analytical queries | Strong consistency for committed data | Business intelligence, reporting, analytics, ETL, and petabyte-scale data warehousing | Redshift |
With the eight models clearly defined, the next step is mapping real workload patterns to the models that serve them best.
Mapping workloads to database models
Knowing the models is necessary but not sufficient. Production decisions require matching a business workloads shape to the model that handles it without forcing compromises.
OLTP workloads involve high-frequency reads and writes with ACID guarantees. Relational databases such as Aurora and RDS are the default. DynamoDB fits when the access pattern is strictly key-based, and joins are unnecessary, trading relational flexibility for horizontal scale.
Analytics workloads run complex aggregations over large datasets. Redshift serves data warehousing, and OpenSearch, a specialized search engine, handles full-text search and log analytics. Neither is an OLTP engine, and confusing them as such is a common exam distractor.
Caching workloads reduce primary database load, lowering both latency and cost. ElastiCache facilitates this via the cache-aside pattern (managed by TTL and eviction policies), where the application follows a strict sequence: it checks the cache first, retrieves data from the primary database on a cache miss, and immediately writes that data back to the cache for future use.a caching strategy where the application checks the cache first, and on a miss, reads from the primary database and writes the result back into the cache for subsequent requests. Event-driven workloads react to state changes with eventual-consistency tolerance. DynamoDB Streams can feed Lambda functions for downstream processing, or Kinesis can route events into Timestream for time-aware storage.
Streaming workloads involve continuous high-throughput ingestion of time-ordered data. Kinesis Data Streams handles ingestion, while Timestream provides storage and time-range query capabilities.
Practical tip: The biggest mistake is selecting the database by familiarity instead of workload shape. Using Redshift as an OLTP database or treating ElastiCache as durable primary storage are the kinds of mistakes that cost you dearly.
Distributed-systems qualities for selection
When selecting a database service, we must evaluate how the system behaves during failures, replication delays, and scaling events. AWS databases make different trade-offs between consistency, durability, and availability depending on the workload requirements.
The following characteristics heavily influence architectural decisions in distributed systems:
Consistency, durability, and availability
These three attributes are frequently conflated on exams, but they describe different guarantees.
Consistency determines whether a read immediately reflects the most recent write. Aurora provides strong consistency on the writer endpoint, while reads from replicas may lag slightly. DynamoDB is eventually consistent by default, but it supports both eventual and strongly consistent reads on a per-request basis, giving the application control. ElastiCache is eventually consistent by nature because cache invalidation is asynchronous.
Durability measures whether committed data survives failures. Aurora replicates six copies across three AZs and can tolerate the loss of an entire AZ without data loss. DynamoDB replicates across three AZs automatically. ElastiCache data can be lost entirely on node failure, which is why MemoryDB exists for workloads that need in-memory speed with a durable transaction log.
Availability describes whether the service continues accepting requests during failures. Multi-AZ deployments provide automatic failover for Aurora and RDS. DynamoDB is inherently multi-AZ with no single point of failure. Global Tables and Aurora Global Database extend availability across Regions for workloads that cannot tolerate a regional outage.
Note: Durability and availability are not the same as low latency or strong consistency. A database can be highly durable and available while still serving eventually consistent reads at moderate latency. Exam questions often test whether you can distinguish these attributes.
RPO, RTO, and multi-region design
Recovery objectives translate business tolerance for downtime and data loss into concrete infrastructure decisions.
Aurora failover completes in under 30 seconds with multi-AZ replicas, achieving near-zero
For multi-region architectures, Aurora Global Database replicates to up to five secondary Regions with under one second of replication lag, but only the primary Region accepts writes. DynamoDB Global Tables support active-active multi-region writes, making them the correct choice when the workload requires write availability in every Region simultaneously.
The following decision tree synthesizes model selection and quality attributes into a single visual flow that mirrors how an architect evaluates a new workload:
With both model characteristics and quality attributes covered, the final step is applying them together in a realistic polyglot architecture.
Designing polyglot persistence
Consider the database layer for a ride-sharing platform. Each subsystem has a different access pattern, and choosing a single database for all of them would force unacceptable trade-offs.
User authentication and profiles use DynamoDB for key-value lookups by user ID, delivering single-digit-millisecond latency with horizontal scaling that handles millions of concurrent users.
Ride matching and routing use Neptune for graph traversals that find the nearest available drivers through relationship-based queries, a pattern that would require expensive recursive joins in a relational system.
Trip history and billing use Aurora PostgreSQL for ACID transactions on financial records, with complex joins supporting invoicing, dispute resolution, and regulatory reporting.
Real-time driver location uses ElastiCache with Redis geospatial commands for sub-millisecond reads on ephemeral data that is constantly refreshed as drivers move.
Vehicle telemetry uses Timestream for millions of timestamped sensor readings per second, with automatic data aging from the memory tier to the magnetic tier as readings become historical.
Search and discovery uses OpenSearch, a specialized search engine, for full-text search across ride history, driver reviews, and location-based queries that do not fit a transactional model.
This is not over-engineering. Each subsystem operates at a different point on the consistency, latency, and durability spectrum. Forcing all six into Aurora would degrade caching performance, complicate telemetry ingestion, and make graph traversals prohibitively slow.
Practical tip: When a scenario describes multiple subsystems with different access patterns, the answer almost always involves more than one database service. Look for the workload shape of each subsystem independently before selecting.
These scenarios reinforce that quality attributes are the deciding signals that refine an initial model choice into a production-ready architecture.
Conclusion
Every database model exists because a specific access pattern demands it. Relational databases handle ACID joins. DynamoDB handles massive-scale key-value access. ElastiCache accelerates reads with sub-millisecond caching. Neptune traverses relationships. Timestream manages time-series telemetry. Analytics engines like Redshift and OpenSearch, a specialized search engine, serve aggregation and search rather than transactional workloads. Modern cloud architectures embrace polyglot persistence to serve each subsystem optimally instead of forcing compromises.