Durability, HA, and DR

Explore how Amazon MemoryDB provides durable primary database capabilities by leveraging a multi-AZ transaction log, automatic replica promotion, and snapshot backups. Understand how these features ensure data reliability and availability during node failures, logical errors, and regional outages. This lesson helps you distinguish between AZ-level and region-level resilience and equips you with strategies to design robust, fault-tolerant MemoryDB deployments.

We'll cover the following...

The MemoryDB durability model
- How the write path works
Multi-AZ and replica promotion
- The failover sequence
  - What the application experiences
  - Why replica placement matters
Backup, restore, and recovery planning
- How snapshots work
- The restore-to-new-cluster model
Multi-region awareness and DR design
- Cross-region replication patterns
- DNS failover orchestration
Bringing resilience together

The previous lesson focused on performance and scale, covering shard sizing, replica count, throughput tuning, and latency optimization. Those techniques ensure Amazon MemoryDB for Redis delivers sub-millisecond reads and high write throughput. But speed means nothing if data disappears after a failure. A cache that loses its contents during a node crash forces the application to rebuild state from a slower backend, and for workloads that treat the in-memory layer as the primary database, that loss is unacceptable. This lesson shifts the conversation from how fast MemoryDB operates to how reliably it preserves data and recovers from failures.

Traditional in-memory stores like ElastiCache for Redis treat data as ephemeral. If a node fails, the data in memory is gone, and the application must repopulate the cache from its source of truth. MemoryDB breaks that assumption by functioning as a durable primary database that happens to serve data from memory. Understanding how it achieves this requires examining four resilience pillars that build on each other.

- Durability model: The transaction log mechanism that persists every write before acknowledging it to the client.
- High availability via Multi-AZ: Automatic replica promotion that keeps the cluster operational when a primary node fails.
- Backup and restore through snapshots: Cluster-level point-in-time captures that protect against logical errors and support operational recovery.
- Multi-region awareness: Cross-region replication patterns that extend disaster recovery beyond a single region.
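
To make the pillars a little more concrete, here is a minimal sketch of how two of them (replica promotion and snapshots) might show up as cluster settings. The helper `build_cluster_request`, the cluster name, the node type, and the default values are illustrative assumptions, not recommendations from this lesson. The dictionary keys follow the parameter names of the MemoryDB `CreateCluster` API as exposed by boto3. The transaction log has no setting here because it is part of the service itself.

```python
def build_cluster_request(name, num_shards=2, replicas_per_shard=2,
                          snapshot_retention_days=7):
    """Build keyword arguments for a MemoryDB create_cluster call.

    Hypothetical helper for illustration; all defaults are assumptions.
    """
    # Automatic promotion needs a replica to promote, so reject zero replicas.
    if replicas_per_shard < 1:
        raise ValueError("need at least one replica per shard for promotion")
    return {
        "ClusterName": name,
        "NodeType": "db.r6g.large",          # assumed node type
        "ACLName": "open-access",            # assumed ACL; tighten for real use
        "NumShards": num_shards,
        "NumReplicasPerShard": replicas_per_shard,        # HA pillar
        "SnapshotRetentionLimit": snapshot_retention_days,  # backup pillar
        "TLSEnabled": True,
    }


request = build_cluster_request("orders-primary")
# With AWS credentials configured, the request could be submitted like this:
#   import boto3
#   boto3.client("memorydb").create_cluster(**request)
```

The sketch only builds the request, so it can be inspected without an AWS account. Multi-region awareness is absent on purpose, since it is a design concern that spans more than one cluster.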

Along the way, you will encounter key terms such as RPO (recovery point objective), the maximum acceptable amount of data loss measured in time, representing how far back in time a recovery point can be from the moment of failure, and RTO (recovery time objective), the maximum acceptable duration of downtime before a service must be restored to operational status. By the end of this lesson, you will be able to distinguish AZ-level resilience from region-level resilience and select the right mechanism for each failure scenario.
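
The two terms are easier to keep apart with a small worked example. The sketch below computes the data-loss window and the downtime for a single incident and compares them with stated objectives. The `Incident` class, the timestamps, and the objective values are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Incident:
    last_recovery_point: datetime  # e.g. the most recent usable snapshot
    failure_time: datetime         # moment the failure occurred
    service_restored: datetime     # moment the service was operational again


def data_loss_window(incident):
    """Time between the recovery point and the failure (compare with RPO)."""
    return incident.failure_time - incident.last_recovery_point


def downtime(incident):
    """Time between the failure and restoration (compare with RTO)."""
    return incident.service_restored - incident.failure_time


def meets_objectives(incident, rpo, rto):
    """True only if both the data-loss window and the downtime are in budget."""
    return data_loss_window(incident) <= rpo and downtime(incident) <= rto


# Hypothetical incident: snapshot at 02:00, failure at 09:30, restored at 10:15.
incident = Incident(
    last_recovery_point=datetime(2024, 1, 10, 2, 0),
    failure_time=datetime(2024, 1, 10, 9, 30),
    service_restored=datetime(2024, 1, 10, 10, 15),
)
ok = meets_objectives(incident, rpo=timedelta(hours=1), rto=timedelta(hours=1))
```

In this made-up incident the downtime of 45 minutes fits a one-hour RTO, but the 7.5-hour gap since the last recovery point exceeds a one-hour RPO, so `ok` is `False`. The two objectives are measured on opposite sides of the failure and have to be checked separately.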

The MemoryDB durability model

MemoryDB is not just another Redis cache with persistence bolted on. Every mutating operation, whether a simple SET command or a complex sorted-set update, is written to a ...
