
Disaster Recovery Strategies

Explore the four main disaster recovery strategies in AWS to design resilient systems. Understand how to select the right approach based on recovery time objectives, data loss tolerance, and cost. Learn the distinctions between high availability and disaster recovery, and how to apply AWS services for effective failover and continuity across multiple Regions.

Every enterprise architecture on AWS must answer a fundamental question: When an entire Region becomes unavailable, how quickly can the business resume operations, and how much data can it afford to lose? The AWS Certified Solutions Architect – Professional exam tests your ability to design for exactly this scenario. It distinguishes between high-availability patterns that protect against component failure within a Region and disaster recovery strategies that protect against Region-level disruption.

This lesson walks through the four canonical DR strategies, the AWS services that underpin each, and the decision framework that maps business requirements to the right architectural response.

Introduction to disaster recovery on AWS

Disaster recovery on AWS refers to the set of architectural patterns and operational procedures that restore workloads in a secondary Region after a primary Region experiences a prolonged or catastrophic failure. High availability, by contrast, uses Multi-AZ deployments within a single Region to survive the failure of an individual component or Availability Zone.

Two metrics govern every DR design decision:

Recovery time objective (RTO): The maximum acceptable duration between the onset of a disruption and the restoration of service, typically ranging from seconds to hours.

Recovery point objective (RPO): The maximum acceptable amount of data loss, measured backward in time from the moment of failure, representing the staleness of the last recoverable state.

Every architectural choice in DR, from replication frequency to compute provisioning, traces directly to these two values.
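To make the two metrics concrete, here is a minimal Python sketch of checking a candidate design against an RTO and RPO. It is an illustration only, not an AWS API: the names `RecoveryObjectives` and `meets_objectives` are hypothetical, and the model is simplified. It assumes the worst-case data loss equals the interval between backups or replication cycles, and that the recovery time is the sum of sequential recovery steps.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class RecoveryObjectives:
    """Business targets for a workload (hypothetical helper type)."""
    rto: timedelta  # maximum acceptable downtime
    rpo: timedelta  # maximum acceptable data loss, measured back from the failure


def meets_objectives(objectives, replication_interval, recovery_steps):
    """Return True if a design satisfies both targets.

    Simplifying assumptions: the failure can occur just before the next
    backup or replication cycle, so the worst-case data loss equals the
    interval; recovery steps run one after another, so their durations add.
    """
    worst_case_data_loss = replication_interval
    estimated_recovery_time = sum(recovery_steps, timedelta())
    return (
        worst_case_data_loss <= objectives.rpo
        and estimated_recovery_time <= objectives.rto
    )


# Example: a 4-hour RTO and 1-hour RPO, with made-up step durations.
targets = RecoveryObjectives(rto=timedelta(hours=4), rpo=timedelta(hours=1))
restore_steps = [timedelta(minutes=30), timedelta(hours=1)]

frequent = meets_objectives(targets, timedelta(minutes=15), restore_steps)
nightly = meets_objectives(targets, timedelta(hours=24), restore_steps)
```

Under these assumptions, replication every 15 minutes satisfies both targets, while a nightly backup fails the RPO even though the restore itself is fast enough for the RTO.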

The four canonical strategies form a spectrum in which cost rises as recovery time and data loss shrink. At the low-cost end, Backup and Restore tolerates an RTO and RPO measured in hours. Pilot Light keeps core data replicated for rapid rebuild. Warm Standby maintains a scaled-down live environment. Multi-Site Active-Active delivers near-zero RTO and RPO at the highest cost. The AWS Well-Architected Reliability Pillar documents these four patterns.
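The selection logic implied by this spectrum can be sketched as a small lookup: walk the strategies from cheapest to most expensive and pick the first one that can meet both targets. The sketch below is hypothetical. The achievable RTO and RPO figures in `ASSUMED_CAPABILITIES` are placeholder assumptions chosen for illustration, not AWS-published numbers, and Multi-Site Active-Active is treated as zero for simplicity although the text describes it as near-zero.

```python
from datetime import timedelta
from enum import Enum


class Strategy(Enum):
    BACKUP_AND_RESTORE = "Backup and Restore"
    PILOT_LIGHT = "Pilot Light"
    WARM_STANDBY = "Warm Standby"
    MULTI_SITE_ACTIVE_ACTIVE = "Multi-Site Active-Active"


# (strategy, assumed achievable RTO, assumed achievable RPO), cheapest first.
# These thresholds are illustrative assumptions; real values depend on the
# workload and must come from testing the recovery procedure.
ASSUMED_CAPABILITIES = [
    (Strategy.BACKUP_AND_RESTORE, timedelta(hours=8), timedelta(hours=4)),
    (Strategy.PILOT_LIGHT, timedelta(minutes=30), timedelta(minutes=5)),
    (Strategy.WARM_STANDBY, timedelta(minutes=5), timedelta(minutes=1)),
    (Strategy.MULTI_SITE_ACTIVE_ACTIVE, timedelta(0), timedelta(0)),
]


def cheapest_strategy(required_rto, required_rpo):
    """Return the lowest-cost strategy whose assumed RTO and RPO fit the targets."""
    for strategy, achievable_rto, achievable_rpo in ASSUMED_CAPABILITIES:
        if achievable_rto <= required_rto and achievable_rpo <= required_rpo:
            return strategy
    # Unreachable with the zero-valued last row, kept as a defensive fallback.
    return Strategy.MULTI_SITE_ACTIVE_ACTIVE


choice = cheapest_strategy(timedelta(hours=1), timedelta(minutes=15))
print(choice.value)
```

The ordering of the list carries the design intent: a stricter requirement on either metric pushes the choice toward the more expensive end of the spectrum, and a strategy is never selected if a cheaper one already satisfies both targets.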

Key AWS services and tools recur across all strategies: Amazon S3 with Cross-Region Replication and versioning, AWS Backup with cross-Region and cross-account backup copies, Amazon Route 53 health checks and failover routing, AWS Global Accelerator, Amazon Aurora Global Database, Amazon RDS cross-Region read replicas, Auto Scaling, and infrastructure-as-code tools such as AWS CloudFormation and Terraform. The next lesson on AWS Elastic Disaster Recovery covers a managed service that automates several of these patterns; this lesson focuses on the strategies themselves and the decision logic behind them.

The following diagram illustrates the four strategies along the cost vs. recovery spectrum.

AWS disaster recovery strategies spectrum from backup and restore to multi-site active-active

With this spectrum established, the next sections examine each strategy in architectural detail, beginning with the lowest-cost option. ...