Disaster Recovery Strategies

Understand and evaluate the four canonical disaster recovery strategies in AWS. Learn how to design and select the right approach based on business RTO and RPO needs, balancing cost and operational complexity for robust multi-Region recovery solutions.

We'll cover the following...

Introduction to disaster recovery on AWS
Backup and Restore strategy
Pilot Light and Warm Standby designs
- Pilot Light architecture
  - Data tier replication
  - Failover sequence
- Warm Standby architecture
  - Stateful tier considerations
Multi-Site Active-Active architecture
- Traffic distribution and routing
- Cross-Region data consistency
Selecting the right DR strategy
Conclusion

Every enterprise architecture on AWS must answer a fundamental question: When an entire Region becomes unavailable, how quickly can the business resume operations, and how much data can it afford to lose? The AWS Certified Solutions Architect – Professional exam tests your ability to design for exactly this scenario. It distinguishes between high-availability patterns that protect against component failure within a Region and disaster recovery strategies that protect against Region-level disruption.

This lesson walks through the four canonical DR strategies, the AWS services that underpin each, and the decision framework that maps business requirements to the right architectural response.

Introduction to disaster recovery on AWS

Disaster recovery on AWS refers to the set of architectural patterns and operational procedures that restore workloads in a secondary Region after a primary Region experiences a prolonged or catastrophic failure. High availability, by contrast, uses multi-AZ deployments within a single Region to survive individual data center outages.

Two metrics govern every DR design decision:

- Recovery time objective (RTO): The maximum acceptable duration between the onset of a disruption and the restoration of service, measured in time units from seconds to hours.
- Recovery point objective (RPO): The maximum acceptable amount of data loss, measured backward in time from the moment of failure, representing the staleness of the last recoverable state.

Every architectural choice in DR, from replication frequency to compute provisioning, traces directly to these two values.
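To make the two metrics concrete, the following minimal sketch computes the recovery time and recovery point actually achieved in an incident and compares them with the stated objectives. The function names, timestamps, and objective values are hypothetical and chosen only for illustration; they do not come from any AWS API.

```python
from datetime import datetime, timedelta


def achieved_rto(failure_time: datetime, service_restored: datetime) -> timedelta:
    """Elapsed time between the onset of the disruption and restored service."""
    return service_restored - failure_time


def achieved_rpo(last_recoverable_state: datetime, failure_time: datetime) -> timedelta:
    """Age of the last recoverable state at the moment of failure (data lost)."""
    return failure_time - last_recoverable_state


def meets_objectives(rto_actual: timedelta, rpo_actual: timedelta,
                     rto_objective: timedelta, rpo_objective: timedelta) -> dict:
    """An objective is met when the achieved value does not exceed it."""
    return {
        "rto_met": rto_actual <= rto_objective,
        "rpo_met": rpo_actual <= rpo_objective,
    }


if __name__ == "__main__":
    # Hypothetical incident timeline (assumed values, not real data).
    last_backup = datetime(2024, 1, 1, 8, 0)
    failure = datetime(2024, 1, 1, 12, 0)
    restored = datetime(2024, 1, 1, 15, 30)

    rto = achieved_rto(failure, restored)
    rpo = achieved_rpo(last_backup, failure)
    print("Achieved RTO:", rto)
    print("Achieved RPO:", rpo)
    print(meets_objectives(rto, rpo,
                           rto_objective=timedelta(hours=4),
                           rpo_objective=timedelta(hours=1)))
```

In this assumed timeline, service returns within the four-hour RTO, but the last backup is four hours old, so a one-hour RPO is missed. Closing that gap requires more frequent backups or continuous replication, not faster compute provisioning.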

The four canonical strategies form a spectrum of rising cost and falling recovery time. At the low-cost end, Backup and Restore suits workloads that can tolerate an RTO and RPO measured in hours. Pilot Light keeps core data replicated in the recovery Region so that the rest of the environment can be rebuilt quickly. Warm Standby maintains a scaled-down but fully functional live environment. Multi-Site Active-Active delivers near-zero RTO and RPO at the highest cost. The Reliability Pillar of the AWS Well-Architected Framework documents these four patterns.
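The spectrum can be sketched as a simple lookup that returns the cheapest strategy able to meet the tighter of the two objectives. The threshold values below are illustrative assumptions, not AWS-published limits, and the function name `cheapest_strategy` is hypothetical. A real selection would also weigh budget, operational complexity, and compliance constraints.

```python
from datetime import timedelta

# Strategies ordered from lowest to highest cost. Each entry pairs a strategy
# with the smallest objective it is ASSUMED to satisfy. These floors are
# illustrative placeholders only, not figures published by AWS.
STRATEGIES_BY_COST = [
    ("Backup and Restore", timedelta(hours=1)),
    ("Pilot Light", timedelta(minutes=10)),
    ("Warm Standby", timedelta(minutes=1)),
    ("Multi-Site Active-Active", timedelta(0)),
]


def cheapest_strategy(rto: timedelta, rpo: timedelta) -> str:
    """Return the lowest-cost strategy whose assumed floor meets both objectives."""
    tightest = min(rto, rpo)  # the stricter objective drives the choice
    for name, floor in STRATEGIES_BY_COST:
        if tightest >= floor:
            return name
    # Unreachable with a zero floor on the last entry; kept as a safe default.
    return STRATEGIES_BY_COST[-1][0]


if __name__ == "__main__":
    print(cheapest_strategy(timedelta(hours=24), timedelta(hours=12)))
    print(cheapest_strategy(timedelta(minutes=30), timedelta(minutes=15)))
    print(cheapest_strategy(timedelta(seconds=5), timedelta(0)))
```

Using the tighter of the two objectives is a deliberate simplification. In practice RTO and RPO can point to different mechanisms, since RPO is driven mainly by how data is replicated and RTO by how much compute is already running.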

Several services and tools recur across all strategies: Amazon S3 with Cross-Region Replication and versioning, AWS Backup with cross-Region and cross-account vaulting, Amazon Route 53 health checks and failover routing, AWS Global Accelerator, Amazon Aurora Global Database, Amazon RDS cross-Region read replicas, Amazon EC2 Auto Scaling, and infrastructure-as-code tools such as AWS CloudFormation and Terraform. The next lesson on AWS Elastic Disaster Recovery covers a managed service that automates several of these patterns; this lesson focuses on the strategies themselves and the decision logic behind them.

The following diagram illustrates the four strategies along the cost vs. recovery spectrum.