AWS Elastic Disaster Recovery

Explore how to implement AWS Elastic Disaster Recovery for enterprise environments requiring low RPO and RTO. Understand continuous block-level replication, failover strategies like pilot light and warm standby, non-disruptive testing, and governance best practices to design resilient AWS disaster recovery architectures.

We'll cover the following...

Continuous block-level replication
- Staging area architecture and connectivity
Automated failover and recovery design
- Pilot light vs. warm standby
- Network architecture for the recovery landing zone
- Failback planning
Non-disruptive testing and drills
Recovery optimization and governance
- Cost optimization strategies
- Multi-account governance and compliance
Conclusion

Enterprise organizations running workloads across on-premises data centers, colocation facilities, and multiple cloud providers face a core architectural challenge: delivering unified disaster recovery (DR) with a low recovery point objective (RPO) and recovery time objective (RTO), beyond what periodic backups or database-native replication can provide. Traditional approaches, such as VM exports, scheduled snapshots, or application-level replication, fall short when physical machines, VMware virtual machines, and cloud instances all need full-server recovery into a consistent AWS landing zone. AWS Elastic Disaster Recovery (DRS) fills this gap as the AWS-recommended service for continuous block-level replication into AWS, enabling architects to meet strict recovery objectives while maintaining cost efficiency.

Within the broader AWS resilience ecosystem, DRS integrates with Amazon Route 53 for failover routing, AWS Transit Gateway and VPCs for recovery networking, AWS Direct Connect or AWS Site-to-Site VPN for hybrid connectivity, and AWS Organizations with service control policies (SCPs) for multi-account governance. In SAP-C02 exam scenarios, DRS is the correct choice when requirements specify low RPO and low RTO for full-server recovery across heterogeneous environments; AWS Backup, VM Import/Export, and snapshot-based solutions are data-protection tools rather than full disaster recovery mechanisms. This lesson covers continuous replication setup, automated failover design, non-disruptive testing, and scalable governance for recovery architectures.
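The selection rule above can be captured as a tiny decision helper for study purposes. This is a minimal sketch that encodes only the rule of thumb stated in this lesson; the `DrScenario` type and `pick_dr_tooling` function are hypothetical names, and the code is not an official AWS decision tree.

```python
from dataclasses import dataclass


@dataclass
class DrScenario:
    """Hypothetical summary of an exam-style recovery requirement."""
    full_server_recovery: bool  # whole machines (OS, apps, data), not just data sets
    low_rpo: bool               # little tolerable data loss
    low_rto: bool               # fast recovery required


def pick_dr_tooling(scenario: DrScenario) -> str:
    """Apply the lesson's rule of thumb (illustrative only)."""
    if scenario.full_server_recovery and scenario.low_rpo and scenario.low_rto:
        return "AWS Elastic Disaster Recovery"
    # Without all three strict requirements, a data-protection tool may be enough.
    return "data protection (AWS Backup, snapshots, VM Import/Export)"


print(pick_dr_tooling(DrScenario(full_server_recovery=True, low_rpo=True, low_rto=True)))
print(pick_dr_tooling(DrScenario(full_server_recovery=False, low_rpo=False, low_rto=False)))
```

Real scenarios add constraints such as budget, compliance, and source platform, so treat this as a mnemonic rather than a complete checklist.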

The following diagram illustrates the end-to-end DRS architecture, from source servers through replication to recovery.

Continuous block-level replication

The replication mechanism in DRS begins with the AWS Replication Agent, a lightweight process installed on each source server that captures block-level disk changes in near real time and streams them to a staging area: a set of cost-optimized EC2 instances and EBS volumes in the target AWS Region that maintain a continuously updated replica of source server disks without running production-grade infrastructure. Unlike periodic snapshot or backup approaches, which can lose hours of data, continuous block-level replication can achieve RPOs measured in seconds because it captures and transmits disk writes incrementally as they occur.
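To make the RPO difference concrete, the following sketch counts how many writes a failure would lose under each approach. It is a simplified model, not how DRS measures RPO: it assumes hourly snapshots that become usable only after a five-minute copy, a one-second replication lag for continuous replication, and a hypothetical workload of one write per minute.

```python
from typing import List


def lost_writes_periodic(write_times: List[float], failure_time: float,
                         interval: float, copy_time: float) -> List[float]:
    """Writes lost when recovering from the newest *completed* periodic snapshot.

    Assumes snapshots start at 0, interval, 2*interval, ... and become usable
    only copy_time seconds after they start. A snapshot started at s captures
    every write with t <= s.
    """
    latest_usable_start = None
    k = 0
    while k * interval + copy_time <= failure_time:
        latest_usable_start = k * interval
        k += 1
    if latest_usable_start is None:
        # No snapshot finished before the failure, so every write is lost.
        return [t for t in write_times if t <= failure_time]
    return [t for t in write_times if latest_usable_start < t <= failure_time]


def lost_writes_continuous(write_times: List[float], failure_time: float,
                           replication_lag: float) -> List[float]:
    """Writes lost with continuous replication: only those still in flight,
    that is, written within replication_lag seconds before the failure."""
    return [t for t in write_times if failure_time - replication_lag < t <= failure_time]


# Hypothetical workload: one write per minute for two hours, failure at t = 7190 s.
writes = [float(t) for t in range(0, 7200, 60)]
periodic = lost_writes_periodic(writes, failure_time=7190, interval=3600, copy_time=300)
continuous = lost_writes_continuous(writes, failure_time=7190, replication_lag=1.0)
print(len(periodic), len(continuous))  # 59 0
```

The periodic model loses nearly an hour of writes because the second snapshot had not finished copying when the failure occurred, while the continuous model loses only what was in flight during the replication lag.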

Staging area architecture and connectivity

The staging area uses small, low-cost EC2 instance types and gp3 EBS volumes to minimize cost while maintaining replication throughput. Hybrid connectivity needs careful planning because the link must keep pace with the rate at which source disks change, and the choice between Direct Connect and Site-to-Site VPN involves clear trade-offs.
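Before weighing the two connectivity options, it helps to estimate how much bandwidth replication needs. The sketch below converts a daily change volume into a sustained bandwidth figure and estimates how long the initial full copy of the source disks would take. The function names, the 30% headroom, and the 80% usable link fraction are assumptions for illustration; measured peak write rates from your own servers are a better basis for sizing.

```python
MEGABITS_PER_GB = 8_000  # decimal units: 1 GB = 8,000 megabits
SECONDS_PER_DAY = 86_400


def required_replication_mbps(daily_change_gb: float, headroom: float = 1.3) -> float:
    """Average bandwidth needed to keep pace with changed blocks, plus headroom.

    The default 30% headroom is an assumed buffer for write bursts.
    """
    return daily_change_gb * MEGABITS_PER_GB / SECONDS_PER_DAY * headroom


def initial_sync_hours(total_disk_gb: float, link_mbps: float, utilization: float = 0.8) -> float:
    """Hours to copy all source disks once, assuming only part of the link is usable."""
    return total_disk_gb * MEGABITS_PER_GB / (link_mbps * utilization) / 3_600


# Hypothetical estate: 10 TB of disks changing by 500 GB per day over a 1 Gbps link.
print(round(required_replication_mbps(500), 1))     # 60.2
print(round(initial_sync_hours(10_000, 1_000), 1))  # 27.8
```

In this example the ongoing change rate fits comfortably on the link, but the initial synchronization takes more than a day, which is often the figure that decides whether a higher-bandwidth connection is worth its cost.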

Direct Connect provides dedicated, high-bandwidth, low-latency ...