5 tips for well-architected AWS networks
It usually starts with everything looking fine. Dashboards show all systems healthy, the deployment goes live, and the team confirms a clean rollout. Then traffic spikes, and response times start to climb. Logs show time-outs, pointing to a bottleneck in the network layer.
A well-designed network is the foundation of any successful cloud deployment.
This invisible yet critical layer dictates performance, security, and scalability for every application we run. A poorly planned network leads to security vulnerabilities, routing conflicts, and significant scaling challenges. The scenario above is common: initial shortcuts in network design create long-term operational friction and security risks.
Conversely, a well-architected network provides a resilient, secure, and flexible environment for growth.
In this newsletter, we’ll move beyond the basics and explore five practical, high-impact tips for designing a resilient and scalable network on AWS.
1. Connecting IPv6-only hosts to the IPv4 internet
As IPv4 address space becomes more constrained, building services on IPv6-only subnets is an increasingly attractive strategy. But what happens when we want to maintain a pure IPv6 internal architecture and a service needs to communicate with a legacy API or software repository that only exists on the IPv4 internet?
AWS provides a seamless translation mechanism using a combination of a NAT Gateway and a VPC feature called DNS64. This setup allows our IPv6-only resources to initiate connections to IPv4 addresses without requiring any IPv4 addresses themselves.
The connectivity can be summarised in a few simple steps:
Enable DNS64: We first enable the DNS64 feature on our IPv6-only subnet within the VPC settings.
DNS resolution: When our EC2 instance attempts to resolve a domain like example.com, the query goes to the VPC’s Route 53 Resolver.
Address synthesis: If the resolver finds only an IPv4 address (an A record) for the domain, the DNS64 service synthesizes an IPv6 address (an AAAA record) by embedding the IPv4 address within the well-known IPv6 prefix 64:ff9b::/96. It then returns this synthetic address to our instance.
Egress routing: Our instance sends traffic to this synthetic IPv6 address. A rule in our subnet’s route table directs all traffic destined for the 64:ff9b::/96 prefix to our NAT Gateway.
NAT64 translation: The NAT Gateway performs the protocol translation, converting the IPv6 packet into an IPv4 packet and sending it to the destination on the public internet. Return traffic is translated from IPv4 back to IPv6 and routed to our instance.
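The address-synthesis step can be reproduced locally with Python’s standard `ipaddress` module. This is a minimal sketch of the RFC 6052 embedding that DNS64 performs; the example IPv4 address is just an illustration:

```python
import ipaddress

# The well-known NAT64 prefix used by DNS64 (RFC 6052).
NAT64_PREFIX = ipaddress.IPv6Network("64:ff9b::/96")

def synthesize_dns64(ipv4: str) -> ipaddress.IPv6Address:
    """Embed an IPv4 address in the low 32 bits of the NAT64 prefix."""
    v4 = ipaddress.IPv4Address(ipv4)
    return ipaddress.IPv6Address(int(NAT64_PREFIX.network_address) | int(v4))

def extract_ipv4(v6: str) -> ipaddress.IPv4Address:
    """Recover the original IPv4 address from a synthesized IPv6 address."""
    return ipaddress.IPv4Address(int(ipaddress.IPv6Address(v6)) & 0xFFFFFFFF)

print(synthesize_dns64("93.184.216.34"))            # 64:ff9b::5db8:d822
print(extract_ipv4("64:ff9b::5db8:d822"))           # 93.184.216.34
```

This round trip is exactly what the NAT Gateway undoes at translation time: the destination IPv4 address is simply carved back out of the synthetic IPv6 address.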
2. Choose the right VPC connectivity for each scale
As the AWS footprint grows, connecting our VPCs efficiently and securely becomes important. A single connectivity method rarely fits all use cases. Choosing the right tool for the job is essential for scalability and manageability.
VPC subnet sharing: This is an excellent pattern for centralizing network management while delegating application space. A central networking account can own the VPC and its CIDR blocks, then share specific subnets with other AWS accounts. This allows application teams to launch resources like EC2 instances in a preconfigured, secure network segment without managing the underlying VPC infrastructure.
VPC peering: This is the simplest way to connect two VPCs. It creates a direct, private connection between them, allowing resources in either VPC to communicate as if they were on the same network. It’s best for simple, one-to-one relationships, but can become complex to manage at scale, leading to a mesh of connections.
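One hard constraint worth remembering: VPC peering requires non-overlapping CIDR blocks. A quick local check with Python’s `ipaddress` module (the CIDRs below are illustrative) can catch this before a peering request is rejected:

```python
import ipaddress

def cidrs_overlap(a: str, b: str) -> bool:
    """Return True if two CIDR blocks overlap (peering would be unusable)."""
    return ipaddress.ip_network(a).overlaps(ipaddress.ip_network(b))

print(cidrs_overlap("10.0.0.0/16", "10.0.128.0/20"))  # True: cannot peer
print(cidrs_overlap("10.0.0.0/16", "10.1.0.0/16"))    # False: safe to peer
```

The same check is useful before attaching VPCs to a Transit Gateway, since overlapping spokes make routing ambiguous even when the attachment itself succeeds.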
Transit Gateway (TGW): The Transit Gateway is the industry best practice for scaling beyond a handful of VPCs. It acts as a regional network hub, allowing us to connect thousands of VPCs and our on-premises networks in a classic hub-and-spoke model. This simplifies routing and management by eliminating the need for complex peering meshes.
VPC Lattice: A newer, higher-level service, VPC Lattice simplifies service-to-service communication at the application layer. Instead of managing IP routes and security groups, we define a service network and grant access between services. It abstracts away the underlying network, making it a great choice for microservices architectures focusing on application connectivity, not network plumbing.
3. Traffic inspection patterns
Controlling how traffic enters (ingress) and leaves (egress) our network is a cornerstone of cloud security and cost management. For each traffic flow, we can generally choose between these models:
Centralized egress: In this model, all outbound internet traffic from multiple VPCs is routed through a single, dedicated inspection VPC. This VPC contains security appliances like NAT Gateways and firewalls. The primary benefit is consistent security policy enforcement and monitoring from a single choke point.
Distributed egress: Here, each VPC has its own direct path to the internet, for example, via its own Internet Gateway or NAT Gateway. This pattern offers better scalability, avoids potential bottlenecks of a central VPC, and aligns with a decentralized, application-centric ownership model.
Centralized ingress: As with centralized egress, this pattern routes all inbound internet traffic through a central inspection VPC before it reaches our applications. This is ideal for applying consistent security checks, such as a Web Application Firewall (WAF), to all incoming traffic.
Distributed ingress: This model provides a direct internet entry point for each application VPC, typically using a service like an Application Load Balancer. It simplifies routing and scales more easily, as each application team can manage its ingress path.
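In practice, the choice between centralized and distributed egress comes down to one route table entry in each spoke VPC. A hedged sketch of the two default routes, shaped like the parameters of the EC2 CreateRoute API (the resource IDs are hypothetical placeholders):

```python
# Centralized egress: internet-bound traffic hops to the inspection VPC
# via the Transit Gateway before leaving the network.
centralized_egress_route = {
    "DestinationCidrBlock": "0.0.0.0/0",
    "TransitGatewayId": "tgw-0123456789abcdef0",  # placeholder ID
}

# Distributed egress: each VPC exits directly via its own NAT Gateway.
distributed_egress_route = {
    "DestinationCidrBlock": "0.0.0.0/0",
    "NatGatewayId": "nat-0123456789abcdef0",      # placeholder ID
}

print(centralized_egress_route)
print(distributed_egress_route)
```

Everything else in the pattern (firewalls, NAT placement, monitoring) hangs off this single routing decision, which is why it is worth settling early.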
4. Design for resilience with DNS and zonal shifts
A well-architected network must be resilient to failure. AWS provides powerful tools to automate failover and minimize downtime, moving beyond manual intervention.
Automate failover with Amazon Route 53: Route 53 is far more than a simple DNS service. By configuring health checks on our endpoints (like load balancers or EC2 instances), we can enable failover routing policies. If a health check fails in our primary region, Route 53 will automatically stop sending traffic to it and redirect users to our healthy secondary region, ensuring high availability.
Recover instantly with AWS Application Recovery Controller (ARC): For critical workloads, a zonal impairment can have a major impact. An ARC Zonal shift is a powerful, one-click recovery mechanism. If we detect an issue in a single Availability Zone (AZ), such as increased latency or error rates, we can call the zonal shift API to immediately redirect traffic away from the impaired AZ for our load balancer, allowing for rapid recovery while we investigate the root cause. For even greater resilience, we can enable Zonal Autoshift. With this feature, AWS automatically shifts our load balancer’s traffic away from an affected AZ on our behalf when it detects an issue, providing proactive, automated recovery without requiring manual intervention.
Pro tip: For ARC to function effectively, pre-provision the necessary additional capacity so that the load balancer can seamlessly redirect traffic to the standby resources when needed.
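The Route 53 failover policy described above boils down to a pair of alias records, one PRIMARY and one SECONDARY, sharing the same name. A hedged sketch shaped like a ChangeResourceRecordSets payload; all domain names, zone IDs, and the health check ID are hypothetical placeholders:

```python
def failover_record(set_id, role, dns_name, health_check_id=None):
    """Build one half of a PRIMARY/SECONDARY failover alias record pair."""
    record = {
        "Name": "app.example.com.",
        "Type": "A",
        "SetIdentifier": set_id,
        "Failover": role,  # "PRIMARY" or "SECONDARY"
        "AliasTarget": {
            "HostedZoneId": "ZPLACEHOLDER123",  # the load balancer's zone ID
            "DNSName": dns_name,
            "EvaluateTargetHealth": True,
        },
    }
    if health_check_id:
        record["HealthCheckId"] = health_check_id
    return record

# The primary carries the health check; when it fails, Route 53 answers
# with the secondary record instead.
primary = failover_record("primary", "PRIMARY",
                          "primary-alb.us-east-1.elb.amazonaws.com.",
                          health_check_id="hc-placeholder")
secondary = failover_record("secondary", "SECONDARY",
                            "standby-alb.us-west-2.elb.amazonaws.com.")
```

Both records must share the same `Name` and `Type`; only the `SetIdentifier` and `Failover` role distinguish them.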
5. Solve hybrid DNS seamlessly
One of the most common challenges in a hybrid cloud environment is DNS resolution. On-premises servers need to find resources in AWS by name, and AWS resources need to find on-premises servers. Managing hosts files or hard-coding IP addresses is not a scalable solution.
Amazon Route 53 Resolver is the DNS service for hybrid cloud environments. It resolves DNS names for resources inside the VPC. Depending on the direction of the query, there are two types of resolver endpoints.
On-premises to AWS resolution: We configure a Route 53 Resolver inbound endpoint in our VPC. This creates IP addresses that our on-premises DNS servers can forward requests to. When an on-premises server asks for service.example.internal.aws, our on-premises DNS forwards the query to the inbound endpoint, which resolves it using the private hosted zone within our VPC.
A private hosted zone in Amazon Route 53 is a container that holds DNS records for a domain that’s only accessible within one or more connected VPCs. It allows resources inside those VPCs (like EC2 instances) to resolve custom internal domain names without exposing them to the public internet.
AWS to on-premises resolution: We configure a Route 53 Resolver outbound endpoint and create forwarding rules. For example, we can create a rule that says any query for *.corp.internal should be forwarded to the IP addresses of our on-premises DNS servers. This allows instances inside our VPC to communicate with on-premises servers without knowing their IP addresses.
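The matching behavior of those forwarding rules can be sketched in a few lines of Python: a query is checked against rule domains by suffix, and anything without a match falls through to the VPC’s own resolver. The rule domain and DNS server IPs here are hypothetical:

```python
# Illustrative forwarding rules: rule domain -> on-prem DNS server IPs.
RULES = {
    "corp.internal": ["10.10.0.2", "10.10.0.3"],
}

def forward_target(query: str):
    """Return on-prem DNS IPs if a rule matches the query's domain suffix,
    else None (the query is resolved inside the VPC)."""
    labels = query.rstrip(".").split(".")
    for i in range(len(labels)):
        suffix = ".".join(labels[i:])
        if suffix in RULES:
            return RULES[suffix]
    return None

print(forward_target("db.corp.internal"))              # ['10.10.0.2', '10.10.0.3']
print(forward_target("service.example.internal.aws"))  # None -> VPC resolver
```

This is a simplified model of rule selection, not the Resolver implementation, but it captures the core idea: the query’s domain decides whether it leaves the VPC.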
The result is a unified, seamless DNS system where any application, anywhere in our hybrid network, can find any resource by its domain name.
Wrapping up
Every great architectural design begins with a solid foundation. In AWS networking, that foundation is our IP address management (IPAM) and Virtual Private Cloud (VPC) strategy. Getting this right from day one prevents complex and risky migrations down the road.
A few hours of thoughtful VPC planning up front can save us hundreds of hours of painful re-architecture work later. This solid foundation is the first step, but it must also be resilient to failure.
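As a sketch of what that up-front planning can look like, Python’s `ipaddress` module can carve a VPC CIDR into non-overlapping per-AZ subnets before anything is provisioned. The CIDR, prefix length, and AZ names below are illustrative choices, not recommendations:

```python
import ipaddress

# Carve a /16 VPC into /20 subnets and assign one per Availability Zone.
vpc = ipaddress.ip_network("10.0.0.0/16")
subnets = vpc.subnets(new_prefix=20)  # yields 16 non-overlapping /20s
azs = ["use1-az1", "use1-az2", "use1-az3"]

plan = {az: str(net) for az, net in zip(azs, subnets)}
print(plan)
# {'use1-az1': '10.0.0.0/20', 'use1-az2': '10.0.16.0/20', 'use1-az3': '10.0.32.0/20'}
```

Writing the plan down this way also leaves headroom: thirteen unused /20s remain for future AZs, private tiers, or shared subnets, without ever renumbering what is already deployed.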
What’s next?
Experience firsthand how thoughtful network planning shapes a resilient, high-performing AWS environment. Try out these Cloud Labs:
Understanding Networking Services in AWS—From Zero to Hero: In this Cloud Lab, you’ll become proficient in network services by creating a VPC, security groups, and load balancers.
Connecting Multiple VPCs Using Transit Gateway: In this Cloud Lab, you’ll learn how to connect multiple VPCs using AWS Transit Gateway, deploy a React app across VPCs, create instances, and efficiently route traffic.
Or try out these Cloud Challenges to put your knowledge to the test:
Create and Configure ALB to Build a Resilient Application: In this lab, you’ll build a fault-tolerant web app with ALB, Auto Scaling, HTTPS, and least-privilege networking in a Multi-AZ VPC.
Configuring Comprehensive Network Monitoring with VPC Flow Logs: In this lab, you’ll configure VPC and subnet-level traffic monitoring using VPC Flow Logs. This challenge-based exercise is designed for hands-on practice; step-by-step instructions will not be provided.