
Network Observability and Troubleshooting

Discover how to effectively observe and troubleshoot network connectivity in complex AWS environments. This lesson covers using VPC Flow Logs, inspecting Transit Gateway route tables, diagnosing DNS failures, and applying a structured approach to resolving hybrid connectivity problems without adding redundant connections.

In large AWS environments, network issues can become difficult to diagnose quickly. A single organization may operate dozens of accounts, hundreds of VPCs, and multiple hybrid connections to on-premises data centers. When a newly attached VPC cannot reach internal servers, or DNS queries fail across accounts or connected networks, the solution is rarely to add another VPN tunnel, peering link, or Direct Connect connection.

A better approach is to observe and validate what is already in place. Architects need to inspect route tables, review traffic metadata, verify security boundaries, and trace DNS resolution paths to identify where communication is breaking down.
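Inspecting a route table usually comes down to one question: which route wins for a given destination? VPC and Transit Gateway route tables both select the most specific (longest-prefix) match. The minimal Python sketch below reproduces that selection locally over a hypothetical, hand-written route table; the CIDRs and attachment names are illustrative assumptions and are not read from any AWS API.

```python
import ipaddress
from typing import Optional

# Hypothetical route table: destination CIDR -> target (e.g., a TGW attachment).
# Illustrative entries only; not fetched from AWS.
ROUTES = {
    "10.0.0.0/8": "tgw-attach-onprem-dx",
    "10.20.0.0/16": "tgw-attach-shared-services",
    "10.20.5.0/24": "tgw-attach-new-vpc",
    "0.0.0.0/0": "tgw-attach-egress",
}


def lookup(routes: dict[str, str], dest_ip: str) -> Optional[tuple[str, str]]:
    """Return (matching CIDR, target) for the longest-prefix match, or None."""
    addr = ipaddress.ip_address(dest_ip)
    best = None
    for cidr, target in routes.items():
        net = ipaddress.ip_network(cidr)
        # Keep the matching route with the longest prefix seen so far.
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, target)
    if best is None:
        return None  # no route: traffic to this destination is dropped
    return str(best[0]), best[1]
```

For example, `lookup(ROUTES, "10.99.1.1")` falls through to the broad `10.0.0.0/8` route, which is exactly the kind of unexpected match that sends a newly attached VPC's traffic toward the wrong attachment.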

This lesson introduces a practical troubleshooting framework for AWS network observability. You’ll learn how to use VPC Flow Logs to capture traffic metadata, choose the right log destination for analysis, troubleshoot hybrid connectivity through Transit Gateway and Direct Connect, and diagnose DNS failures in multi-account and hybrid architectures.

The following diagram illustrates how these observability components fit together in a centralized, multi-account architecture.

This architecture establishes the foundation for every troubleshooting workflow discussed in this lesson. Understanding each component’s role begins with the primary data source: VPC Flow Logs.

Capturing traffic with VPC Flow Logs

VPC Flow Logs capture IP traffic metadata at the VPC, subnet, or elastic network interface (ENI) level. Each flow log record contains details such as source and destination IP addresses, ports, protocol number, packet and byte counts, start and end timestamps, the action taken (ACCEPT or REJECT), and the log status. A key distinction is that Flow Logs capture metadata only, not packet payloads. For packet-level inspection, Traffic Mirroring is the appropriate tool.
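To make the record structure concrete, here is a minimal Python sketch that parses one line in the default (version 2) format, whose 14 space-separated fields are version, account-id, interface-id, srcaddr, dstaddr, srcport, dstport, protocol, packets, bytes, start, end, action, and log-status. The `FlowRecord` class and `parse_default_record` function are hypothetical helpers for illustration, and the sample records are invented.

```python
from dataclasses import dataclass
from typing import Optional

# Field order of the default (version 2) flow log format.
DEFAULT_FIELDS = [
    "version", "account-id", "interface-id", "srcaddr", "dstaddr",
    "srcport", "dstport", "protocol", "packets", "bytes",
    "start", "end", "action", "log-status",
]

# A few IANA protocol numbers commonly seen in flow logs.
PROTOCOLS = {1: "ICMP", 6: "TCP", 17: "UDP"}


@dataclass
class FlowRecord:  # hypothetical container for the fields used below
    interface_id: str
    srcaddr: Optional[str]
    dstaddr: Optional[str]
    dstport: Optional[int]
    protocol: Optional[str]
    packets: Optional[int]
    action: Optional[str]
    log_status: str


def parse_default_record(line: str) -> FlowRecord:
    values = line.split()
    if len(values) != len(DEFAULT_FIELDS):
        raise ValueError(f"expected {len(DEFAULT_FIELDS)} fields, got {len(values)}")
    raw = dict(zip(DEFAULT_FIELDS, values))

    def opt(name: str) -> Optional[str]:
        # "-" marks a field with no data, e.g. in NODATA or SKIPDATA records.
        return None if raw[name] == "-" else raw[name]

    def opt_int(name: str) -> Optional[int]:
        v = opt(name)
        return None if v is None else int(v)

    proto = opt_int("protocol")
    return FlowRecord(
        interface_id=raw["interface-id"],
        srcaddr=opt("srcaddr"),
        dstaddr=opt("dstaddr"),
        dstport=opt_int("dstport"),
        protocol=None if proto is None else PROTOCOLS.get(proto, str(proto)),
        packets=opt_int("packets"),
        action=opt("action"),
        log_status=raw["log-status"],
    )
```

A rejected HTTPS flow such as `2 123456789012 eni-0a1b2c3d4e5f67890 10.20.5.17 10.0.8.25 49152 443 6 10 840 1700000000 1700000060 REJECT OK` parses to a record with `protocol="TCP"`, `dstport=443`, and `action="REJECT"`. Note that nothing in the record says *why* it was rejected; that still requires checking security groups and network ACLs.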

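To control log volume and cost, VPC Flow Logs also support custom log formats, allowing architects to select only the metadata fields needed for a specific analysis use case. Custom formats can also add fields that the default format omits, such as vpc-id, subnet-id, tcp-flags, or flow-direction.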
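A custom format is supplied as a string of `${field-name}` tokens in the `LogFormat` parameter of the EC2 CreateFlowLogs API. The sketch below builds that string and the surrounding request parameters in Python without calling AWS. The chosen field list, the VPC ID, and the bucket ARN are assumptions for illustration; `build_log_format` and `flow_log_params` are hypothetical helpers.

```python
# Fields chosen for a hypothetical "why is this hybrid flow rejected?" investigation.
# Each name is a documented flow log field; pick the ones your analysis needs.
CUSTOM_FIELDS = [
    "version", "vpc-id", "subnet-id", "interface-id",
    "srcaddr", "dstaddr", "pkt-srcaddr", "pkt-dstaddr",
    "dstport", "protocol", "tcp-flags", "flow-direction",
    "action", "log-status",
]


def build_log_format(fields: list[str]) -> str:
    """Render field names in the ${field} syntax used by the LogFormat parameter."""
    if len(set(fields)) != len(fields):
        raise ValueError("duplicate field in custom format")
    return " ".join(f"${{{name}}}" for name in fields)


def flow_log_params(vpc_id: str, bucket_arn: str, fields: list[str]) -> dict:
    """Keyword arguments for CreateFlowLogs with an S3 destination."""
    return {
        "ResourceIds": [vpc_id],
        "ResourceType": "VPC",
        "TrafficType": "ALL",  # or "ACCEPT" / "REJECT" to reduce volume
        "LogDestinationType": "s3",
        "LogDestination": bucket_arn,
        "LogFormat": build_log_format(fields),
        "MaxAggregationInterval": 60,  # seconds; 60 or 600
    }
```

With boto3 installed and credentials configured, these parameters could be passed as `boto3.client("ec2").create_flow_logs(**flow_log_params(...))`. A CloudWatch Logs destination would additionally need an IAM role for log delivery (`DeliverLogsPermissionArn`).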

Destinations and custom formats

Flow Logs support three destinations, each suited to a different operational pattern. CloudWatch Logs supports near-real-time metric filters and alarms, making it well suited to detecting spikes in rejected traffic within minutes (records are aggregated over a 1- or 10-minute interval before delivery, so detection is not instantaneous). Amazon S3 provides cost-effective batch delivery for long-term retention, and Amazon Athena can run ad hoc SQL queries against the partitioned log data. Kinesis Data Firehose streams records to analytics destinations such as Amazon OpenSearch Service or third-party SIEM tools such as Splunk for continuous security analytics.
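The CloudWatch Logs pattern above is typically a metric filter that matches REJECT records plus an alarm on the resulting metric. To show the mechanics, the following Python sketch reproduces that logic locally over default-format records: it counts REJECT records per 60-second bucket of their start timestamp and flags buckets at or above a threshold. This is not the CloudWatch API; the bucket size, threshold, and helper names are assumptions.

```python
from collections import Counter


def reject_counts_per_minute(lines: list[str]) -> Counter:
    """Count REJECT records per 60-second bucket of their 'start' timestamp."""
    counts: Counter = Counter()
    for line in lines:
        f = line.split()
        # Default format has 14 fields; skip malformed, NODATA, and SKIPDATA records.
        if len(f) != 14 or f[13] != "OK":
            continue
        start, action = f[10], f[12]
        if action == "REJECT":
            counts[int(start) // 60 * 60] += 1
    return counts


def spikes(counts: Counter, threshold: int) -> list[int]:
    """Return bucket start times whose REJECT count meets or exceeds the threshold."""
    return sorted(bucket for bucket, n in counts.items() if n >= threshold)
```

In CloudWatch Logs itself, the matching step would be a space-delimited metric filter pattern that names the default fields and constrains the action field to REJECT, with an alarm on the resulting metric playing the role of `spikes`.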

The following table compares these destinations across several dimensions:

VPC Flow Log Destinations: Choosing the Right Analysis Path

| Destination | Latency to Insight | Cost Profile | Best Use Case | Query/Analysis Tool | Cross-Account Support |
|---|---|---|---|---|---|
| CloudWatch Logs | Near-real-time (minutes; bounded by the 1- or 10-minute aggregation interval) | Higher at scale | Near-real-time alerting and metric filters | CloudWatch Logs Insights / metric filters | Yes, via destination policies |
| Amazon S3 | Minutes (batch delivery) | Lowest for large volumes | Long-term retention and compliance | Amazon Athena / QuickSight | Yes, via bucket policies in a centralized logging account |
| Kinesis Data Firehose | Near-real-time (streaming) | Medium (throughput-based) | Streaming security analytics and SIEM integration | Custom consumers / Splunk / OpenSearch | Yes, via cross-account delivery to a Firehose stream (IAM roles) |
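For the S3 row, the usual workflow is an Athena query over the delivered log files. As a rough local stand-in, the sketch below loads default-format records into an in-memory SQLite table with Python's `sqlite3` module and runs the same kind of GROUP BY query to find the most frequently rejected source/destination/port combinations. SQLite substitutes for Athena here, and the table and column names are assumptions modeled on a typical flow log table, not a schema Athena creates for you.

```python
import sqlite3

# Ad hoc query of the kind you might run in Athena against S3-delivered flow logs.
# "action" is double-quoted so it is treated as an identifier in either engine.
TOP_REJECTED_SQL = """
SELECT srcaddr, dstaddr, dstport, COUNT(*) AS rejects
FROM flow_logs
WHERE "action" = 'REJECT' AND log_status = 'OK'
GROUP BY srcaddr, dstaddr, dstport
ORDER BY rejects DESC, srcaddr, dstaddr, dstport
LIMIT ?
"""


def top_rejected(lines: list[str], limit: int = 10) -> list[tuple]:
    """Load default-format records into a temporary table and run the query."""
    rows = []
    for line in lines:
        f = line.split()
        if len(f) != 14:
            continue  # skip malformed records
        # srcaddr, dstaddr, dstport, action, log-status (default-format positions)
        rows.append((f[3], f[4], f[6], f[12], f[13]))
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            'CREATE TABLE flow_logs (srcaddr TEXT, dstaddr TEXT, '
            'dstport INTEGER, "action" TEXT, log_status TEXT)'
        )
        conn.executemany("INSERT INTO flow_logs VALUES (?, ?, ?, ?, ?)", rows)
        return conn.execute(TOP_REJECTED_SQL, (limit,)).fetchall()
    finally:
        conn.close()
```

Grouping by destination port as well as address pair helps separate, for example, a blocked application port from blocked DNS traffic to the same host.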

Candidates must also understand what Flow Logs do not capture. Flow Logs do not record DNS query details resolved by the VPC-provided ...