How Cloudflare’s edge network handles over 60 million RPS

This newsletter explores how Cloudflare’s architecture handles over 60M RPS through Anycast routing for resilience, tiered caching for performance, and Cloudflare Workers for serverless compute at the edge.
12 mins read
Nov 19, 2025

Building a network capable of handling peak traffic of over 60 million legitimate requests per second (https://blog.cloudflare.com/analysis-of-the-epyc-145-performance-gain-in-cloudflare-gen-12-servers/), while also defending against record-breaking attacks as large as 71 million requests per second (https://blog.cloudflare.com/cloudflare-mitigates-record-breaking-71-million-request-per-second-ddos-attack/), requires distributed intelligence and real-time defense. As a company that operates one of the world’s largest networks for content delivery, security, and edge computing, Cloudflare relies on a distributed-first model: compute, security, and control are decentralized across global edge nodes rather than concentrated in a central core, and every edge node can cache, compute, and defend traffic autonomously.

This newsletter examines the architectural design, software stack, and operational principles that make this global infrastructure possible. It also identifies lessons that engineers and system designers can apply to their own large-scale systems. Here's what else we'll cover:

  • The logic behind the global edge server network

  • How Anycast routing provides speed and resilience

  • Strategies for absorbing massive DDoS attacks

  • Principles for building and scaling large-scale systems

Let's get started.

Cloudflare’s network

Cloudflare’s architecture follows the principle that each data center is capable of running every core service on its servers, enabling uniform functionality across the network. This model extends beyond content delivery to form a unified platform where security, performance, and compute operate at the edge, close to end users. With a network now handling over 60 million requests per second, this design has demonstrated its ability to scale under sustained global demand.

Global data center hubs, grouped by region and linked by a private backbone

Key insight: Every edge server runs the full software stack, including caching, security, and compute, which ensures identical functionality across all regions.

For system designers, this model illustrates how distributed architectures minimize latency, improve resilience, and filter malicious traffic before it reaches origin servers. Moving compute and security away from centralized cores ensures that a server in Tokyo handles a request from a user in Tokyo, rather than one in Virginia.

This distributed-first approach delivers consistent, low-latency performance worldwide and creates a broad surface area to absorb and mitigate large-scale attacks. The next section examines the network’s physical topology.

How the global edge server network operates

Cloudflare operates a globally interconnected network spanning over 330 cities across more than 125 countries (https://www.cloudflare.com/network/). This extensive physical presence forms the foundation of its performance and resilience. The core operational principle is proximity, which directly influences latency and reliability. Placing data centers close to users and peering with thousands of network providers reduces the distance and number of network hops each request must traverse.

Peering refers to the process by which independent internet networks exchange traffic directly without relying on third-party carriers. This approach reduces latency, improves reliability, and lowers cost.

Cloudflare’s extensive network topology enables geographic load balancing. The Anycast network routes requests to the nearest available data center. If one becomes unavailable, Border Gateway Protocol (BGP, https://www.cloudflare.com/learning/security/glossary/what-is-bgp/) automatically redirects traffic to the next optimal site, ensuring seamless failover. For example, a request originating in Southeast Asia might be rerouted from Singapore to Hong Kong or Tokyo. This failover is continuous and automated, a built-in feature of the system’s architecture rather than a temporary recovery mechanism.

This design ensures that even with localized outages, the service remains available and performant worldwide. The diagram below illustrates how Cloudflare maintains continuity through automated routing and failover.

Automated routing and failover

A foundational networking protocol enables this routing intelligence and forms the basis of Cloudflare’s global operation.

Fundamentals of Anycast routing and traffic flow

Anycast (https://www.cloudflare.com/learning/cdn/glossary/anycast-network/) is a network routing technique in which the same IP address is advertised from multiple locations. Cloudflare uses this model to route each user to the data center that is topologically closest to them. While many services rely on DNS-based load balancing, Cloudflare’s HTTP and HTTPS traffic primarily use Anycast routing via BGP for faster, network-level redirection.

In an Anycast network, identical IP addresses are announced from every data center. Internet routers automatically direct a user’s packets to the nearest location. This eliminates the need for client-side logic to locate the optimal server. When a data center becomes unavailable, its IP announcements are withdrawn from the global routing table, and traffic naturally shifts to the next nearest location. This provides an always-on failover capability that is both simple and robust.

Resilience insight: When an Anycast node withdraws its IP announcement, traffic automatically reroutes via BGP to the next available data center. This provides built-in, network-level failover without DNS updates or client-side logic.

Anycast improves both performance and security. Users are served from nearby locations, minimizing latency, while DDoS traffic is distributed across the network, reducing its impact at any single point.
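
At a very high level, this failover behavior can be modeled as routers always picking the lowest-cost site that is still announcing the shared address. Below is a minimal TypeScript sketch of that idea; the site names and path costs are invented for illustration and do not reflect Cloudflare's actual topology.

```typescript
// Each site announces the same address; routing prefers the lowest-cost
// site that is still announcing. Withdrawing an announcement shifts traffic.
interface Site {
  name: string;
  cost: number;        // models BGP path cost as seen from the client's network
  announcing: boolean; // false once the site withdraws its announcement
}

function routeRequest(sites: Site[]): string {
  const candidates = sites.filter((s) => s.announcing);
  if (candidates.length === 0) throw new Error("no site announcing the prefix");
  // Routers prefer the shortest path; we model that as the lowest cost.
  return candidates.reduce((a, b) => (b.cost < a.cost ? b : a)).name;
}

const sites: Site[] = [
  { name: "SIN", cost: 1, announcing: true },
  { name: "HKG", cost: 2, announcing: true },
  { name: "NRT", cost: 3, announcing: true },
];

routeRequest(sites);          // normally lands in Singapore (lowest cost)
sites[0].announcing = false;  // SIN withdraws its announcement
routeRequest(sites);          // traffic shifts to Hong Kong automatically
```

Real BGP convergence also weighs path attributes and peering policy, but the essential property survives in the sketch: failover requires no DNS change and no client-side logic.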

The diagram below compares Anycast with traditional routing models.

A comparative overview of Unicast, DNS-based, and Anycast network routing methods

The security advantages of this routing model become most evident when examining how Cloudflare mitigates large-scale attacks.

Absorbing and mitigating DDoS attacks

Cloudflare’s ability to absorb large-scale distributed denial-of-service (DDoS) attacks stems from its Anycast-based architecture, rather than relying solely on bandwidth. When an attack begins, malicious traffic is ingested at the edge data center closest to its source. Instead of overwhelming a single target, the attack load is spread across Cloudflare’s global footprint.

A notable example is the 71 million request-per-second DDoS attack that Cloudflare successfully mitigated. The attack originated from a botnet of more than 30,000 IP addresses. Due to the Anycast routing model, traffic was distributed across hundreds of data centers. Each site handled only a fraction of the total volume, allowing automated defense systems to identify and block malicious patterns without disrupting legitimate traffic. These defenses are implemented on every server and can apply rate limits, filtering rules, and traffic scrubbing in real time.

Distributed defense: Anycast routing enables the absorption of a large-scale attack. By spreading malicious traffic across hundreds of edge locations, Cloudflare ensures that no single site becomes overloaded, allowing local defenses to respond in real time.

The defense process operates across multiple layers.

  • Detection: Automated systems monitor traffic patterns for anomalies that indicate an attack.

  • Distribution: The Anycast network spreads the attack traffic across the global footprint, preventing overload at any single point.

  • Mitigation: Each edge acts as an independent mitigation point, filtering and scrubbing traffic based on signatures, behavior, and heuristics to ensure only legitimate requests reach the origin server.
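
One of the simplest building blocks of the mitigation layer above is a per-source token-bucket rate limiter, which each edge can apply locally to its own slice of the attack. The TypeScript sketch below uses invented limits and is not Cloudflare's actual mitigation code.

```typescript
// Per-source token bucket: each source address gets `capacity` tokens that
// refill at `refillPerSec`; requests without an available token are dropped.
class TokenBucket {
  private tokens: number;
  private last: number;
  constructor(private capacity: number, private refillPerSec: number, now = 0) {
    this.tokens = capacity;
    this.last = now;
  }
  allow(now: number): boolean {
    // Refill proportionally to elapsed time, capped at capacity.
    this.tokens = Math.min(
      this.capacity,
      this.tokens + (now - this.last) * this.refillPerSec,
    );
    this.last = now;
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true;
    }
    return false;
  }
}

// One bucket per source address; a flooding source exhausts its own bucket
// without affecting well-behaved clients.
const buckets = new Map<string, TokenBucket>();
function admit(ip: string, now: number): boolean {
  let b = buckets.get(ip);
  if (!b) {
    b = new TokenBucket(10, 5, now); // illustrative: burst of 10, 5 req/s sustained
    buckets.set(ip, b);
  }
  return b.allow(now);
}
```

Because Anycast has already split the botnet's traffic across hundreds of sites, each edge only needs to enforce limits like this against the sources it actually sees.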

Cloudflare’s experience across diverse attack types, including protocol exploits, botnet-driven floods, and amplification attacks, demonstrates the strength of distributed defense. Each incident reinforces a core design principle: global scale and uniform edge capabilities enable the network to automatically absorb and neutralize attacks, often without users even noticing.

Security is a core function of the edge, while performance depends on intelligent caching.

Caching and content delivery at the edge

Cloudflare’s distributed network functions as a large cache that stores both static and dynamic content at its edge locations, reducing latency and protecting origin servers from excessive load. When a user requests a resource, it can often be served directly from a nearby data center, eliminating the need for additional requests to the origin server over long network paths.

The architecture employs a multi-layered caching strategy. Tiered Cache (https://developers.cloudflare.com/cache/how-to/tiered-cache/) technology organizes Cloudflare’s data centers into a hierarchy. If a local edge cache does not contain the requested content, it first requests the content from a larger, regional Tier 2 data center rather than going directly to the origin. This design increases the cache hit ratio because regional data centers aggregate requests from many smaller ones. As a result, fewer requests reach the origin, conserving bandwidth and compute resources.
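
The tiered lookup can be sketched in a few lines of TypeScript, assuming a simple in-memory cache per tier and an invented origin; real tiers persist to disk and exchange data over Cloudflare's backbone.

```typescript
// A tier serves from its own store on a hit; on a miss it asks the tier
// above it and populates itself on the way back down.
type Fetch = (key: string) => string;

class CacheTier {
  private store = new Map<string, string>();
  constructor(private upstream: Fetch) {}
  get(key: string): string {
    const hit = this.store.get(key);
    if (hit !== undefined) return hit; // served from this tier
    const value = this.upstream(key);  // miss: fall through to the tier above
    this.store.set(key, value);        // populate for future requests
    return value;
  }
}

let originHits = 0;
const origin: Fetch = (key) => {
  originHits++; // each call models an expensive trip to the origin server
  return `body-of-${key}`;
};

// Many edge caches share one regional tier, so their misses aggregate there.
const regional = new CacheTier(origin);
const edgeTokyo = new CacheTier((k) => regional.get(k));
const edgeOsaka = new CacheTier((k) => regional.get(k));

edgeTokyo.get("/logo.png"); // first miss anywhere: one origin fetch
edgeOsaka.get("/logo.png"); // this edge's miss is absorbed by the regional tier
```

The key effect is visible in `originHits`: two cold edges requesting the same asset cost the origin only one fetch, which is exactly how regional aggregation raises the global hit ratio.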

Design takeaway: A tiered caching hierarchy increases cache hit ratios and shields origin servers. Regional aggregation reduces redundant traffic and maintains low latency across global deployments.

Cache key design, which determines how content is stored and retrieved, is equally critical. Granular control over caching rules allows developers to keep dynamic content fresh while caching static assets for longer periods. These cache tiers exchange data over Cloudflare’s private backbone network, ensuring consistent performance and availability worldwide.
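
How the key is built decides what can be shared between users: normalizing the URL widens reuse, while folding in varying headers keeps per-user content separate. The TypeScript sketch below uses an invented key format purely for illustration.

```typescript
// Build a cache key from a normalized URL plus the values of any headers the
// response varies on. The concrete format here is an assumption, not
// Cloudflare's actual key scheme.
function cacheKey(
  rawUrl: string,
  varyHeaders: string[],
  headers: Map<string, string>,
): string {
  const u = new URL(rawUrl);
  u.hash = "";            // fragments never reach the server: ignore them
  u.searchParams.sort();  // normalize query ordering so ?a=1&b=2 === ?b=2&a=1
  const parts = varyHeaders.map((h) => `${h}=${headers.get(h) ?? ""}`);
  return `${u.toString()}|${parts.join("|")}`;
}

// Same logical resource, different query order: identical key, so one cache
// entry serves both. Adding Accept-Language to the vary list would split it.
cacheKey("https://example.com/a?b=2&a=1", [], new Map());
cacheKey("https://example.com/a?a=1&b=2", [], new Map());
```

Every attribute included in the key fragments the cache further, so the design tension is between correctness (never serving one user's private response to another) and hit ratio.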

The flowchart below illustrates how a high cache hit ratio directly improves performance.

Cache hits reduce backhaul traffic and origin load

The edge now extends beyond caching to execute custom code and application logic directly on the network.

Executing customer code with Cloudflare Workers

Cloudflare extends its edge capabilities beyond caching and security with Cloudflare Workers (https://www.cloudflare.com/developer-platform/products/workers/), a serverless compute platform that allows developers to deploy and run code directly on Cloudflare’s global network. Instead of managing servers, developers can deploy lightweight functions that execute in response to HTTP requests at the edge location closest to the user.

The platform runs on V8 isolates, a lightweight sandboxing technology from the V8 JavaScript engine that powers Google Chrome. Isolates provide secure, separate execution environments within the same process, eliminating the need for per-request containers or virtual machines. Each isolate runs in its own memory context, ensuring strong isolation between workloads while sharing the same runtime process for efficiency. The minimal startup time, typically under five milliseconds, enables code execution on every request with negligible added latency.

Developers use Workers to perform tasks such as A/B testing, header modification, user authentication, request routing, and even building complete applications that operate entirely at the edge. Running code close to users minimizes latency and enables highly responsive, personalized experiences.
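
As an illustration of this kind of edge logic, here is a hedged TypeScript sketch of A/B routing in the style of a Worker's request handler. The cookie name `ab_variant` and the path-rewriting scheme are invented; a real Worker would operate on the runtime's Request/Response objects and export its handler as the module's default export.

```typescript
// Pick a variant: sticky if the user already has an assignment cookie,
// otherwise a 50/50 random split. The cookie name is an assumption.
function chooseVariant(cookieHeader: string | null): "a" | "b" {
  const match = cookieHeader?.match(/ab_variant=(a|b)/);
  if (match) return match[1] as "a" | "b"; // honor the prior assignment
  return Math.random() < 0.5 ? "a" : "b";  // first visit: assign randomly
}

// Decide, entirely at the edge, which origin path serves this request and
// which cookie keeps the user pinned to the same variant.
function routeAtEdge(req: { path: string; cookieHeader: string | null }): {
  originPath: string;
  setCookie: string;
} {
  const variant = chooseVariant(req.cookieHeader);
  return {
    originPath: `/${variant}${req.path}`, // e.g. /index.html -> /a/index.html
    setCookie: `ab_variant=${variant}; Path=/`,
  };
}
```

Because this decision runs in an isolate at the nearest edge location, the user sees a consistent variant with no extra round trip to the origin or to a central experiment service.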

Worker execution flow in a V8 isolate

The specific software choices that enable this low-latency execution are a key part of the platform’s design.

The underlying software stack and its significance

The performance and security of Cloudflare’s edge depend heavily on its underlying software stack. Much of Cloudflare’s modern edge software, including key performance-critical components, is written in Rust; however, the overall stack also incorporates other languages, such as Go and Lua. Rust is a systems programming language that guarantees memory safety without relying on a garbage collector. This approach eliminates many classes of memory-related bugs and vulnerabilities, such as buffer overflows, which are critical to prevent in network-facing systems. Its zero-cost abstractions provide performance comparable to C++ while maintaining stronger memory guarantees.

Cloudflare Workers run on Google’s V8 engine, the same runtime used by Chrome to execute JavaScript and WebAssembly efficiently. The use of V8 Isolates (https://v8docs.nodesource.com/node-0.8/d5/dda/classv8_1_1_isolate.html) enables Cloudflare to securely run untrusted code from thousands of different customers on the same machine, providing strong isolation and minimal performance overhead. This model enables multi-tenant execution within the same process while preserving isolation between customers.

The comparison below contrasts these modern technologies with more traditional approaches in system design.

The following table contrasts Rust with C++:

| Feature | Rust | C++ |
| --- | --- | --- |
| Memory safety | Ownership model with compile-time checks and no garbage collector | Manual memory management prone to safety issues |
| Performance | Comparable to C++ through zero-cost abstractions | High performance through direct hardware manipulation and manual memory control |
| Concurrency | Safe concurrency enforced by the ownership and type system | Manual concurrency management with potential thread-safety risks |

This next table contrasts the V8 Isolate model with traditional containers/VMs:

| Aspect | V8 Isolates | Containers/VMs |
| --- | --- | --- |
| Startup time | Under 5 milliseconds | Over 500 milliseconds |
| Memory footprint | Around 10 MB | Over 100 MB |
| Context-switching overhead | Low | High |

These individual technology choices are guided by a broader set of architectural philosophies for building planet-scale systems.

Key design principles for large-scale systems

Cloudflare’s architecture demonstrates several key design principles that apply to any large-scale distributed system. These principles extend beyond technical choices and represent a philosophy for building resilient, scalable infrastructure. Applying them allows a system to grow without becoming brittle or unmanageable.

Principles of planet-scale design

The following sections explain each principle in detail and how it contributes to Cloudflare’s global resilience.

  • Architectural decentralization: In Cloudflare’s model, there is no central control point. Each data center can independently handle routing, security, and load balancing, coordinated through a global control plane that distributes configuration and telemetry. This globally distributed architecture eliminates single points of failure and enables horizontal scalability. Compute, control, and decision-making occur directly at the edge.

  • Comprehensive observability: Operating a distributed system of this size requires deep visibility into its behavior. Cloudflare invests heavily in observability with extensive tracing, metrics, and logging from every server and request. This telemetry is essential for detecting anomalies, debugging issues, and monitoring real-time performance.

  • Fault isolation and graceful degradation: The system is built with the assumption that failures will occur. Isolating faults within individual data centers or servers prevents local issues from cascading globally. Services are designed to degrade gracefully, maintaining essential functionality even when dependencies fail.

  • Pervasive automation: Manual intervention is not feasible at this scale, so provisioning, deployment, monitoring, and attack mitigation are fully automated. This ensures consistency, reduces human error, and allows the system to respond rapidly to change.
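
The graceful-degradation principle in particular reduces to a small, reusable pattern: wrap each dependency call and substitute a degraded but usable answer when it fails. A minimal TypeScript sketch, with illustrative names rather than Cloudflare's implementation:

```typescript
// Run the primary dependency; on any failure, return the fallback instead of
// propagating the error up the request path.
async function withFallback<T>(
  primary: () => Promise<T>,
  fallback: T,
): Promise<T> {
  try {
    return await primary(); // happy path: the dependency answered
  } catch {
    return fallback;        // degraded path: serve a stale or default value
  }
}

// Example: serve a stale cached page when the origin lookup throws, so the
// user sees slightly old content instead of an error.
withFallback(
  () => Promise.reject(new Error("origin unreachable")),
  "<p>stale cached copy</p>",
).then((html) => html);
```

Layering this pattern at every dependency boundary is what keeps a local failure, a single origin, service, or data center, from cascading into a global outage.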

Key insight: Resilience at a global scale does not emerge from any single mechanism. It results from the continuous interaction of decentralization, observability, isolation, and automation working together as one architectural system.

These principles are not unique to Cloudflare, but their disciplined application is what allows the platform to function at this scale. This conclusion summarizes the key takeaways from this architecture.

Wrapping up

Cloudflare’s architecture provides a solid model for building global Internet services. Distributing compute to the edge, using Anycast for resilience, and building on a foundation of safe, high-performance software have created a platform that is highly scalable and robust. The core lessons in decentralization, observability, and automation are timeless principles for any system designer.

As the Internet evolves, new challenges will emerge, from AI-driven attack vectors to the growing demand for even lower-latency edge computing. Cloudflare’s architectural choices position it well to adapt, providing a programmable and resilient network for future demands. For architects, the key takeaway is clear: building for global scale requires thinking beyond centralized models and adopting a distributed-first approach.

If you want to go deeper and master the skills needed to build planet-scale, failure-tolerant systems, explore our expert-led courses. Whether you’re designing distributed-first architectures, implementing advanced caching strategies, or engineering for global resilience, these paths offer practical frameworks to help you build highly available and performant services.


Written By:
Fahim ul Haq