ThousandEyes actively monitors the reachability and performance of thousands of services and networks across the global Internet, which we use to analyze outages and other incidents. The following analysis is based on our extensive monitoring, as well as ThousandEyes’ global outage detection service, Internet Insights.
Outage Analysis
Updated on July 17, 2025
On July 14, 2025, what began as widespread connectivity issues to web-based services quickly revealed a complex routing incident that exposed hidden vulnerabilities in Internet infrastructure. Starting around 21:50 UTC, Cloudflare's public DNS resolver became unreachable for approximately one hour, leaving users who relied on this service unable to access websites and applications.
The incident initially looked like a BGP hijack of one of the affected prefixes, with another network seemingly taking control of Cloudflare's IP addresses. Investigation revealed a more nuanced story: Cloudflare's route announcements had disappeared from the global routing table due to an internal configuration error, and the apparent hijack was actually a side effect of that withdrawal, which exposed dormant routing configurations once the legitimate routes vanished. The combination of missing routes, historical IP address usage, and potentially incomplete security enforcement highlights how legacy network configurations can create latent vulnerabilities, how business relationships can override cryptographic security protections, and why single points of failure remain problematic even in seemingly robust systems.
Read on for a deep dive into what happened and important lessons this outage leaves for enterprise network operations and infrastructure planning teams.
Explore the outage in the ThousandEyes platform (no login required).
The Initial Problem: Connectivity Failures Across the Internet
On July 14, 2025, at approximately 21:50 UTC, connectivity issues began affecting access to web-based services across the Internet, with some users reporting that websites failed to load or returned errors. Since a successful web connection requires several sequential steps (DNS resolution, TCP handshake, and data transfer), the timeout behavior indicated a failure early in this process, so we investigated DNS resolution first.
To isolate the problem, we validated that other public DNS resolvers appeared functional while connectivity issues were specifically affecting Cloudflare's public DNS resolvers at 1.1.1.1 and 1.0.0.1. This immediately narrowed the scope to Cloudflare's infrastructure.
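To illustrate the kind of first-pass isolation described above, here is a minimal sketch (not the ThousandEyes tooling) that queries the same hostname through several public resolvers with a short timeout. It assumes the third-party dnspython package is installed; the resolver list and test hostname are arbitrary examples.

```python
# Minimal resolver-isolation sketch (assumes the third-party "dnspython" package).
# It queries one hostname through several public resolvers so that a failure
# limited to 1.1.1.1/1.0.0.1 stands out against working alternatives.
import dns.resolver

RESOLVERS = {
    "Cloudflare primary": "1.1.1.1",
    "Cloudflare secondary": "1.0.0.1",
    "Google": "8.8.8.8",
    "Quad9": "9.9.9.9",
}
HOSTNAME = "example.com"  # arbitrary test name

def check(resolver_ip):
    r = dns.resolver.Resolver(configure=False)  # ignore the OS resolver config
    r.nameservers = [resolver_ip]
    try:
        answer = r.resolve(HOSTNAME, "A", lifetime=3)  # 3-second overall timeout
        return "OK: " + ", ".join(rr.address for rr in answer)
    except Exception as exc:  # timeouts, SERVFAIL, etc.
        return f"FAILED: {exc!r}"

if __name__ == "__main__":
    for name, ip in RESOLVERS.items():
        print(f"{name} ({ip}): {check(ip)}")
```

A check like this, run during the incident, would have shown the Cloudflare addresses timing out while the other providers continued answering.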
Understanding Cloudflare's architecture was important because its DNS service is built on anycast: hundreds of geographically distributed servers all share the same IP addresses, and Internet routing automatically directs each query to the nearest available server. This design inherently provides redundancy; individual servers or entire data centers can fail without users noticing, because traffic automatically fails over to alternative locations.
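One way to see anycast in action is to ask the resolver which site is answering. Many anycast DNS operators respond to the CHAOS-class TXT query for id.server with a site identifier; the sketch below assumes the target resolver does so and again relies on dnspython. Running it from different vantage points would typically return different site identifiers for the same address.

```python
# Anycast site check sketch (assumes "dnspython" and that the target resolver
# answers the CHAOS-class "id.server" TXT query, as many anycast operators do).
import dns.rdataclass
import dns.resolver

def anycast_site(resolver_ip):
    r = dns.resolver.Resolver(configure=False)
    r.nameservers = [resolver_ip]
    try:
        answer = r.resolve("id.server.", "TXT", rdclass=dns.rdataclass.CH, lifetime=3)
        return " / ".join(rr.to_text().strip('"') for rr in answer)
    except Exception as exc:
        return f"no id.server answer ({exc!r})"

if __name__ == "__main__":
    print("1.1.1.1 is answering from:", anycast_site("1.1.1.1"))
```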
Since anycast should have routed around any localized problems, the persistent failures suggested something more fundamental was wrong. ThousandEyes data allowed us to verify this hypothesis by showing paths to Cloudflare's prefixes with connectivity up to what appeared to be the last hop before Cloudflare's infrastructure, then consistent loss at that point. Measurements from geographically diverse vantage points showed this same pattern regardless of location, which ruled out localized server failures or regional connectivity issues that anycast should have handled. This pattern indicated the problem was occurring at a more fundamental routing level that would affect all attempts to reach these IP addresses globally.
A Pattern Emerges: BGP Routing Issues
This routing-level problem manifested as traffic destined for Cloudflare's prefixes being lost across multiple network paths and multiple ISPs. The widespread nature of these failures across diverse network operators confirmed that the issue wasn't isolated to a single provider but was occurring at the ISP edge, where routing decisions are made.
When multiple independent networks experience the same connectivity problems simultaneously, it typically indicates an issue with how Internet routing information is being distributed—specifically with the Border Gateway Protocol (BGP) that governs how networks share routing information.
To understand what was happening at the routing layer, we analyzed BGP route announcements and withdrawals for Cloudflare's address space. This examination of the BGP routing data revealed that the incident simultaneously impacted two critical IP prefixes, 1.1.1.0/24 and 1.0.0.0/24, but manifested differently for each. The 1.0.0.0/24 range experienced a route withdrawal, while the 1.1.1.0/24 prefix experienced what appeared to be a BGP hijack following the route withdrawal.
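For readers who want to spot-check this kind of routing change from the outside, public looking-glass data can show which origin AS is currently visible for a prefix. The sketch below queries the public RIPEstat prefix-overview endpoint; the URL and response field names are assumptions based on the public documentation at the time of writing, and this is not how the ThousandEyes data in this analysis was collected.

```python
# Sketch: ask the public RIPEstat "prefix-overview" endpoint which origin AS
# is currently announcing a prefix. Endpoint and field names are assumptions
# based on public documentation, not ThousandEyes data.
import json
import urllib.request

RIPESTAT = "https://stat.ripe.net/data/prefix-overview/data.json?resource={prefix}"

def origin_asns(prefix):
    with urllib.request.urlopen(RIPESTAT.format(prefix=prefix), timeout=10) as resp:
        payload = json.load(resp)
    data = payload.get("data", {})
    if not data.get("announced", False):
        return []  # no origin visible: consistent with a withdrawn prefix
    return data.get("asns", [])

if __name__ == "__main__":
    for prefix in ("1.1.1.0/24", "1.0.0.0/24"):
        origins = origin_asns(prefix)
        label = ", ".join(f"AS{a.get('asn')} ({a.get('holder')})" for a in origins)
        print(f"{prefix}: {label or 'not announced'}")
```

During the incident window, a query like this would have shown 1.0.0.0/24 with no visible origin and 1.1.1.0/24 with an unexpected one.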
The Technical Problem: Missing Routes
To understand why these prefixes were behaving differently, we examined what was actually happening with Cloudflare's route announcements. This deeper analysis revealed that Cloudflare's BGP announcements for both prefixes, 1.1.1.0/24 and 1.0.0.0/24, were being withdrawn from the global routing table.
BGP works like a system of road signs for Internet traffic. When a network wants to announce that it can handle traffic for specific IP addresses, it sends BGP announcements that essentially say "route traffic for these addresses through me." When these announcements disappear, other networks lose their roadmap for reaching those destinations.
Without active BGP routes to Cloudflare's prefixes, ISPs had no path to forward DNS queries. Traffic would reach the ISP's edge routers and simply stop—there was nowhere further to send it. The absence of routes, rather than any infrastructure failure, was preventing DNS queries from reaching their destination.
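The effect of a withdrawal is easiest to see with a toy longest-prefix-match lookup: once the covering route is gone and no less-specific route covers the address, the lookup returns nothing, which is why traffic stopped at the ISP edge. The following is purely an illustrative sketch, not a model of any particular router.

```python
# Toy longest-prefix-match lookup illustrating what a route withdrawal does.
# Purely illustrative; real routers hold full tables and far richer state.
import ipaddress

def best_route(table, destination):
    dest = ipaddress.ip_address(destination)
    candidates = [
        (net, next_hop)
        for net, next_hop in table.items()
        if dest in ipaddress.ip_network(net)
    ]
    if not candidates:
        return None  # no route at all: packets are dropped at this router
    # The most-specific (longest) matching prefix wins.
    return max(candidates, key=lambda c: ipaddress.ip_network(c[0]).prefixlen)[1]

routing_table = {"1.1.1.0/24": "peer-A", "8.8.8.0/24": "peer-B"}
print(best_route(routing_table, "1.1.1.1"))   # -> "peer-A" while the route exists

del routing_table["1.1.1.0/24"]               # the withdrawal
print(best_route(routing_table, "1.1.1.1"))   # -> None: nowhere to forward the packet
```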
Into the Routing Layer: BGP Path Hunting
Starting around 21:51 UTC, we observed that network paths to Cloudflare's prefixes weren't staying consistent, a pattern that indicated BGP path hunting. Instead of remaining stable and predictable, routes were constantly changing as networks searched for viable paths to the destinations. The affected prefixes appeared to bounce between different networks: a path would show up through a particular provider, disappear completely, then reappear through a different network, as the routing system kept attempting to find working connections that didn't exist.
The path hunting behavior observed for both prefixes suggested they had become inactive, particularly given that both 1.1.1.0/24 and 1.0.0.0/24 were affected simultaneously with similar patterns.
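The churn itself is easy to characterize: if you log the AS path observed toward a prefix over time, path hunting shows up as a burst of distinct, short-lived paths within a small window rather than one stable path. Below is a hedged sketch of that heuristic; the sample observations and thresholds are invented for illustration.

```python
# Heuristic sketch: flag likely BGP path hunting from a time series of
# (timestamp_seconds, as_path) observations for one prefix. Sample data is invented.

def looks_like_path_hunting(observations, window=120, min_distinct_paths=4):
    """Return True if any `window`-second span contains many distinct AS paths."""
    observations = sorted(observations)
    for i, (start, _) in enumerate(observations):
        in_window = [path for ts, path in observations[i:] if ts - start <= window]
        if len(set(in_window)) >= min_distinct_paths:
            return True
    return False

# Invented example: paths toward 1.1.1.0/24 changing repeatedly within two minutes.
samples = [
    (0,   ("64500", "6453", "13335")),
    (20,  ("64500", "6453", "4755")),
    (45,  ("64500", "3356", "6453", "4755")),
    (70,  ()),                              # momentarily no path at all
    (95,  ("64500", "1299", "6453", "4755")),
]
print(looks_like_path_hunting(samples))  # -> True
```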
An Unrelated BGP Announcement
Observed in conjunction with the preceding path hunting, the timing showed that AS4755's announcements for 1.1.1.0/24 became active only after Cloudflare's routes disappeared. Analysis of routing data showed that some ASNs that received AS4755's routes through their BGP sessions began using AS4755 as the origin for 1.1.1.0/24, filling the routing gap created by the absence of any active Cloudflare routes for that prefix.
This sequence of events naturally looked like a hijacking incident because the emergence of AS4755 as the origin for 1.1.1.0/24 exhibited the classic characteristics of BGP prefix hijacking: a new autonomous system suddenly claiming ownership of a well-established prefix. From the perspective of network operators and monitoring systems, seeing AS4755 replace AS13335 as the origin for Cloudflare's flagship DNS service matched the textbook definition of route hijacking, where an unauthorized network announces another organization's IP addresses. The timing correlation between Cloudflare's service disruption and AS4755's route announcements further reinforced this interpretation, as the symptoms aligned with route appropriation.
However, the observed pattern suggests that AS4755's announcements for 1.1.1.0/24 were already present in the BGP system but remained inactive because normal BGP path selection preferred Cloudflare's routes. This is evidenced by the fact that AS4755's routes appeared only in specific networks that already had connectivity to AS6453 (Tata Communications), and that these announcements became active immediately when Cloudflare's routes disappeared rather than being newly originated.
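A simplified view of why a pre-existing but less-preferred route only "wins" when the preferred one disappears: BGP's decision process compares attributes such as local preference and AS-path length, and the less attractive route sits unused until the better one is withdrawn. The sketch below is a drastically reduced version of that decision process with invented attribute values; real routers evaluate many more tie-breakers.

```python
# Drastically simplified BGP best-path sketch with invented attribute values.
# Real BGP compares many more attributes (origin, MED, eBGP/iBGP, router ID, ...).
from dataclasses import dataclass

@dataclass
class Route:
    origin_as: int
    as_path_len: int
    local_pref: int

def best_path(routes):
    if not routes:
        return None
    # Higher local preference wins; ties fall to the shorter AS path.
    return max(routes, key=lambda r: (r.local_pref, -r.as_path_len))

cloudflare = Route(origin_as=13335, as_path_len=2, local_pref=200)
tata       = Route(origin_as=4755,  as_path_len=4, local_pref=100)

print(best_path([cloudflare, tata]).origin_as)  # 13335: AS4755's route stays unused
print(best_path([tata]).origin_as)              # 4755: selected only once the better route is withdrawn
```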
Subsequent information from Cloudflare confirmed that this BGP announcement by AS4755 was not the cause of the outage, but rather an unrelated issue that became visible when Cloudflare's legitimate routes were withdrawn due to the configuration error.
The Historical Context: Why 1.1.1.1 Was Susceptible
This differential behavior between the two prefixes raises an important question: Why was 1.1.1.0/24 susceptible to alternative route activation when Cloudflare's routes disappeared? The answer lies in the complex history of the 1.1.1.1 address.
Before Cloudflare adopted this address range in 2018, 1.1.1.1 had been commonly used for testing and internal network configurations. This historical usage created numerous routing entries across different networks that could potentially activate when the legitimate route disappeared.
These dormant configurations represent a form of latent risk—routing information that remains inactive under normal conditions but can become problematic during outages. The 1.0.0.0/24 range didn't carry the same historical usage patterns.
Security Implications: The Challenges of RPKI Enforcement
Given that AS4755's route announcements for 1.1.1.0/24 were technically unauthorized and appeared to constitute a hijack, this raises an important question: What security protections were in place, and why didn't they prevent this situation?
Cloudflare had implemented Resource Public Key Infrastructure (RPKI) to cryptographically authorize which networks can legitimately announce its prefixes, and it holds the Route Origin Authorization (ROA) for 1.1.1.0/24. RPKI origin validation should therefore have marked AS4755's announcements as invalid.
The fact that these routes nonetheless continued to propagate through AS6453 (Tata Communications) and reach external networks suggests that RPKI validation wasn't being strictly enforced in this case. That can happen for several reasons: some networks don't implement RPKI validation at all, others don't reject invalid routes, and some maintain policy exceptions for certain business relationships. Cloudflare has indicated it is following up with Tata Communications on this matter.
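RPKI origin validation itself is mechanical: a route is "valid" if some ROA covers the prefix with a matching origin AS and permitted length, "invalid" if ROAs cover the prefix but none match, and "not found" if no ROA covers it; whether invalids are actually dropped is up to each receiving network. The sketch below implements that classification in simplified form. The ROA contents shown are illustrative assumptions, not the published ROA, and the code stands in for what a real validator derives from signed RPKI objects.

```python
# RPKI route-origin validation sketch (RFC 6811 semantics, simplified).
# The ROA below is illustrative; real validators fetch signed ROAs from the RPKI.
import ipaddress
from dataclasses import dataclass

@dataclass
class Roa:
    prefix: str      # e.g. "1.1.1.0/24"
    max_length: int  # longest prefix length the ROA authorizes
    origin_as: int   # the AS authorized to originate it

def validate(announced_prefix, origin_as, roas):
    net = ipaddress.ip_network(announced_prefix)
    covering = [r for r in roas if net.subnet_of(ipaddress.ip_network(r.prefix))]
    if not covering:
        return "not-found"
    for r in covering:
        if r.origin_as == origin_as and net.prefixlen <= r.max_length:
            return "valid"
    return "invalid"

roas = [Roa(prefix="1.1.1.0/24", max_length=24, origin_as=13335)]  # illustrative ROA
print(validate("1.1.1.0/24", 13335, roas))  # valid: Cloudflare's own origin
print(validate("1.1.1.0/24", 4755, roas))   # invalid: rejected only where enforcement is on
```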
Resolution: Routes Restored
Cloudflare resolved the incident by addressing the underlying configuration issue. At 22:17 UTC, they identified the problem affecting their BGP announcements and implemented a fix. ThousandEyes data confirmed that AS13335 (Cloudflare's autonomous system) reappeared as the origin for both affected prefixes two minutes later, indicating the issue had been resolved.
With legitimate routes restored, BGP's path selection algorithm once again favored Cloudflare's announcements over any other advertisement for the same prefixes. The timing of service restoration, which aligned precisely with Cloudflare's fix, suggests AS4755's routes returned to their previous non-preferred state through normal BGP path selection processes as Cloudflare's legitimate routes became available again. Connectivity and paths to the DNS resolver prefixes were restored as routing tables converged around the restored configuration.
Key Takeaways: Lessons for NetOps Teams
This incident revealed several important characteristics of Internet infrastructure that have direct implications for enterprise network operations. The combination of latent routing configurations, incomplete security enforcement, and service dependencies created conditions that affected organizations relying on Cloudflare's DNS services, highlighting valuable insights for enterprise network operations and infrastructure planning teams:
- Configuration Management and Legacy Systems Can Create Hidden Risks: This incident demonstrates how legacy systems and configuration management processes can create dormant issues that manifest later. The root cause was a configuration error introduced on June 6, 2025, that remained dormant until it was triggered by an unrelated change on July 14. Enterprise networks should implement rigorous configuration management processes, comprehensive testing of changes, and consider the risks inherent in maintaining parallel legacy and modern systems. Regular audits should include identifying and removing dormant configurations that could interfere with primary services during disruptions.
- Network Issues Rarely Have Simple Root Causes: When multiple systems experience problems simultaneously, the most obvious symptom may not indicate the actual source of the issue. Enterprise teams should investigate all affected services and network segments to understand the complete scope of an incident before implementing fixes, avoiding spending time addressing effects rather than causes.
- Security Controls Matter, Even for Trust Relationships: Internal network segments or trusted partner connections may have relaxed security policies compared to external interfaces. Organizations should ensure that security validation mechanisms, such as certificate verification, access controls, and monitoring, apply consistently across all network boundaries, regardless of the trust relationships.
- Redundancy Systems Need Regular Testing: While backup systems and failover mechanisms are essential, they can produce unexpected behavior if not properly tested under realistic failure conditions. Enterprise networks should regularly validate how backup routes, secondary DNS servers, and failover systems behave when primary services become unavailable. This includes testing diverse DNS resolver providers rather than relying on a single service; organizations dependent solely on 1.1.1.1 experienced service disruption during this incident, while those using multiple DNS providers from different organizations (such as combining Cloudflare, Google, and Quad9 resolvers) maintained connectivity through alternative paths. Regular failover testing should verify that backup DNS configurations activate smoothly and that network clients can seamlessly switch between providers during outages, as illustrated in the sketch after this list.
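As a companion to the last takeaway, here is a hedged sketch of a periodic check that verifies more than one resolver provider is answering, so the loss of any single provider is noticed before it becomes the only configured path. It assumes dnspython is available; the provider list, test name, and thresholds are illustrative, and this is a starting point for failover testing rather than a production monitor.

```python
# Sketch of a resolver-redundancy check (assumes "dnspython"). Intended as a
# starting point for failover testing, not a production monitor.
import dns.resolver

PROVIDERS = {              # illustrative mix of independent resolver operators
    "Cloudflare": ["1.1.1.1", "1.0.0.1"],
    "Google": ["8.8.8.8", "8.8.4.4"],
    "Quad9": ["9.9.9.9"],
}
TEST_NAME = "example.com"  # arbitrary test name

def provider_healthy(addresses):
    for ip in addresses:
        r = dns.resolver.Resolver(configure=False)
        r.nameservers = [ip]
        try:
            r.resolve(TEST_NAME, "A", lifetime=2)
            return True   # at least one address of this provider answered
        except Exception:
            continue
    return False

if __name__ == "__main__":
    healthy = [name for name, ips in PROVIDERS.items() if provider_healthy(ips)]
    print("Healthy providers:", healthy)
    if len(healthy) < 2:
        print("WARNING: fewer than two independent resolver providers are reachable")
```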
Previous Updates
[July 14, 2025, 10:30 PM PT]
Expanding on previous reporting, the Cloudflare incident began around 21:53 UTC, when ThousandEyes observed significant BGP churn of prefixes 1.1.1.0/24 and 1.0.0.0/24, which Cloudflare uses for its public DNS service. This routing instability is indicative of service providers hunting for a viable route, possibly due to Cloudflare no longer advertising those IP ranges.
As routes to Cloudflare’s DNS service cycled across global BGP routing tables, an illegitimate route for 1.1.1.1 — originated by Tata Communications (AS4755) — began appearing in some service provider routing tables. While this advertisement constituted a hijack, it was likely only propagated by other service providers in the absence of another viable route and was not the root cause of the incident.
Why a service provider such as Tata would advertise the prefix 1.1.1.0/24 likely comes down to the history of the 1.1.1.1 address, which was used for testing and internal configurations for many years prior to its allocation to Cloudflare. This condition is unique to this prefix and was not seen for others, such as the 1.0.0.0/24 prefix that contains the secondary address (1.0.0.1) used by Cloudflare's DNS service.
Explore the outage in the ThousandEyes platform (no login required).
[July 14, 2025, 6:30 PM PT]
On July 14, at approximately 21:53 UTC, ThousandEyes detected routing instability affecting Cloudflare’s public DNS resolver 1.1.1.1. The issue occurred when Tata Communications (AS4755) announced the IP address block (1.1.1.0/24) that belongs to Cloudflare, resulting in a hijack that disrupted website access for some users. The issue was resolved at 22:19 UTC when Tata’s AS4755 withdrew the announcement of the 1.1.1.0/24 prefix and valid routing resumed.