This is The Internet Report, where we analyze outages and trends across the Internet through the lens of ThousandEyes Internet and Cloud Intelligence. I’ll be here every other week, sharing the latest outage numbers and highlighting a few interesting outages. This week, we’re covering three key outage patterns we’ve observed in the first half of 2025—and what they reveal about the current outage landscape. As always, you can read the full analysis below or listen to the podcast where we cover these outage patterns further.
Three Outage Patterns We’re Watching in 2025
Our analysis of outages from January through June 2025 showed these failure patterns occurring more frequently. These patterns aren't new—they've existed since applications shifted from monolithic to distributed architectures—but as applications continue to evolve and scale with more specialized services and components, these disruptions are becoming both more common and more far-reaching.
These patterns help explain why outage symptoms often appeared disconnected from their root causes:
- Unintentional Failure Vectors: Systems architected to work together accidentally spread failures, with service features built for coordination and replication becoming pathways that distributed problems.
- Hidden Functional Failures: System metrics appeared normal and services seemed healthy, yet certain functions silently broke, leaving users unable to complete specific tasks.
- Configuration Cascade Effects: Well-intentioned configuration changes in one system component cascaded through interconnected systems, causing failures in seemingly unrelated areas.
Read on for a deep dive into these patterns, how they manifested in various case studies, and key considerations for IT operations teams seeking to maintain quality digital experiences in today’s landscape.
Unintentional Failure Vectors
The first pattern showed how service features designed for coordination and control became pathways that spread failures. When these features encountered unexpected conditions, they inverted their purpose, causing mechanisms like access control systems and data replication to accidentally distribute disruption.
The June 12 Google Cloud incident exemplified this pattern. Google's Service Control system, designed to control access to Google Cloud services, became a global chokepoint when corrupted policy data impacted the Spanner tables it depended on. Spanner's real-time replication feature—designed for data consistency—propagated the failure to every region within seconds, creating a cascade with global impact.
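The dynamic described above can be sketched in a few lines. This is a hypothetical toy model, not Google's actual implementation: a "replicate-on-write" policy store in which every write fans out to all regions, so a single corrupted record reaches every region before any regional validation could catch it.

```python
# Hypothetical sketch (not Google's actual implementation): a toy
# replicate-on-write store showing how one corrupted record reaches
# every region, turning a consistency feature into a failure vector.

class ReplicatedPolicyStore:
    def __init__(self, regions):
        # each region keeps its own copy of the policy table
        self.tables = {region: {} for region in regions}

    def write(self, key, value):
        # real-time replication: every write fans out to all regions
        for table in self.tables.values():
            table[key] = value

    def enforce(self, region, key):
        # access control depends on the replicated policy record
        policy = self.tables[region].get(key)
        if policy is None or "allow" not in policy:
            raise RuntimeError(f"policy check failed in {region}")
        return policy["allow"]

store = ReplicatedPolicyStore(["us-east", "eu-west", "asia-south"])
store.write("svc-quota", {"allow": ["serviceA"]})  # healthy policy
store.write("svc-quota", {"alow": ["serviceA"]})   # corrupted field name

# the corruption is already present in every region
failed = [r for r in store.tables
          if store.tables[r]["svc-quota"].get("allow") is None]
```

In this sketch, `failed` contains all three regions: the very mechanism that guarantees consistency also guarantees that bad data becomes consistently bad everywhere, within one write.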
This failure pattern was observed in other instances too. Azure's sophisticated content delivery networks with global load balancing, Cloudflare's "coreless" services designed with no single points of failure, and Slack's globally replicated databases all demonstrated the same phenomenon: Interconnected systems create failure amplification points where tight coupling and long dependency chains allow a local problem to cascade globally.
The underlying challenge stems from applications continuing to evolve into more granular, specialized services. In the outages we analyzed, these more complex dependency chains made it harder to connect symptoms with their root causes. We observed failures affecting specific functional areas while broader system indicators appeared normal, making the connection between cause and effect less obvious.
Hidden Functional Failures
The second pattern involved systems appearing completely healthy while specific functions failed unnoticed. System metrics appeared normal, and problems became apparent only when users attempted certain tasks.
Slack's February 26 incident, which lasted over nine hours, exemplified this pattern. Network connectivity, frontend response, and load balancers all appeared healthy, yet core functionality was broken. The May 12 Slack outage demonstrated another dimension: Users could log in, browse channels, and navigate the interface, but messages weren't sending. Unless users expected responses they didn't receive, they likely remained unaware of the outage.
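The gap between a passing health check and a working user workflow can be illustrated with a minimal sketch. This is a hypothetical service, not Slack's architecture: a shallow health endpoint reports "ok" while a synthetic transaction that exercises the real send-message path exposes the failure.

```python
# Illustrative sketch (hypothetical service, not Slack's architecture):
# an endpoint-level health check passes while a specific user-facing
# function silently fails.

class ChatService:
    def __init__(self):
        self.queue_broken = True  # simulated fault in one subsystem

    def health(self):
        # shallow check: process is up, dependencies respond
        return {"status": "ok"}

    def send_message(self, text):
        # the broken function only surfaces when someone tries it
        if self.queue_broken:
            return {"sent": False, "error": "message queue unavailable"}
        return {"sent": True}

svc = ChatService()
health = svc.health()                    # monitoring sees "ok"
result = svc.send_message("hello team")  # users see the failure

# A synthetic transaction that exercises the real workflow catches
# what the shallow check misses:
functional_ok = health["status"] == "ok" and svc.send_message("ping")["sent"]
```

Here `health` reports "ok" while `result["sent"]` is `False`: exactly the split between metric-level health and functional health that this pattern describes, and why workflow-level synthetic tests matter alongside infrastructure monitoring.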
These functional failures create what could be called "experience debt"—users may attribute poor performance to general app quality rather than recognizing an active outage. Unlike obvious failures where customers clearly identify service problems, these degradations can accumulate undetected, undermining user experience without necessarily triggering alerting mechanisms.
Root causes often lay buried in nested dependency chains completely disconnected from symptom locations. Frontend issues might stem from database optimizations three layers deep in the dependency stack, with failures propagating through caching layers and load balancers before manifesting as user-facing problems.
Configuration Cascade Effects
The third pattern involved well-intentioned changes that triggered unexpected consequences in seemingly unrelated parts of the system.
These configuration-related outages weren't failures of testing or negligence, but unintended consequences of localized optimizations. The underlying challenge stems from "component myopia"—teams expertly optimizing within specific domains while lacking visibility into how changes ripple through broader ecosystems.
Asana's consecutive incidents on February 5 and 6 both stemmed from configuration changes that were logical within their specific contexts. However, these changes cascaded through nested dependency chains, ultimately manifesting as infrastructure-wide failures that appeared completely unrelated to the initial updates.
In agile development environments with numerous specialized components developed by different teams across mixed infrastructure, thoughtful configuration changes in one module can propagate through dependency chains, creating domino effects in seemingly disconnected areas.
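A toy model can make "component myopia" concrete. Everything below is hypothetical and illustrative: one team halves a cache to save memory, a locally sensible change, and an unrelated frontend that shares the downstream database starts timing out.

```python
# Hypothetical sketch of "component myopia": a locally sensible config
# change (shrinking a cache) overloads a shared database, breaking an
# unrelated frontend. All numbers and names are illustrative.

def database_latency_ms(queries_per_sec, capacity=1000):
    # latency degrades sharply once load exceeds capacity
    if queries_per_sec <= capacity:
        return 5
    return 5 * (queries_per_sec / capacity) ** 2

def frontend_healthy(db_latency_ms, timeout_ms=50):
    # the frontend times out on slow database responses
    return db_latency_ms < timeout_ms

# Before: cache absorbs 90% of 5,000 req/s, so the DB sees 500 qps
before = database_latency_ms(5000 * 0.1)

# Config change: cache halved to save memory; hit rate drops to 30%,
# so the DB now sees 3,500 qps
after = database_latency_ms(5000 * 0.7)
```

With these numbers, `frontend_healthy(before)` is true and `frontend_healthy(after)` is false: the frontend team changed nothing, yet their service fails, and nothing in the cache team's own dashboards points at the frontend.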
Why These Patterns Emerged
Several factors may help explain the prevalence of these patterns:
- Architectural Evolution: Applications are increasingly evolving into larger collections of specialized services that can be dynamically added, changed, and connected for optimized performance. As these systems add services and connections, coordination requirements between components grow increasingly complex, as do the interdependent connectivity meshes linking them.
- Specialization Challenges: Unlike traditional cloud architectures with fungible components (redundant load balancers, compute services), today's functional services are highly specialized. Services like Google Maps or specialized APIs can't easily be swapped for a fallback when the primary service fails.
- Symptom-Cause Disconnection: In the incidents we analyzed, healthy system metrics didn't always ensure functional services were working properly, and root causes often appeared completely unrelated to where problems manifested.
For more insights on recent outages, see all the analysis from the ThousandEyes Network Intelligence and Research team. Or check out the ThousandEyes Internet Outages Map for a real-time view of global disruptions.
By the Numbers
The Evolving Geography of Network Outages
From January to June 2025, the geographic distribution of network outages exhibited an interesting shift in regional activity, with the percentage of U.S.-centric outages deviating from the trend observed for much of 2024. While in the past, U.S.-centric outages typically accounted for at least 40% of all global network outages, early in 2025, the numbers rose significantly, peaking at 55% from January 27 to February 16. However, this percentage then gradually declined throughout the first half of the year, dropping to 46% by early March, 41% by early April, and reaching as low as 24% during certain periods in May, before ultimately settling at 39% by the end of June.
The month-over-month data reveals the dynamics driving this shift. Global outages increased from 1,382 in January to 1,595 in February, marking a 15% rise, and continued to climb to 2,110 in March, a 32% increase. April saw a decline to 1,804 outages, followed by a modest increase to 1,843 in May before declining to 1,219 in June. During the periods of growth, the composition of these increases changed over time. From January to February, U.S. outages increased from 657 to 811, a 23% rise, while global outages grew by 15%, suggesting that engineering work in the U.S. may have been driving the initial rise. However, from February to March, the trend reversed: U.S. outages increased by only 11% (from 811 to 901), while global outages surged by 32%, suggesting that other regions contributed to the larger global increase.
By June, both global outages (1,219) and U.S. outages (478) had significantly declined from their peaks, with the U.S. share settling at 39.2%, below the historical baseline of 40%. This suggests that the surge in activity, likely tied to engineering work, peaked in March and had largely stabilized across all regions by mid-year. The overall pattern indicates that early 2025 experienced a post-holiday deployment surge in the U.S. markets, followed by disruptions in Europe, Asia, and other regions, before the global engineering work cycle stabilized by June.
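The month-over-month figures quoted above can be recomputed directly from the raw outage counts in the text (percentages rounded to whole numbers, except the June share):

```python
# Recomputing the percentage changes cited above from the raw counts.

def pct_change(old, new):
    # month-over-month change, rounded to the nearest whole percent
    return round((new - old) / old * 100)

jan_feb_global = pct_change(1382, 1595)  # global outages, Jan -> Feb
feb_mar_global = pct_change(1595, 2110)  # global outages, Feb -> Mar
jan_feb_us = pct_change(657, 811)        # U.S. outages, Jan -> Feb
feb_mar_us = pct_change(811, 901)        # U.S. outages, Feb -> Mar

june_us_share = round(478 / 1219 * 100, 1)  # U.S. share of June outages
```

These come out to 15%, 32%, 23%, 11%, and 39.2% respectively, matching the figures in the analysis.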