
The Internet Report

Looking Ahead: 2026’s Biggest Outage Risks

By Mike Hicks
13 min read

Summary

Dive into the emerging outage trends and considerations for ITOps teams to take into 2026 and beyond.


This is The Internet Report, where we analyze outages and trends across the Internet through the lens of Cisco ThousandEyes Internet and Cloud Intelligence. I’ll be here every other week, sharing the latest outage numbers and highlighting a few interesting outages. As always, you can read the full analysis below or listen to the podcast for firsthand commentary.

Internet Outages & Trends

For as long as we’ve been covering outages on The Internet Report, we’ve almost always identified a root cause: a misconfiguration, a human error, a power failure in a data center. Normally, something breaks.

However, some of 2025’s most notable outages didn’t follow that pattern. In several of the cases we covered, nothing was misconfigured, no systems failed in isolation, no power went out. Everything worked exactly as designed—the problems arose when systems interacted with one another in unexpected ways.

That distinction matters, because as we look into 2026 and beyond, we’re seeing a very different risk emerging: not broken systems but interacting ones. As autonomous agents become even more prevalent inside our infrastructure, that risk is about to be amplified.

Read on to learn more.

Key Learnings to Carry into 2026

Let’s briefly review three of the notable outages we witnessed in 2025.

  • October’s AWS DynamoDB outage was the consequence of two independent DNS management components creating a failure state, despite each operating correctly within its own logic. One component was delayed while applying an older DNS plan; another applied a newer plan and then triggered cleanup of old plans. The delayed component overwrote the newer plan with its stale one, and the cleanup then deleted that plan, leaving no valid record in place. Neither component malfunctioned; it was the timing of their interaction that created the issue.

  • Another October issue with Azure Front Door had similar characteristics. A control plane component created faulty metadata. Automated systems correctly detected and blocked it. The cleanup operation, also functioning as designed, triggered a latent bug in a different component. Again, no individual component failed; everything was doing its job. The issue stemmed from their interaction.

  • Finally, a November problem with Cloudflare’s bot management system saw a configuration file exceed a hard-coded limit. The system generating configuration files was behaving correctly according to its own logic. The proxy attempting to load the resulting file also operated correctly, enforcing the limit as designed. The failure occurred when the first system’s output exceeded the second’s constraints; a minimal sketch of this pattern follows below.

All three of these major incidents were the result of interaction failures: systems operating within their own domain but creating states that none of them were designed to handle when they combined in specific ways.
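
To make the pattern concrete, here is a minimal, hypothetical sketch of the Cloudflare-style scenario above: a generator that produces a configuration file with no notion of a size limit, and a consumer that enforces a hard-coded limit. Neither function is buggy in isolation; the failure only appears when the generator’s output crosses the consumer’s constraint. The function and constant names are invented for illustration and are not Cloudflare’s actual implementation.

```python
# Hypothetical illustration of an interaction failure: each component is
# "correct" by its own logic, but their combination produces an outage.

MAX_FEATURES = 200  # hard-coded limit in the consumer (illustrative value)

def generate_feature_file(source_rows):
    """Generator side: emits one feature entry per source row.

    Its logic places no upper bound on the output size -- and nothing
    about that is wrong from its own point of view.
    """
    return [{"feature_id": i, "value": row} for i, row in enumerate(source_rows)]

def load_feature_file(features):
    """Consumer side: enforces the fixed limit, exactly as designed."""
    if len(features) > MAX_FEATURES:
        # The consumer is behaving correctly -- but from the service's point
        # of view, this is a hard failure of the proxy.
        raise RuntimeError(
            f"feature file has {len(features)} entries, exceeds limit {MAX_FEATURES}"
        )
    return {f["feature_id"]: f["value"] for f in features}

if __name__ == "__main__":
    # An upstream change (e.g., duplicated rows in a source table) doubles the
    # output. Both functions still do exactly what they were built to do.
    rows = ["rule"] * 250
    load_feature_file(generate_feature_file(rows))  # raises RuntimeError
```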

The Rise of the Agents—and New Challenges

Another variable is going to increase the risk of these interaction failures in 2026 and beyond: autonomous agents.

Agents are already prevalent throughout today’s infrastructure: auto-scalers adjusting capacity, remediation tools restarting services, AIOps platforms deciding which alerts matter and what actions to take.

What’s changing is their scope and sophistication. Organizations are moving from narrow, single-purpose automation to agents with broader authority and capabilities. More agents, more autonomy, more consequential decisions. All operating simultaneously on shared infrastructure.

The proliferation of agents running across company networks creates new technical problems that ITOps teams haven’t faced before. These include:

  • Cascading failures – Human operators work slowly compared to automated agents. Humans can take minutes or even hours between configuration changes; agents make decisions in milliseconds. When agents make near-synchronous decisions based on one another’s behavior, mistakes can propagate widely before degradation becomes apparent. Diagnosing the root cause of the initial failure is also more difficult, because so much can change in a very short period of time.

  • Optimization conflicts – One agent is designed to optimize performance, another to reduce operational costs, a third to improve reliability and redundancy. All three are rational uses for agents, but they can easily fall into conflict. Human operators can juggle these competing objectives using their own experience and judgment; agents can’t. They react instantly, repeatedly, and without an intuitive sense of “this is getting worse.” The sketch following this list illustrates how two such agents can end up oscillating.

  • Intent uncertainty – When one agent makes a change, other agents observing that change must decide whether that was intentional or accidental. Get that wrong and agents can end up undoing each other’s work, creating the very network oscillations they were designed to prevent. Humans can resolve such conflicts by communicating with colleagues; agents don’t have that capability.
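
As a thought experiment, the following hypothetical sketch simulates the optimization-conflict case with two agents sharing a single knob: a cost agent that scales replicas down when utilization looks low, and a performance agent that scales up when latency looks high. The toy models, thresholds, and agent names are invented purely for illustration; the point is that two locally rational policies can oscillate indefinitely when neither is aware of the other.

```python
# Hypothetical simulation: two agents optimizing different objectives on the
# same shared setting (replica count) can oscillate without ever "failing."

replicas = 8

def latency_ms(n_replicas):
    # Toy model: latency rises as replicas shrink (purely illustrative).
    return 400 / n_replicas

def utilization(n_replicas):
    # Toy model: fewer replicas means each one is busier.
    return min(1.0, 4 / n_replicas)

def cost_agent(n_replicas):
    """Scales down whenever utilization is below its target."""
    return n_replicas - 2 if utilization(n_replicas) < 0.7 else n_replicas

def performance_agent(n_replicas):
    """Scales up whenever latency is above its target."""
    return n_replicas + 2 if latency_ms(n_replicas) > 60 else n_replicas

agents = [("cost", cost_agent), ("performance", performance_agent)]
for step in range(6):
    name, agent = agents[step % 2]
    replicas = agent(replicas)  # each agent acts rationally on its own goal
    print(f"step {step}: {name} agent -> replicas={replicas}, "
          f"latency={latency_ms(replicas):.0f}ms, util={utilization(replicas):.2f}")
# Output alternates between 6 and 8 replicas indefinitely: each agent keeps
# "correcting" the other, and neither ever sees a fault it can report.
```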

Why Visibility Becomes Critical

The increased potential for agents to act autonomously and create unexpected consequences underlines the need for comprehensive visibility across service delivery chains.

Cisco’s recent overhaul of its internal network highlights how increased use of agents must be married with increased observability. Cisco not only improved visibility through a combination of its solutions—including ThousandEyes technology—but it has also enabled engineers and service owners across IT to define and customize their own alert rules via GitOps.
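
In practice, “alert rules via GitOps” simply means rule definitions stored as files in a version-controlled repository, reviewed like any other code change, and applied automatically on merge. The following is a heavily simplified, hypothetical Python sketch of that idea; the fields, names, and thresholds are invented and are not Cisco’s or ThousandEyes’ actual schema.

```python
# Hypothetical sketch: an alert rule kept in a Git repo and evaluated by an
# automated pipeline after merge. Field names and thresholds are invented.

# In a GitOps workflow this dict would live in its own file under version
# control, with changes proposed and reviewed as pull requests.
rule = {
    "name": "payments-api-latency",
    "owner": "payments-sre",      # team that reviews changes to this rule
    "metric": "p95_latency_ms",
    "threshold": 250,
    "for_minutes": 5,             # sustained breach required before alerting
    "severity": "high",
}

def evaluate(rule, samples_ms):
    """Fire only if every sample in the recent window breaches the threshold."""
    window = samples_ms[-rule["for_minutes"]:]
    return len(window) == rule["for_minutes"] and all(
        s > rule["threshold"] for s in window
    )

# Example: five consecutive minutes of p95 latency above 250 ms triggers the rule.
print(evaluate(rule, [180, 310, 320, 305, 290, 275]))  # True
```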

We found that observability improved further when telemetry data and incident outcomes were fed into large language models and automated systems, allowing millions of daily alerts to be prioritized and incident response times to improve. The result has been earlier detection of potential issues before they escalate.
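
The details of that internal pipeline aren’t something we can reproduce here, but a heavily simplified, hypothetical sketch of the underlying idea (scoring incoming alerts against historical incident outcomes so that the small fraction most likely to matter surfaces first) might look something like this. All names, fields, and weights are invented for illustration.

```python
# Hypothetical sketch: rank a stream of alerts by a score combining severity
# with how often alerts from the same source/type preceded real incidents.
from dataclasses import dataclass

@dataclass
class Alert:
    source: str       # e.g., a service or device name
    alert_type: str   # e.g., "bgp_flap", "latency_spike"
    severity: int     # 1 (info) .. 5 (critical)

# Invented historical stats: fraction of past alerts of this (source, type)
# that were later associated with a confirmed incident.
incident_rate = {
    ("edge-router-7", "bgp_flap"): 0.62,
    ("payments-api", "latency_spike"): 0.35,
    ("build-farm", "disk_usage"): 0.02,
}

def priority(alert: Alert) -> float:
    """Blend static severity with learned incident likelihood (illustrative weights)."""
    history = incident_rate.get((alert.source, alert.alert_type), 0.05)
    return 0.4 * (alert.severity / 5) + 0.6 * history

alerts = [
    Alert("build-farm", "disk_usage", 4),
    Alert("edge-router-7", "bgp_flap", 3),
    Alert("payments-api", "latency_spike", 2),
]

# A high-severity but historically noisy alert ranks below lower-severity
# alerts that have reliably preceded real incidents.
for a in sorted(alerts, key=priority, reverse=True):
    print(f"{priority(a):.2f}  {a.source:15s} {a.alert_type}")
```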

This approach works when you control and can instrument your infrastructure. The challenge intensifies when failures span systems you don't own—SaaS providers, cloud services, third-party dependencies where you can't add instrumentation. Understanding interaction failures across these boundaries requires visibility into the full service delivery chain, including dependencies beyond your direct control.

The Shape of Future Outages

In 2026 and beyond, autonomous agents will be managing more infrastructure. This is beyond doubt. The organizations that avoid the kinds of interaction failures we saw in 2025 will be those that treat agent coordination as a first-class design concern; build instrumentation that captures agent decision-making, not just outcomes; and have comprehensive visibility across their entire service delivery chain to detect issues before they become a crisis.

By the Numbers

Let's close by taking our usual look at some of the global trends that ThousandEyes observed across ISPs, cloud service provider networks, collaboration app networks, and edge networks over recent weeks.

Global Outages 

  • From December 15-21, ThousandEyes observed 299 global outages, representing an 18% decrease from 364 the prior week (December 8-14).

  • During the week of December 22-28, global outages decreased 23%, falling to 231.

  • Following the holiday period, outages decreased further to 199 during the week of December 29-January 4, marking the lowest weekly total in the observed period.

  • During the week of January 5-11, global outages increased 28%, rising to 255 as network activity resumed following the New Year holiday.

  • During the week of January 12-18, global outages increased 3%, rising to 263.

Over the three-week period from December 22 through January 11 spanning the holiday season, outages remained notably suppressed, which may reflect reduced maintenance activity and lower overall network changes during this period.
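
For readers who want to follow along with the figures, the week-over-week percentage changes above can be reproduced directly from the weekly counts; a quick sketch:

```python
# Reproducing the week-over-week percentage changes from the weekly counts above.
global_weekly = [364, 299, 231, 199, 255, 263]  # Dec 8-14 through Jan 12-18

for prev, curr in zip(global_weekly, global_weekly[1:]):
    change = (curr - prev) / prev * 100
    print(f"{prev} -> {curr}: {change:+.0f}%")
# 364 -> 299: -18%, 299 -> 231: -23%, 231 -> 199: -14%,
# 199 -> 255: +28%, 255 -> 263: +3%
```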

U.S. Outages

  • The United States saw outages decrease to 185 during the week of December 15-21, representing a 2% decrease from the previous week's 188.

  • During the week of December 22-28, U.S. outages decreased 46%, falling to 100.

  • During the week of December 29-January 4, U.S. outages decreased an additional 29%, falling to 71—the lowest weekly count in the observed period.

  • During the week of January 5-11, U.S. outages increased 90%, rising to 135 as network operations resumed post-holidays.

  • During the week of January 12-18, U.S. outages increased 10%, rising to 149.

The pronounced dip during the final weeks of December may align with typical holiday-related change freeze periods, when organizations often minimize network modifications to reduce risk during critical business periods.

Month-over-month Trends

  • Global network outages increased 178% from November to December 2025, rising from 421 incidents to 1,170. This surge reversed the downward trend observed between October and November.

  • The United States showed an even more pronounced 284% increase, with outages rising from 153 in November to 587 in December.

  • In December, the United States accounted for 50% of all observed network outages, compared to 36% in November, returning closer to the 58% proportion observed in October. The November dip appears to have been a temporary suppression associated with the Thanksgiving holiday period.

The sharp increase in December outages may represent a notable shift in operational patterns compared to what we have historically observed in our data. In prior years, network operators have tended to defer maintenance during the November-December holiday period, leading to elevated activity in January as accumulated work gets addressed.

The December 2025 spike, combined with the sustained suppression of outages during the late December/early January holiday weeks, could suggest that some network operators may have opted to complete maintenance activities before the holiday period rather than after. This proactive approach, if indeed occurring, would help avoid the concentration of maintenance work in January and reduce the risk of cascading failures that can occur when multiple deferred tasks are executed simultaneously in a compressed timeframe.

Figure 1. Global and U.S. network outage trends over eight recent weeks

