
Product Updates

The Promise of Great Digital Experiences Across Always-on AI Agents

By Joe Vaccaro
8 min read

Summary

In a world where the network never sleeps, agents never stop, and every interaction matters, end-to-end assurance becomes a business imperative.


We've seen this movie before. The Internet, SaaS, cloud, mobility: each wave of digital transformation brought new complexity, new dependencies, and new challenges to how we deliver and assure great user experiences. And each time, we adapted, learning to manage what we couldn't fully control, to see and understand what we didn't own, and to optimize outcomes we couldn't always predict.

Now we're witnessing another transformative shift driven by agentic AI: autonomous digital entities that operate, make decisions, and execute transactions independently. This isn't just a new workload; it's a fundamental shift in how digital systems operate, interact, and deliver value.

At Cisco ThousandEyes, our mission has always been to assure exceptional digital experiences for every user by delivering digital resilience across every domain of a connected experience, both owned and unowned. As machine-speed interactions redefine connectivity, end-to-end assurance is no longer just a best practice. It's a business imperative. 

Redefining Great Experiences in the Age of AI Agent Systems

I use ChatGPT almost daily, and I've grown accustomed to waiting for an answer. My expectation isn't speed; it's the quality and trustworthiness of the response. This shift matters. 

Historically, network performance was measured by speed and availability for predictable, human-driven usage patterns. But as AI agents become primary users, executing decisions at machine speed, we need to redefine what constitutes a great experience. The question isn't just "how fast?" but "how reliable, how accurate, and how trustworthy?" 

Unlike human-generated traffic, AI agents initiate rapid bursts of API calls, aggregate data from multiple sources, and execute complex processes in milliseconds. Picture a single AI agent booking travel: It might simultaneously query flight APIs, hotel databases, weather services, and payment processors, with each dependency critical to the transaction. 
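The fan-out pattern in the travel example can be sketched roughly as follows. This is a minimal illustration, not a real booking client: `call_dependency` and `book_trip` are hypothetical names, and the network calls are simulated with `asyncio.sleep` rather than real flight, hotel, weather, or payment APIs.

```python
import asyncio


async def call_dependency(name, delay_s, fail=False):
    """Simulate one external dependency (hypothetical stand-in for an API call)."""
    await asyncio.sleep(delay_s)
    if fail:
        raise RuntimeError(f"{name} unavailable")
    return {"source": name, "ok": True}


async def book_trip(dependencies):
    """Query every dependency concurrently; the booking completes only if all succeed."""
    tasks = [call_dependency(name, delay, fail) for name, delay, fail in dependencies]
    # return_exceptions=True keeps one failure from hiding the other results.
    results = await asyncio.gather(*tasks, return_exceptions=True)
    failed = [
        name
        for (name, _, _), result in zip(dependencies, results)
        if isinstance(result, Exception)
    ]
    return {"completed": not failed, "failed": failed}


# Assumed example inputs: (name, simulated delay in seconds, force a failure?)
example_dependencies = [
    ("flights", 0.01, False),
    ("hotels", 0.01, False),
    ("weather", 0.01, False),
    ("payments", 0.01, True),
]
outcome = asyncio.run(book_trip(example_dependencies))
print(outcome)
```

Even in this toy version, a single failing dependency is enough to leave the whole transaction incomplete, which is the point of the travel example.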

When one agent encounters an issue, it can trigger a cascading effect across other agents, resulting in poor responses or unmet expectations. A millisecond delay or corrupted data can halt business when AI agents are in the driver's seat. 

AI agents spin up, scale out, and interact in real time, creating a living web of dynamic dependencies that challenge traditional network management. This raises a crucial question: What is the service level expectation (SLE) for agent-to-agent systems? While traditional service level agreement (SLA) metrics like availability and latency remain important, SLEs add a new dimension: measuring whether workflows complete successfully, whether data is valid, and whether all dependencies respond correctly. In essence: Did the agent accomplish its intended business outcome?
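One way to picture an SLE check is as a small evaluation over a finished workflow. The sketch below is an assumption-laden illustration, not a ThousandEyes feature: `DependencyResult`, `evaluate_sle`, and the 500 ms latency budget are all hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class DependencyResult:
    """Outcome of one dependency call within an agent workflow (hypothetical shape)."""
    name: str
    responded: bool
    latency_ms: float
    data_valid: bool


def evaluate_sle(results, workflow_completed, latency_budget_ms=500.0):
    """Return (met, reasons): the SLE is met only if no reason is recorded."""
    reasons = []
    if not workflow_completed:
        reasons.append("workflow did not complete")
    for r in results:
        if not r.responded:
            reasons.append(f"{r.name}: no response")
            continue  # latency and validity are meaningless without a response
        if not r.data_valid:
            reasons.append(f"{r.name}: invalid data")
        if r.latency_ms > latency_budget_ms:
            reasons.append(f"{r.name}: latency {r.latency_ms:.0f} ms over budget")
    return (not reasons, reasons)


# Assumed sample data for illustration only.
sample_results = [
    DependencyResult("flights", True, 120.0, True),
    DependencyResult("hotels", True, 90.0, False),
    DependencyResult("payments", False, 0.0, False),
]
met, reasons = evaluate_sle(sample_results, workflow_completed=False)
print(met, reasons)
```

Note that the hotel service here would pass a classic availability and latency check while still failing the SLE, because the data it returned was not valid.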

Even minor disruptions can cascade through automated processes with significant business impact. To help ensure quality of service matches the criticality of agent-driven workflows, every link in the service chain must be observable and manageable, even as these chains shift dynamically in response to agent logic and external factors. Success will increasingly be measured by the reliable completion of workflows aligned with business intent, not just raw throughput or latency.

Navigating Dependencies Upon Dependencies

Traditional systems operate with predictable relationships: Service A calls Service B in a known sequence. But AI agents create dynamic, context-dependent relationships that change with each task, producing non-deterministic infrastructure dependencies and unpredictable failure modes. 

Here's the challenge: You're not just taking on a dependency on a provider; you're also exposed to the dependencies of their providers. And those dependencies are constantly in flux as providers adjust their infrastructure to keep pace with evolving innovation in the market. A change several layers down can ripple through and impact the digital experience.
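To make "dependencies of dependencies" concrete, the following sketch walks a dependency map breadth-first and reports how many layers down each provider sits. The graph contents and the `transitive_dependencies` helper are hypothetical; in practice such a map would have to be discovered and kept current, since the paragraph above notes that it is constantly in flux.

```python
from collections import deque


def transitive_dependencies(graph, service):
    """Return {dependency: depth}, where depth 1 is a direct dependency."""
    seen = {}
    queue = deque((dep, 1) for dep in graph.get(service, []))
    while queue:
        name, depth = queue.popleft()
        if name in seen or name == service:
            continue  # already recorded at a shallower (or equal) depth
        seen[name] = depth
        for dep in graph.get(name, []):
            queue.append((dep, depth + 1))
    return seen


# Assumed example graph: each key depends on the services in its list.
example_graph = {
    "travel-agent": ["flight-api", "payment-api"],
    "flight-api": ["cdn", "dns-provider"],
    "payment-api": ["cloud-region", "dns-provider"],
    "cdn": ["dns-provider"],
}
exposure = transitive_dependencies(example_graph, "travel-agent")
print(exposure)
```

In this example the agent has only two direct providers, yet it is exposed to three more services that it never calls directly.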

Context Is King: Top to Bottom, End to End

Delivering great experiences in an agentic world requires understanding context, not just end to end across the service chain, but top to bottom through every layer of the stack.

This mirrors challenges we've seen before in distributed applications. With the proliferation of agents operating on our behalf, it's inevitable that these agents must be brought into Zero Trust frameworks. But here's the operational challenge: As teams across NetOps, SecOps, and DevOps work on complex systems—spinning up agents, applying security policies, or pushing updates—the final delivery of the digital experience remains an end-to-end responsibility.

When you see performance degradation, you need to understand the cause: Did someone spawn a new agent without proper security policies? Did a security policy change block an expected action? Did a downstream dependency fail? Or did network conditions degrade? You need context.

End-to-End Assurance for the Always-on AI Era

Our customers have long used Cisco ThousandEyes to monitor end-to-end transactions to help assure connectivity and great digital experiences. Now we need to expand our thinking about transactions. It’s no longer just humans talking to machines. We also need to account for agents talking to other agents across networks we don't control, as well as agents relying on external Model Context Protocol (MCP) servers for tools and context.

Imagine distributed tracing, but for multi-agent systems. Not just tracking a user's journey through an application, but understanding how agents interact—including network behavior, tool call timing, and infrastructure metrics. 
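As a rough mental model, a multi-agent trace could be represented as spans linked by parent IDs, much like conventional distributed tracing. The sketch below is purely illustrative: the `Span` fields, the `render_trace` helper, and the sample agents and timings are assumptions, not the data model of any specific product or tracing standard.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Span:
    """One timed step in a multi-agent workflow (hypothetical shape)."""
    span_id: str
    parent_id: Optional[str]  # None marks the root of the trace
    agent: str
    operation: str
    duration_ms: float


def render_trace(spans):
    """Return the trace as indented lines, children nested under their parent."""
    children = {}
    for span in spans:
        children.setdefault(span.parent_id, []).append(span)

    lines = []

    def walk(parent_id, depth):
        for span in children.get(parent_id, []):
            indent = "  " * depth
            lines.append(f"{indent}{span.agent}: {span.operation} ({span.duration_ms:.0f} ms)")
            walk(span.span_id, depth + 1)

    walk(None, 0)
    return lines


# Assumed sample trace mixing agent steps, tool calls, and a network hop.
sample_trace = [
    Span("1", None, "planner", "book trip", 900.0),
    Span("2", "1", "flight-agent", "tool call: search_flights", 400.0),
    Span("3", "2", "flight-agent", "network: flight API", 350.0),
    Span("4", "1", "payment-agent", "MCP tool call: charge", 200.0),
]
trace_lines = render_trace(sample_trace)
print("\n".join(trace_lines))
```

Placing network and tool-call timing in the same tree as the agent steps is what lets a reader see, for instance, that most of the flight agent's 400 ms was spent on the network hop rather than in the agent itself.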

This is where Cisco's integrated approach delivers significant value. Recent innovations like the distributed tracing integration between Cisco ThousandEyes and Splunk demonstrate how we assure user experiences end to end. As agent-driven systems become more prevalent, this capability will need to extend to understanding agent-to-agent calls and third-party dependencies.

Cisco is uniquely positioned to deliver this—not as overlapping point solutions but as integrated digital resilience that adapts as agent architectures mature. 

At Cisco ThousandEyes, we're committed to helping our customers thrive in this new era—because digital resilience now means end-to-end assurance for a world where the network never sleeps, agents never stop, and every interaction matters.


Read more: Monitoring AI Agents for Production Reliability 
