The Problem: Alert Triage at Scale
Alert fatigue is a well-known problem in network monitoring. As the number of measurement targets and vantage points increases, alert volume grows proportionally. A deployment that monitors 13 root DNS servers from 42 vantage points with a two-minute interval generates roughly 69 alerts per day. At that rate, individual reviews are impractical. But which alerts are actionable? Which are noise? And what is the dominant source of that noise?
Alert analysis is inherently exploratory: each finding raises the next question. With traditional tooling, every iteration means writing a new script. A thorough investigation can stretch over weeks, and even once it is complete, the resulting scripts are hard to revisit when the deployment changes or new questions arise.
Alert Triage via MCP
The Cisco ThousandEyes MCP Server exposes ThousandEyes data to AI agents through the Model Context Protocol (MCP). The agent can retrieve alerts, network and application metrics, hop-by-hop path traces, BGP routing data, and ThousandEyes Internet Insights, and it can run instant tests. With MCP, the agent maintains context across the conversation and follow-ups are immediate, so the investigation builds momentum instead of losing it. You start with a broad question (“What does the alert stream look like?”), and each answer raises a more specific one.
We used the ThousandEyes MCP with an AI agent in Cursor to analyze three months of root DNS monitoring data. The agent handled the mechanics, while we directed the investigation. The investigation proceeded iteratively from raw alert retrieval through per-target profiling, multi-target correlation, and filtering trade-off quantification. The complete analysis took four interactive sessions of two to three hours each.
What We Found
The chart shows a persistent baseline of roughly 40–80 alerts, with spikes above 100. Bars are stacked by how long alerts lasted before clearing; most alerts were transient.
The dominant source of alert volume was coincidental co-firing: independently noisy vantage points triggering alerts by chance at the minimum two-vantage-point threshold. This is a structural property of large deployments: with 42 vantage points and measurements every two minutes, independent intermittent issues coincide often enough to satisfy that threshold.
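To make the co-firing argument concrete, here is a rough back-of-the-envelope sketch. It assumes each vantage point fails independently with the same per-round probability p, and that a single round with two or more simultaneous failures is enough to count as a coincidence. Both are simplifying assumptions for illustration, not a description of the actual alert rules, and the values of p below are hypothetical rather than measured.

```python
from math import comb

VANTAGE_POINTS = 42             # from the deployment described above
TARGETS = 13                    # root DNS servers
ROUNDS_PER_DAY = 24 * 60 // 2   # one measurement round every two minutes


def prob_at_least_k(n: int, p: float, k: int) -> float:
    """Probability that at least k of n independent vantage points fail in one round."""
    # 1 minus the binomial probability of seeing fewer than k failures.
    fewer_than_k = sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k))
    return 1.0 - fewer_than_k


def expected_coincidences_per_day(p: float, k: int = 2) -> float:
    """Expected rounds per day, summed over all targets, with >= k chance failures."""
    return TARGETS * ROUNDS_PER_DAY * prob_at_least_k(VANTAGE_POINTS, p, k)


if __name__ == "__main__":
    # Hypothetical per-round failure probabilities for a single vantage point.
    for p in (0.0005, 0.001, 0.002):
        print(f"p={p}: {expected_coincidences_per_day(p):.1f} coinciding rounds/day")
```

Under these assumptions the expected count grows roughly with the square of p and with the number of vantage-point pairs, which is why even rare, unrelated glitches can cross a two-vantage-point threshold regularly in a large deployment.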
The number of affected vantage points (VPs) turned out to be a strong predictor of persistence: at the two-VP baseline, only 4% of alerts last more than 30 minutes; at five or more VPs, 34% do. A combined filter (three or more affected vantage points, five or more minutes duration) reduced volume by 90%, from 69 alerts per day to seven, while retaining half of all persistent alerts.
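The combined filter can be sketched in a few lines. The `Alert` record, its field names, and the sample data below are hypothetical and purely synthetic; they do not reflect the ThousandEyes alert schema or the real dataset. The sketch only shows how the two thresholds and the trade-off metrics (volume reduction versus persistent alerts retained) fit together.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Alert:
    """Hypothetical alert record; field names are illustrative only."""
    target: str
    affected_vps: int
    duration_min: float


def passes_filter(alert: Alert, min_vps: int = 3, min_duration_min: float = 5.0) -> bool:
    """Keep alerts with at least min_vps affected VPs lasting at least min_duration_min."""
    return alert.affected_vps >= min_vps and alert.duration_min >= min_duration_min


def summarize(alerts: list[Alert], persistent_min: float = 30.0) -> dict[str, Optional[float]]:
    """Quantify the trade-off: volume removed vs. persistent alerts retained."""
    if not alerts:
        return {"volume_reduction": None, "persistent_retained": None}
    kept = [a for a in alerts if passes_filter(a)]
    persistent = [a for a in alerts if a.duration_min > persistent_min]
    kept_persistent = [a for a in kept if a.duration_min > persistent_min]
    return {
        "volume_reduction": 1 - len(kept) / len(alerts),
        "persistent_retained": (
            len(kept_persistent) / len(persistent) if persistent else None
        ),
    }


# Synthetic sample, made up for illustration.
SAMPLE = [
    Alert("a.root-servers.net", 2, 2),
    Alert("b.root-servers.net", 2, 45),
    Alert("c.root-servers.net", 3, 4),
    Alert("d.root-servers.net", 3, 6),
    Alert("e.root-servers.net", 5, 60),
]

if __name__ == "__main__":
    print(summarize(SAMPLE))
```

The reported figures are consistent with this kind of calculation: seven of 69 alerts per day is about 10% of the original volume, which matches the stated 90% reduction.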
Beyond Alert Triage
Alert triage is one application, but the underlying pattern—iterative, question-driven exploration of monitoring data—applies broadly. Incident investigation, capacity planning, SLA monitoring, and baseline characterization all follow the same structure: start with a broad question, drill into the data, and let each answer guide the next step. The ThousandEyes MCP Server provides the same interactive access to metrics, path traces, BGP, and Internet Insights for any of these workflows.
To get started, see the ThousandEyes MCP Server documentation. For a deep dive into our alert analysis, see the companion engineering blog post: “Triaging Alerts in Large-Scale DNS Monitoring with an AI Agent.”
Cursor is a trademark of Anysphere, Inc. All other third-party trademarks mentioned are the property of their respective owners.