Solving the Visibility Gaps AIOps Leaves

By Alex Henthorn-Iwane
15 min read


In a previous post, we explored what AIOps is and, more importantly, what the point of AIOps is: delivering insight into digital experience and into why digital experience breaks. In this post, we’ll explore the data visibility gaps that AIOps doesn’t address in and of itself, and how we set about closing those gaps in a manner that is complementary to AIOps.

Do You Have the Right Data Set?

If we believe that experience is vital, then how does one solve for monitoring with that in mind? Getting the right data is paramount. It stands to reason that the new, experience-oriented paradigm means that your Ops visibility should address all the elements that impact digital experience and get at the “why” when digital experience breaks. That means it should include digital experience monitoring (DEM) data such as HTTP server availability and response time, page load and web transaction data. But just knowing that a website or a service (such as a Salesforce API endpoint) is not responding in a timely manner isn’t enough, because the Ops team still needs to know why so it can solve the underlying problem. In my conversation with another industry analyst, he mentioned that there is a great deal of dissatisfaction with traditional DEM because it cannot explain the why of experience issues in this modern, cloud- and Internet-dependent era. So clearly, DEM solves some of the essential challenges, but more is needed.
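To make the limitation concrete, here is a minimal sketch of a DEM-style availability and response time check. It is an illustration only, not how ThousandEyes or any other product measures these things: the names `probe_http` and `ProbeResult` and the injectable `opener` parameter are hypothetical, and the timing covers only the time until response headers arrive, not a full page load.

```python
import time
import urllib.error
import urllib.request
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ProbeResult:
    """Outcome of one probe (hypothetical structure for illustration)."""
    url: str
    available: bool           # True when a 2xx/3xx status came back
    status: Optional[int]     # HTTP status code, or None if no response
    response_ms: float        # elapsed time until headers or failure
    error: Optional[str] = None


def probe_http(url: str, timeout: float = 5.0,
               opener: Callable = urllib.request.urlopen) -> ProbeResult:
    """Fetch a URL once and record availability and response time.

    `opener` is injectable so the function can be exercised without
    network access; it defaults to urllib.request.urlopen.
    """
    start = time.perf_counter()
    status: Optional[int] = None
    error: Optional[str] = None
    try:
        with opener(url, timeout=timeout) as resp:
            status = resp.status
    except urllib.error.HTTPError as exc:
        # The server answered, but with a 4xx/5xx status.
        status, error = exc.code, str(exc)
    except OSError as exc:
        # DNS failure, refused connection, timeout, and similar errors.
        error = str(exc)
    response_ms = (time.perf_counter() - start) * 1000.0
    available = status is not None and 200 <= status < 400
    return ProbeResult(url, available, status, response_ms, error)
```

Calling `probe_http` against a service URL returns a result saying whether the service answered and how long it took. Note what is absent: nothing in the result indicates which network, provider or routing change between the user and the service caused a slow or failed response.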

This is where we can observe a significant gap in most IT Ops visibility architectures. The vast majority of IT Ops visibility is based on passive data collection from the pieces of the puzzle that IT still controls: for example, up/down indicators plus coarse-to-fine measurements from the network elements you still own and from VMs in your public cloud VPC that you can directly administer, plus EUE data from the app code that you develop. That’s helpful data for sure. But what about the myriad external apps, services, infrastructure and Internet networks that IT has no direct control over? For example, you can’t inject APM code into a SaaS provider’s software, and you can’t gather infrastructure data from a network that doesn’t belong to you. Yet all that external stuff is now the majority of the IT picture. To quote the summary from the Gartner research report “Prepare for the Death of the Data Center as We Know It,” by Tiny Haynes:

“Gartner anticipates that many leading enterprises will migrate entirely away from their on-premises data centers with the current trend of moving workloads to colocation, hosting and the cloud. With this evolution, I&O leaders must rely on outside partners.”

If you don’t have data on that vast part of the current IT landscape, the size of your data gap means that no level of analytical intelligence can make up for it.

Figure 1: No amount of analytical intelligence can make up for a significant gap in visibility data.

Let’s go a little deeper on this data gap issue. One constant links nearly every endpoint (PC, tablet, mobile), every cloud provider, every SaaS, every data center and nearly every branch office together today: the Internet. The Internet consists of tens of thousands of networks stitched together by goodwill and voluntary participation in a grand routing scheme enabled by the Border Gateway Protocol (BGP). Practically speaking, when a user in a branch office communicates with a SaaS like Salesforce, the traffic can easily traverse several networks: the internal branch network, multiple ISPs, possibly a cloud security provider like Zscaler, and then the SaaS provider network. Most front-end user connections and most back-end inter-service communications now depend on the Internet in some fashion.

So, to adequately understand and link DEM indicators to underlying infrastructure and especially networks today, you simply must have an in-depth operational view into the Internet. As mentioned in the previous post, understanding and managing an enterprise network is more complicated than fleets of servers or VMs. Now, let’s talk about trying to understand and manage the Internet—which is the biggest living IT organism in history. It’s a moving, evolving target. If an IT Ops team can’t understand what’s happening in this domain, then its ability to understand the why behind digital experience indicators going off the rails is going to be extremely limited.

Figure 2: AIOps doesn’t magically create the data you need to understand why digital experience breaks.

ThousandEyes Approach — An Experiential Lens on the Internet

As mentioned earlier, ThousandEyes is not an AIOps platform. Furthermore, as I hope is clear, we don’t have any bone to pick with AIOps. We certainly believe that AIOps has relevance to modern IT Ops challenges, but it’s not a panacea. For example, AIOps doesn’t at its core address the visibility data gap around understanding the impact of the Internet and other non-IT-controlled assets on digital experience.

ThousandEyes took a SaaS approach to solving the modern “experience” visibility problem that our founders saw years ago: every business was going to depend increasingly on the Internet to deliver digital experiences to humans and machines. In fact, we state our mission as seeking to help organizations see, understand and improve connected experiences everywhere.

How We Pulled DEM and Internet Visibility Together

To accomplish this mission, one of the things that we had to solve for was the massive data gap around understanding the Internet. So we created and patented unique ways to measure communications across any IP network path, inclusive of internal, MPLS, Internet, cloud and SaaS network infrastructures, and to gather per-node metrics so that it would be possible to know precisely why, from a network point of view, a digital experience was going south. Knowing that the Internet is an ever-changing organism that runs on autonomous, global routing protocol exchanges, we also paired our so-called Path Visualization with the industry’s most in-depth and most comprehensive view into Internet routing. We do this by collecting global data feeds and formulating a continuously updated full view of Internet routing, so we can know whether a path is having a problem because of Internet routing issues, and even find route hijacks or leaks. We went further by harnessing our massive multi-tenant data set to triangulate and automatically expose infrastructure problems in ISPs, so that we could directly tell IT Ops folks whether their digital experience problem was caused by an Internet outage. On top of that, we paired all this data with DEM data from DNS, HTTP, page loads and web transactions, and with patented cross-correlation visualization algorithms, to make it easy for humans to see cause and effect from the user and app layer down to the root of Internet routing.
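The value of per-node metrics can be illustrated with a small sketch. This is not the patented ThousandEyes method; it is a simple heuristic commonly applied when reading hop-by-hop measurements, and every name and number in it (`Hop`, `first_degraded_hop`, the 5% threshold, the sample path and its network labels) is a hypothetical assumption. The idea: packet loss seen at one intermediate hop but not at later hops often reflects that router deprioritizing probe replies, so the sketch looks for the earliest hop from which loss persists all the way to the destination.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Hop:
    """One node on a measured path (illustrative fields only)."""
    index: int
    address: str
    network: str       # who operates this hop, e.g. "ISP B"
    loss_pct: float    # packet loss measured to this hop


def first_degraded_hop(path: List[Hop],
                       loss_threshold_pct: float = 5.0) -> Optional[Hop]:
    """Return the earliest hop from which loss persists to the destination.

    Walks the path backwards from the destination and stops at the first
    hop whose loss is below the threshold. Returns None when the
    destination itself shows no significant loss.
    """
    culprit: Optional[Hop] = None
    for hop in reversed(path):
        if hop.loss_pct >= loss_threshold_pct:
            culprit = hop
        else:
            break
    return culprit


# Made-up sample path from a branch office to a SaaS provider.
# Addresses come from the ranges reserved for documentation.
sample_path = [
    Hop(1, "192.0.2.1", "branch LAN", 0.0),
    Hop(2, "198.51.100.1", "ISP A", 0.0),
    Hop(3, "198.51.100.9", "ISP A", 12.0),   # isolated loss, not persistent
    Hop(4, "203.0.113.1", "ISP B", 0.0),
    Hop(5, "203.0.113.7", "ISP B", 20.0),    # loss starts here and persists
    Hop(6, "203.0.113.200", "SaaS provider", 20.0),
]

suspect = first_degraded_hop(sample_path)
if suspect is not None:
    print(f"Loss appears to start at hop {suspect.index} in {suspect.network}")
```

With the sample data, the function points at hop 5 in the network labeled "ISP B" and ignores the isolated loss at hop 3. An end-to-end probe alone would only show 20% loss to the destination; the per-hop view is what attributes it to a specific network.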

To gather all this data, we leverage a global fleet of monitoring agents across the Internet and major public cloud providers, plus easy-to-deploy agents that enterprise IT teams can install in their data centers, branches and VPCs. In other words, we created a unique data set that offers visibility into the previously uncovered and now majority world of external infrastructure, software and networks. This layered data, from the app layer down to the Internet layer, is critical to understanding the delivery of digital experience today.

Figure 3: We’ve created a unique data set linking digital experience to deep Internet visibility.

We offer this solution as SaaS with compelling visuals and a rich API because we believe that for IT Operations visibility to be as productive as possible, it should itself be consumed as a digital experience, which ideally should involve as little deployment, setup and ongoing maintenance as possible. You can see an example of our visuals in this analysis of how an Internet routing issue impacted user access to Google Cloud. Like many SaaS, we pair our solution with a highly knowledgeable Customer Success team that helps our IT Ops customers to solve their business problems.

Key Use Cases

What are some key use cases where the layered combination of visibility from experience down to the Internet matters? The easiest to understand and most business-impactful one is customer-facing digital experience delivery, such as e-commerce or online banking, to users accessing those services from all over the Internet. A second use case is SaaS adoption, where employee productivity relies on connectivity from branches and remote locations across the Internet to applications like Office 365, Salesforce and Webex. A third use case is WAN modernization, where organizations are shifting from traditional MPLS networks to hybrid and SD-WAN scenarios that depend on Internet transit and new cloud-based security providers to deliver better cloud performance while lowering WAN costs.

Both Platform and SaaS Approaches Have Merit

As mentioned earlier, Gartner articulates a definition for “AIOps Platforms,” and I think that’s for a reason. An AIOps platform is an ecosystem of technology built to consume, process and correlate many different streams of IT Ops visibility data to deliver meaningful answers to important questions. It is indeed possible to feed DEM and other monitoring data into an AIOps platform and tune it to give you answers that are relevant to business issues. The challenge is to get the right streams of data (as mentioned above) and to define how you want the AIOps engine to answer your questions (learning). The notion of a platform means that it is highly flexible, but that it probably is not the primary engine for generating or collecting the monitoring data sets. It also means that getting effective use from it requires either thoughtful internal solution architecture work or a third party to provide pre-curated apps running on top of the platform. These are not bad things; they are in fact common paths taken with most platforms of any sort.

SaaS like ThousandEyes are applications that are purpose-built to solve particular problems with data sets that are explicitly curated for those problems, producing human and machine-readable outputs designed to helpfully convey information for those problems via visualizations or APIs.

There is a case to be made for both platform and SaaS approaches to any business or IT use case. Specific to solving the experience-centric IT Ops visibility challenge, both methods are undoubtedly valid and complementary.

Learn More and Let’s Talk

If you’re trying to solve the experience-centric IT Ops visibility challenge in your organization, there are a lot of excellent research and information resources on our website and blog on digital experience delivery, SaaS adoption, public cloud connectivity, DNS, Internet routing, SD-WAN and other relevant topics. If you’d like to dig deeper into how your organization can solve a specific challenge around getting to the why of digital experience issues more effectively, contact us and let’s talk.
