Internet Insights: Application Outages Tutorial
Today, enterprises are consuming more SaaS applications than ever to accommodate their distributed workforces.
While these applications are often critical to business productivity, they're not controlled by internal stakeholders.
So when an app is unresponsive or the experience is degraded, it unleashes a sequence of unfocused troubleshooting cycles that consume valuable IT resources and prolong the problem.
This in turn impacts productivity across the organization.
ThousandEyes is leveraging its collective view of service delivery paths across the Internet to bring its customers Internet Insights with Application Outages.
Simply put, you specify a timeframe, and we scan the data that we have available and let you know if there are any service-impacting issues, both across provider networks that you either interface with or contract through and across applications that are critical to your business's productivity.
Now, if you see an outage occurring within a set window, you can then dive in and get a more detailed view.
In this case, we see an incident occur back on August 31 at around 2:00 p.m. Eastern Time.
And this is a pretty long, sustained outage.
I can see that it's globally impacting and I see it's impacting a couple of my key apps, like Dropbox and PagerDuty.
Now, if I want to understand scope, that is, whether it's specifically impacting me or it's just a global issue, I can pivot over to the affected tests and see which of the tests that I have running from my Enterprise or Cloud Agents are materially impacted.
In this case, I'm a PagerDuty shop.
And I can see pretty continuously that my PagerDuty test is in fact impacted.
So I'll switch this back to the main outages view.
And the next question that I need to answer is: is there some commonality between the outages across these two different applications, outside of the fact that they're globally impacting?
So if I go here, I can actually switch the grouping for my targets, and find out what network they reside within.
And immediately I know that they sit within IP blocks owned by AWS, which means these are hosted within Amazon AWS, ASN 16509.
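To make the grouping step concrete, here is a minimal sketch of bucketing affected targets by the network that owns their IP address. It is not how ThousandEyes does it internally; the prefix table, the test names, and the helper functions are all hypothetical, and the prefixes are documentation ranges rather than real AWS address space. Only ASN 16509 comes from the demo.

```python
import ipaddress
from collections import defaultdict

# Hypothetical prefix-to-ASN table. The prefixes are RFC 5737 documentation
# ranges, NOT real AWS address space; only ASN 16509 comes from the demo.
PREFIX_TO_ASN = {
    ipaddress.ip_network("192.0.2.0/24"): 16509,
    ipaddress.ip_network("198.51.100.0/24"): 64500,  # documentation-range ASN
}


def asn_for(ip):
    """Return the ASN of the most specific matching prefix, or None."""
    addr = ipaddress.ip_address(ip)
    matches = [(net, asn) for net, asn in PREFIX_TO_ASN.items() if addr in net]
    if not matches:
        return None
    # Longest prefix wins when several prefixes contain the address.
    return max(matches, key=lambda m: m[0].prefixlen)[1]


def group_targets_by_asn(targets):
    """Group (name, ip) pairs by the ASN that owns each IP."""
    groups = defaultdict(list)
    for name, ip in targets:
        groups[asn_for(ip)].append(name)
    return dict(groups)


# Hypothetical affected targets for the two applications in the demo.
affected = [("dropbox-test", "192.0.2.10"), ("pagerduty-test", "192.0.2.77")]
groups = group_targets_by_asn(affected)
```

If every affected target lands in the same group, as both do here, that shared network is the commonality worth investigating.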
I can see the relative impact across all of the servers that we happen to be testing to from ThousandEyes. I can see the specific geolocations that are impacted, as well as the number of agents or testing points that were affected.
And if I want to further confirm that this is in fact on the application side, I can simply hover over the endpoint on the far right, and I can see that while there are some network timeouts, we are seeing a significant number of HTTP server timeouts.
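The reasoning here, comparing how many failures are network timeouts versus HTTP server timeouts, can be sketched as a simple tally. This is only an illustration: the error labels, the sample counts, and the `likely_layer` helper are hypothetical and are not taken from the ThousandEyes product or API.

```python
from collections import Counter

# Hypothetical error labels; real test results may use different names.
NETWORK_ERRORS = {"network_timeout"}
APPLICATION_ERRORS = {"http_server_timeout"}


def likely_layer(errors):
    """Guess which layer dominates a list of per-agent error labels."""
    counts = Counter(errors)
    net = sum(counts[e] for e in NETWORK_ERRORS)
    app = sum(counts[e] for e in APPLICATION_ERRORS)
    if net == 0 and app == 0:
        return "none"
    if app > net:
        return "application"
    if net > app:
        return "network"
    return "mixed"


# Made-up sample: a few network timeouts, many HTTP server timeouts.
sample = ["network_timeout"] * 3 + ["http_server_timeout"] * 20
layer = likely_layer(sample)
```

A majority vote like this is a rough heuristic, which is why the walkthrough cross-checks it against the network outage data afterward.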
And as we move through the timeline, this information updates accordingly.
So not only can I see what's happening during an outage, but I can see before and after.
Now, if I want to confirm that this is on the application side, I can simply click on the Network Outages tab. I see that for the majority of the outage presenting itself at the application layer, there's almost no network outage data found, which means that the network was clean.
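One way to express "almost no network data during the application outage" is as the fraction of the application outage window that overlaps any network outage. The sketch below assumes times are given as minutes from an arbitrary start; the intervals are made up for illustration, and `overlap_fraction` is a hypothetical helper, not a ThousandEyes function.

```python
def overlap_fraction(app_window, network_windows):
    """Fraction of app_window (start, end) covered by any network window."""
    start, end = app_window
    # Keep only windows that intersect the app outage, clipped to its bounds.
    clipped = sorted(
        (max(s, start), min(e, end))
        for s, e in network_windows
        if s < end and e > start
    )
    covered = 0
    cursor = start  # everything before cursor has already been counted
    for s, e in clipped:
        s = max(s, cursor)  # skip any part that overlaps a previous window
        if e > s:
            covered += e - s
            cursor = e
    return covered / (end - start)


# Made-up example: a 120-minute application outage with two brief,
# overlapping network events inside it and one unrelated event afterward.
app_outage = (0, 120)
network_events = [(10, 16), (14, 20), (200, 210)]
fraction = overlap_fraction(app_outage, network_events)
```

A fraction close to zero is consistent with a clean network and an application-side problem; a fraction close to one would point back at the network.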
So now my team is able to take this information, click our Share button and generate a share link, and send this right over to the team that manages our AWS instance, as well as our contact at the application that we're contracted through.
So with ThousandEyes, you're able to rapidly isolate and escalate to the proper party to get the issue resolved.