In October 2017, we took ThousandEyes Connect to Chicago for the first time and were thrilled to host customer speakers from United Airlines and JLL. In this post, we will summarize the presentation by Brandon Mangold, Principal Operations Engineer at United Airlines.
In his talk at ThousandEyes Connect, Brandon walked through the United Airlines and ThousandEyes journey while highlighting the importance of correlating visibility across multiple networks and applications to manage a global network.
What Does a Global Network Look Like?
Brandon kicked off the session by giving the audience a glimpse of what a global enterprise network looks like. The United Airlines network is made up of 1000+ offices and over 400,000 employees accessing a myriad of applications for their day-to-day jobs. Apart from that, over 6M Internet users visit united.com every day. The enterprise backbone comprises seven global contact centers and three private hybrid cloud data centers, hosting a variety of business-critical applications. Brandon and his team are responsible for managing United’s expansive global network, with 9000 interconnected devices and four major service providers fueling the connectivity to the data centers.
Trial Gone Wild. Wildly Successful.
Brandon was first introduced to ThousandEyes at Network Field Day 12 in August 2016. He confessed that he had initially assumed ThousandEyes was a platform for monitoring only external Internet presence. However, the session revealed that he could do a lot more with it. Brandon said, “I learned a lot more about the product, especially about the Enterprise Agent. That’s really what got me interested.” Fired up by this newly acquired knowledge, Brandon was eager to start the free trial and see what ThousandEyes could do for his team. He related, “We lit up a very basic demo, from a couple of Cloud Agents to start monitoring the external Internet links on our website.”
Two weeks into the trial, Brandon and his team were called in on a P1 incident. A severe outage was rendering a large portion of their dot-com site and mobile app unavailable to users all around the world. “United customers were unable to check in to their flight or make a reservation online,” he said. While the team was looking for hints within their CDN provider (Akamai) and the application itself, Brandon decided to look at ThousandEyes data for clues. Brandon recounted, “Within 15 minutes of digging around, I had visible proof that Level 3, our upstream ISP, was dropping a large amount of packets.” A major Level 3 outage was affecting the availability of United’s online-facing assets. But a global network is built to withstand these types of outages, so the question remained: why was their CDN load-balancing solution not kicking in?
Brandon explained that “Level 3 was dropping a large amount of packets, but not enough for our global Akamai load balancer to switch over.” The fluctuating packet loss within Level 3 allowed just enough keepalives to get through, tricking Akamai into not initiating a failover. Confident that the outage was clearly within Level 3’s network, Brandon submitted a ticket and performed a manual failover at Akamai to force traffic through a clean link. He described the result: “We were back up and running in no time. ThousandEyes saved us more than an hour and we hadn’t paid a dime for it yet!”
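To make the failure mode concrete, here is a minimal sketch of how intermittent loss can slip past a health check. It assumes a simplified checker that fails over only after a fixed number of consecutive missed keepalives; this is an illustrative assumption, not Akamai's actual logic, and the function names, threshold, and probe data are all hypothetical.

```python
# Hypothetical health check: fail over only after N consecutive missed keepalives.
# This is an assumed, simplified model, not any vendor's real algorithm.

def max_consecutive_failures(probes):
    """Return the longest run of failed probes (False = keepalive lost)."""
    longest = current = 0
    for ok in probes:
        current = 0 if ok else current + 1
        longest = max(longest, current)
    return longest

def should_fail_over(probes, threshold=3):
    """Trigger failover only when `threshold` keepalives in a row are lost."""
    return max_consecutive_failures(probes) >= threshold

def loss_rate(probes):
    """Fraction of probes that were lost."""
    return sum(1 for ok in probes if not ok) / len(probes)

# Made-up probe histories: True = keepalive answered, False = lost.
flapping = [True, False, False] * 10      # heavy loss, but never 3 misses in a row
hard_down = [True] * 5 + [False] * 5      # link goes fully dark

for name, probes in (("flapping", flapping), ("hard_down", hard_down)):
    print(name, f"loss={loss_rate(probes):.0%}", "failover:", should_fail_over(probes))
```

In this toy model the flapping link loses two thirds of its keepalives yet never trips the check, which is the kind of gap that independent loss measurements along the path can expose.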
At several points in his presentation, Brandon emphasized the importance of having visibility into your networks and applications. He commented, “You can’t know what you can’t see. Before ThousandEyes I had zero visibility into upstream provider issues.”
Tackling Tricky VoIP Quality
After the early success of the trial, Brandon decided to use the ThousandEyes VoIP functionality to tackle a pesky voice issue that had been haunting his team for a month. Multiple branch locations were experiencing a pronounced degradation in voice quality, resulting in numerous IT tickets. Brandon explained that within 10 minutes of setting up Enterprise Agents at two branch locations, he was able to narrow down the root cause. He said, “I lit up a couple of Enterprise Agents and triggered a basic voice test, simulating Expedited Forwarding (EF) traffic. Within 10 minutes, we narrowed down the problem to a device within our internal MPLS network that was remarking VoIP EF traffic to best effort.” It turned out that their MPLS service provider had misconfigured the QoS settings on the customer edge (CE) router, which adversely affected VoIP packets (Figure 3).
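For readers unfamiliar with EF marking, the sketch below shows the arithmetic behind it: EF is DSCP 46, which occupies the upper six bits of the IP TOS byte, while best effort is DSCP 0. The helper names are hypothetical, and the socket function only illustrates how a test sender might mark its own traffic on a platform that honors `IP_TOS`; it is not how ThousandEyes implements its voice tests.

```python
import socket

DSCP_EF = 46           # Expedited Forwarding, typically used for voice
DSCP_BEST_EFFORT = 0   # default forwarding, no priority

def dscp_to_tos(dscp):
    """DSCP sits in the upper six bits of the IP TOS / Traffic Class byte."""
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP must be in the range 0-63")
    return dscp << 2

def tos_to_dscp(tos):
    """Recover the DSCP value from a TOS byte, dropping the two ECN bits."""
    return (tos & 0xFF) >> 2

def open_ef_udp_socket():
    """Hypothetical sender: a UDP socket whose packets are marked EF.

    Assumes an OS that honors IP_TOS (Linux does; others may ignore it).
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, dscp_to_tos(DSCP_EF))
    return sock

def check_marking(sent_dscp, received_dscp):
    """Compare the DSCP a packet left with against what arrived at the far end."""
    if received_dscp == sent_dscp:
        return "preserved"
    if received_dscp == DSCP_BEST_EFFORT:
        return "remarked to best effort"
    return f"remarked to DSCP {received_dscp}"

print(hex(dscp_to_tos(DSCP_EF)))                  # TOS byte carrying EF
print(check_marking(DSCP_EF, DSCP_BEST_EFFORT))   # the failure described above
```

Comparing the sent and received markings hop by hop is what lets a remarking device be singled out, since the value changes at exactly one point along the path.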
The Flight Ahead: Integrating Network and Application Monitoring
“While the motivation for ThousandEyes was primarily to serve as a network monitoring platform and get visibility into upstream service provider networks and BGP changes, we are starting to see the possibilities it opens for application monitoring,” said Brandon. Actively monitoring HTTP application performance while keeping the network in perspective gives both network and application teams a common platform to rely on. Brandon explained, “We would like to bring teams together so we can have a common view of the intersection between network and applications.”
United is also extending its VoIP implementation to include dashboards and reports that show deviations in MOS (Mean Opinion Score) values. Brandon added that he’s a big fan of standard deviation charts, as they are the fastest way to identify anomalies.
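As a rough illustration of why standard deviation makes anomalies easy to spot, the sketch below flags MOS samples that fall more than a chosen number of standard deviations below a baseline mean. The MOS values, the threshold of three, and the function name are assumptions made for the example, not United's data or the ThousandEyes dashboard logic.

```python
from statistics import mean, stdev

def find_mos_anomalies(baseline, samples, max_sigma=3.0):
    """Return (index, value) pairs for samples far below the baseline mean.

    `baseline` is a window of MOS values considered normal; `samples` are
    new measurements. Only drops are flagged, since a low MOS means poor audio.
    """
    mu = mean(baseline)
    sigma = stdev(baseline)  # sample standard deviation of the baseline
    anomalies = []
    for i, value in enumerate(samples):
        z = (value - mu) / sigma
        if z < -max_sigma:
            anomalies.append((i, value))
    return anomalies

# Made-up MOS readings on the usual 1 (bad) to 5 (excellent) scale.
baseline = [4.3, 4.4, 4.2, 4.3, 4.4, 4.3, 4.2, 4.4]
samples = [4.3, 3.1, 4.2]

anomalies = find_mos_anomalies(baseline, samples)
print(anomalies)
```

Here only the 3.1 reading is flagged; the ordinary wobble between 4.2 and 4.4 stays within the band, which is what keeps a deviation chart readable at a glance.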