The Internet Report

Why NetOps Is MVP in Sports; Plus, Microsoft & LinkedIn Issues

By Mike Hicks
15 min read

Summary

This week, we’re unpacking issues at Microsoft and LinkedIn, and chatting about how to assure great digital experiences at major sporting events.


This is The Internet Report, where we analyze recent outages and trends across the Internet, through the lens of ThousandEyes Internet and Cloud Intelligence. I’ll be here every other week, sharing the latest outage numbers and highlighting a few interesting outages. This week, we’re also featuring a conversation exploring what it takes to deliver great digital experiences in the sports world, with special guest Dave Anderson, a tech industry veteran and co-host of "A Very Melbourne Podcast," which covers the Australian Football League and more. As always, you can read the analysis below or tune in to the podcast for firsthand commentary.


Assuring Great Digital Experiences for Sports Fans

Major sporting events are always logistically complex, but this is even more the case now that digital technology has permeated every part of operations and experience delivery. Venues are highly networked spaces, with everything from ticketing to hospitality services being run and managed digitally. Mobile wayfinding guides ticket holders to their seats; digital signage and displays beam game action to fans; and on-site or mobile production studios bring live feeds to TV and online audiences, domestically and (where licensed) internationally.

This end-to-end complexity, with its multiple dependencies and reliance on Internet infrastructure, can be challenging to oversee and manage. Networks, particularly those that are third-party operated, can be susceptible to a range of operating conditions that can affect the fan experience. 

When it comes to live sports, anything that does impact the digital experience is particularly problematic because the events occur in real time. The event, its audience, and the infrastructure that supports the experience delivery—both in person and at home—are all dynamic. Yet there’s only one chance to get it right, and so managing all the variables that contribute to the experience is absolutely critical.

Navigating a Big Year for Sports Events

Fans can have long memories when it comes to content delivery glitches that result in them missing an important moment like a goal or penalty being awarded. And there continue to be plenty of examples of broadcast issues in the delivery of live sports into people’s homes. For example, a December boxing match experienced audio problems, and TV images didn’t display for part of an English representative soccer match this year.

It’s not just elite-level sports that are impacted by outages. An issue with a grassroots sporting app on game day impacted community sports in Australia earlier this year.

With competition between sports for global audiences continuing to ramp up, the key to delivering for fans is to offer a glitch-free and consistent experience, no matter where the fan is: at a stadium, at home, in a car or airplane, in an office, or out-and-about. Governing bodies, broadcasters, streamers, and fans all want assurance that they’ll get the best experience every time they engage. 

For organizations in the digital experience delivery chain, more than ever it’s about having the ability to detect and remediate issues as they arise, and optimizing for every connected experience. 


Tune in to the podcast for more from The Internet Report team and special guest Dave Anderson on assuring great digital experiences in the sports world.

Internet Outages & Trends

Returning to our regular outage programming, ThousandEyes observed two cloud-related incidents over the past few weeks: one where Microsoft ran into issues recovering from a DDoS attack, and another where “an issue with AWS” caused problems globally for cloud accounting software-as-a-service provider Xero. We also saw another recent Microsoft disruption that impacted LinkedIn. In addition, a major market sell-off that was triggered by events in the U.S. and Japan caused problems for some brokerage and trading platforms. We’ll unpack these below.

Microsoft Azure Services Disruption

Azure Front Door (AFD) and Azure Content Delivery Network (CDN), and downstream services that rely on them, were impacted by an outage on July 30 that reportedly started at 11:45 AM (UTC). However, ThousandEyes observed network issues before then, with parts of the Microsoft network seeing degradation between 10:30 and 11:00 AM (UTC).


Explore the Azure disruption further in the ThousandEyes platform, no login required.

Screenshot of ThousandEyes showing network disruptions
Figure 1. Network disruptions impacting reachability and performance of Microsoft Azure.

According to Microsoft’s official post-incident explanation, the problems began with a DDoS attack that was detected and automatically mitigated. But, once mitigated, default traffic routing did not resume as expected. This was due to a series of failures, beginning with a local power outage at “one specific site in Europe,” which caused traffic to continue to route through DDoS protection services. Complicating matters, “an unrelated latent network configuration issue caused traffic from outside Europe to be routed to the DDoS protection system within Europe. This led to localized congestion, which caused customers to experience high latency and connectivity failures across multiple regions.”

This is consistent with ThousandEyes’ observations. It was clear there were issues in how traffic was being redistributed following the DDoS mitigation, leading to congestion and dropped packets for customers. The issue was resolved by 2:00 PM (UTC), lasting a bit over two hours.
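The timeline arithmetic can be checked with a short Python sketch. It assumes all times fall on the same UTC day and uses only the times quoted above; the helper name elapsed is illustrative.

```python
from datetime import datetime, timedelta

def elapsed(start: str, end: str) -> timedelta:
    """Return the time between two same-day UTC clock times given as HH:MM."""
    fmt = "%H:%M"
    return datetime.strptime(end, fmt) - datetime.strptime(start, fmt)

# Reported start (11:45 UTC) to resolution (14:00 UTC).
reported = elapsed("11:45", "14:00")
# Earliest degradation ThousandEyes observed (10:30 UTC) to resolution.
observed = elapsed("10:30", "14:00")

print(f"From reported start: {reported}")
print(f"From first observed degradation: {observed}")
```

Measured from the reported start, the disruption ran two hours and 15 minutes; measured from the first degradation ThousandEyes observed, it ran up to three and a half hours.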

The incident illustrates that the cause of a disruption can be multifaceted, with multiple factors—including your own mitigation efforts—potentially playing a contributing role. When a disruption occurs, it's crucial to make sure any actions taken to remediate the issue are working as expected and not inadvertently making the issue worse.

Xero Outage 

On the same day as the Azure disruption, cloud accounting software provider Xero experienced a six-hour issue that prevented some customers from logging in or navigating the app. The outage resulted in a bad gateway (HTTP 502) error, indicating there was a problem with the communication between the CDN/proxy and backend systems. This type of error is classified as a server-side error and is usually observed when there are issues with receiving a response from the backend systems. In this instance, the backend systems were hosted on AWS. Xero has confirmed in status updates that the problems were "related to an issue with AWS."
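To make the status-code classification above concrete, here is a minimal Python sketch using only the standard library. The function names classify_status and describe are illustrative and are not part of any ThousandEyes or Xero tooling.

```python
from http import HTTPStatus

def classify_status(code: int) -> str:
    """Map an HTTP status code to its broad class by its hundreds digit."""
    classes = {
        1: "informational",
        2: "success",
        3: "redirection",
        4: "client error",
        5: "server error",
    }
    return classes.get(code // 100, "unknown")

def describe(code: int) -> str:
    """Combine the standard reason phrase with the broad class."""
    phrase = HTTPStatus(code).phrase
    return f"{code} {phrase} ({classify_status(code)})"

print(describe(502))  # 502 Bad Gateway (server error)
```

A 502 falls in the 5xx range, so it points at the server side of the exchange (here, the path between the CDN/proxy and the backend) rather than at the client's request.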

Screenshot of Xero's status page
Figure 2. Xero’s status page acknowledged that some customers may be experiencing issues logging into or navigating Xero.

While ThousandEyes observed impacts on users globally, it was not a total outage, although that does not necessarily provide any relief to those affected. This suggests that it wasn't a case of AWS being down; rather, the issue lay with a specific AWS service that Xero leverages for some services and functions.

Screenshot of ThousandEyes showing errors
Figure 3. ThousandEyes observed Receive and HTTP errors during the Xero outage.


Explore the Xero outage further in the ThousandEyes platform (no login required).

During the issue, ThousandEyes observed HTTP and Receive errors, suggesting that this was not a network issue and that the domain itself was reachable. This was accompanied by an increase in page load time, and only two web components loaded, which indicates that edge servers were reachable and responsive but unable to load all required components. Together, these observations reinforced our view that the issue was with the backend.
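The reasoning in this paragraph can be sketched as a simple triage rule. This is a simplified illustration, not how the ThousandEyes platform works: the Probe fields, the 5% loss threshold, and the likely_fault_domain function are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Probe:
    """One hypothetical test result from a single vantage point."""
    dns_resolved: bool
    tcp_connected: bool
    packet_loss_pct: float
    http_status: Optional[int]  # None models a receive error (no usable response)

def likely_fault_domain(probe: Probe, loss_threshold: float = 5.0) -> str:
    """Guess where a fault sits, checking the lowest layers first."""
    if not probe.dns_resolved:
        return "dns"
    if not probe.tcp_connected or probe.packet_loss_pct >= loss_threshold:
        return "network"
    # The server was reachable, so a missing response or a 5xx status
    # points past the network toward the application or backend.
    if probe.http_status is None or probe.http_status >= 500:
        return "backend"
    # Anything else (including 4xx) is not treated as a fault in this sketch.
    return "ok"

# A reachable domain that answers with 502 Bad Gateway.
print(likely_fault_domain(Probe(True, True, 0.0, 502)))
```

Checking DNS and the network path before the HTTP layer mirrors the argument above: once the domain resolves and connects cleanly, HTTP and Receive errors implicate the backend.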

Screenshot of ThousandEyes showing increased page load time
Figure 4. ThousandEyes saw increased page load times for Xero.

On a side note, ThousandEyes also observed that some services appeared to operate normally, suggesting that functionality depended on where and how the user accessed the AWS region/network.

Screenshot of ThousandEyes showing an HTTP bad gateway error
Figure 5. Xero experienced a 502 HTTP bad gateway error, as shown in this screenshot from the ThousandEyes platform.

Microsoft Incident and LinkedIn Disruption

On August 5, some LinkedIn users around the globe encountered issues with the platform when Microsoft experienced an incident that impacted LinkedIn’s availability. First observed around 6:25 PM (UTC), the outage manifested as elevated packet loss in Microsoft’s network, as well as DNS resolution timeouts and HTTP errors. The incident also appeared to impact some of Microsoft’s other services, including Microsoft Teams and Microsoft 365.

Screenshot of ThousandEyes showing elevated loss in Microsoft's network
Figure 6. Elevated loss was observed in Microsoft’s network when attempting to access LinkedIn.

The disruption to LinkedIn lasted for just over an hour. In a status update, LinkedIn confirmed that users were able to reconnect to its service by approximately 7:40 PM (UTC). ThousandEyes observed some lingering network latency issues after the reported resolution. However, these issues did not appear to prevent users from interacting with LinkedIn services, and they eventually resolved around 10:30 PM (UTC). 


For further insights about this incident, see this dedicated outage analysis blog.

Screenshot showing performance impact on LinkedIn
Figure 7. The disruption for LinkedIn lasted just over an hour, with residual performance issues observed until around 10:30 PM (UTC).

Brokerage, Trading Platform Issues

A number of online trading platforms used by retail investors experienced issues on August 5, coinciding with a major stock sell-off across global markets.

Charles Schwab confirmed it was among the operators to have problems. “A technical issue experienced by some clients has been resolved,” it said on X. “We apologize for the inconvenience.” Vanguard and Fidelity Investments were also reportedly impacted, with regulators observing proceedings.

Large events on financial markets have always had the potential to impact trading platforms. That goes for regular stocks, as well as for cryptocurrencies during the height of interest in digital currencies a few years ago. The financial sector is inherently complex and its digital operations are no exception; it’s worth taking note of recent guidance in this space aimed at optimizing customers’ digital experiences and mitigating the effects of disruptions.


By the Numbers

Let’s close by taking a look at some of the global trends ThousandEyes observed across ISPs, cloud service provider networks, collaboration app networks, and edge networks over the past two weeks (July 22 - August 4):

  • The upward trend observed across July continued into the first week of this period (July 22-28), with outages increasing by 9% compared to the previous week, rising from 187 to 204. However, this upward trend came to an end the following week, with outages decreasing from 204 to 183 between July 29 and August 4, marking a 10% decrease compared to the previous week.

  • The United States did not reflect this trend. Instead, the increases observed throughout much of July ended in the first week of this period (July 22-28), with outage numbers decreasing 34%. However, the upward trend resumed the next week, with outages rising 28% from July 29 to August 4.

  • Despite this rise in outages in the United States during the second week of the period, U.S.-centric outages made up less than 40% of all observed global outages. From July 22 to August 4, only 35% of network outages occurred in the United States, compared to 48% in the preceding two weeks (July 8 to 21).

  • Looking at the month-over-month data, in July, 816 outages were observed worldwide, an 8% decrease from the 890 reported in June. However, there was a slight increase in outages in the United States, rising from 308 in June to 334 in July, marking an 8% increase.
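The percentage changes quoted above can be reproduced with a short Python sketch. The outage counts are taken from the bullets; the helper name pct_change and the labels are illustrative.

```python
def pct_change(old: float, new: float) -> float:
    """Percentage change from old to new (positive means an increase)."""
    return (new - old) / old * 100

# (label, previous count, current count), as quoted in the bullets above.
comparisons = [
    ("Global, week of July 22-28", 187, 204),
    ("Global, week of July 29 - August 4", 204, 183),
    ("Global, June to July", 890, 816),
    ("U.S., June to July", 308, 334),
]

for label, old, new in comparisons:
    print(f"{label}: {old} -> {new} ({pct_change(old, new):+.1f}%)")
```

Rounded to whole numbers, the results match the 9% and 8% increases and the 10% and 8% decreases reported in the bullets.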

Bar chart comparing global and US outages over past eight weeks
Figure 8. Global and U.S. network outage trends over the past eight weeks.
