
The Internet Report

Application Outages Up in 2023—What to Know

By Mike Hicks

Summary

Though network outages are still far more common, application outages seem to be increasing in 2023—and having bigger impacts. Tune in to learn more about this trend and dive into incidents at Okta and Instagram.


This is the Internet Report: Pulse Update, where we analyze outages and trends across the Internet from the previous two weeks through the lens of ThousandEyes Internet and Cloud Intelligence. I’ll be here every other week, sharing the latest outage numbers and highlighting a few interesting outages.

This week, we’re exploring outage trends from the first half of the year in this special episode, reflecting on the state of the Internet in 2023 thus far. As always, you can read the full analysis below or tune in to the podcast for first-hand commentary.


Internet Outages & Trends

Time truly does fly. Believe it or not, we’ve nearly reached the halfway point of 2023.

Looking at outage trends over the first half of the year, some interesting themes emerge. The total number of global network outages observed has continued to rise in 2023 so far.

In addition, the number of distinct application outages observed appears to be increasing in the first half of 2023. While the number of application outages is still much lower in comparison to the number of network outages, the potential user impact of application outages is far wider. They often involve a degradation at a single point of aggregation or dependency in the service delivery chain, which can have broad consequences.

It’s also important to point out that the identification of a discrete application outage does not necessarily mean the whole service or application is experiencing an outage. It is, however, an indication that part of that application process is encountering errors, which can, in some cases, negatively impact the functionality of the application itself, for some or all users.

Read on to learn more about the increase in application outages, dive into two case studies of application-related incidents at Okta and Instagram, and explore other outage trends from the first half of 2023. For more outage case studies from the past year, also check out this interactive Internet Outages Timeline.


By the Numbers

In the first half of 2023, outage numbers diverged from some of the patterns observed over the last few years, with total global outages, as well as ISP, CSP, and application outages, all behaving a bit differently. However, other trends remained constant, with outage numbers continuing to rise overall.

Total Global Outages Continue Increasing, But Growth Trend Appears More Stable

While total global outages continued increasing, we observed less dramatic fluctuations in the number of outages seen in any given week. Since 2020, outage numbers have tended to swing high one week and then drop quite low another week. This year, the average number of outages observed week over week remained more stable.

Though the first half of 2023 has seen peaks and troughs that appear to reflect seasonal patterns also observed in previous years, the actual trend (the average number of outages week over week) for 2023 year to date is fairly flat.
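The difference between week-to-week swings and the underlying trend can be made concrete with a simple least-squares fit. The sketch below is a minimal illustration, assuming a plain list of weekly outage counts. The function name `weekly_trend_slope` and both sample series are hypothetical and are not ThousandEyes data. A slope near zero corresponds to a fairly flat trend, even when individual weeks rise and fall.

```python
def weekly_trend_slope(counts):
    """Return the least-squares slope (outages per week) of a weekly series."""
    n = len(counts)
    if n < 2:
        raise ValueError("need at least two weeks of data")
    mean_x = (n - 1) / 2          # mean of the week indices 0..n-1
    mean_y = sum(counts) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(counts))
    var = sum((x - mean_x) ** 2 for x in range(n))
    return cov / var


# Made-up counts, for illustration only.
swinging_but_flat = [200, 260, 190, 250, 250, 190, 260, 200]
steadily_rising = [100, 120, 140, 160, 180, 200, 220, 240]

print(weekly_trend_slope(swinging_but_flat))  # peaks and troughs, no net trend
print(weekly_trend_slope(steadily_rising))    # consistent weekly increase
```

A fit like this ignores seasonality, so it is only a rough summary of a series, not a substitute for the fuller analysis in the report.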

This trend differs from the numbers observed in 2020, 2021, and 2022. In those years, while totals were lower, we saw a fairly consistent increase in the average number of outages week over week. Major global events might be responsible for some of this increase in past years. For example, 2020 saw a steep upward trend in outages that can likely be attributed to preparations for sheltering in place as providers adjusted their environments in anticipation of workload shifts. A more detailed analysis of this can be found in the Internet Performance Report.

Average Weekly ISP Outages Trending Down, While Average Weekly CSP Outages Are Trending Up

In 2023, the two largest contributors to the total outage numbers continue to be Internet service provider (ISP) and cloud service provider (CSP) outages. When examining ISPs and CSPs individually, an interesting difference emerges.

ISP-related outages still make up the majority of outages by some margin and total ISP outages were up compared to numbers observed in previous years. However, the average number of ISP outages seen per week was actually trending down for the first six months of 2023. This is the first time in three years that we’ve seen the average weekly ISP outages drop in the first half of the year. During the same period in 2020, 2021, and 2022, ISP numbers all trended up. It will be interesting to see if that trend continues for the rest of 2023.

The 2023 downward trend might indicate improvements in outage containment. Improved outage containment could suggest an infrastructure evolution, perhaps the introduction of newer software-oriented architecture, as well as adoption of a more agile approach to engineering work, similar to approaches observed in the CSPs.

In the past, even when ISPs tried to strategically complete maintenance work outside of business hours, their architecture set them up for a potential domino effect when an outage hit. As a result, the outage radius spread beyond time zones, making the overall blast radius very large and likely increasing the overall number of outages observed. The reduced blast radius we’ve observed in the past few years suggests infrastructure changes may have been made, allowing ISPs to better contain the impact of outages and reducing the average number of ISP outages observed each week.

Graph showing average weekly ISP outages trending down overall in the first several months of 2023.
Figure 1. Average weekly ISP outages trended down overall in the first several months of 2023.

While the average weekly ISP outages are trending down, the average weekly CSP outages appear to be trending up this year, as they have in previous years. The total number of CSP outages is also increasing, though it’s still lower than the total number of ISP outages. As previously mentioned, CSP architecture tends to be less prone to the domino effect of outages than ISP architecture.

The fact that both the total number of CSP outages and the average weekly CSP outages are growing might be linked to the fact that CSPs are becoming more prevalent and play an increasingly active role on the Internet.

Graph showing average weekly CSP outages trending up overall in the first several months of 2023.
Figure 2. Average weekly CSP outages trended up overall in the first several months of 2023.

It should be noted that while the outage patterns observed for both ISPs and CSPs over the first several months of 2023 suggest that such outages have less potential to create vast global impacts, a major global outage is still a very real possibility.

Application Outages Increase in 2023

As mentioned, application outages also appear to be increasing in 2023, although on a far smaller scale than network outages. However, since applications rely on a vast web of interconnected dependencies to function, there’s more potential for wide-scale disruption and user impact if any of these services or aggregation points experience a degradation or outage.

This year, many major technology companies have experienced application-related outages or disruptions, including Microsoft, Instagram, and Okta.

The January 25 Microsoft outage can be seen on the graph below, contributing to the spike in application-related incidents in late January. For a deeper analysis of the Microsoft outage, read this earlier outage analysis. We’ll also discuss the Instagram and Okta case studies further below.

Graph showing application-related disruptions increasing in the first half of 2023.
Figure 3. Application-related disruptions saw an increase in the first half of 2023.

Instagram Outage

In one 2023 application-related incident, on May 21, Meta’s Instagram experienced a “technical issue” that prevented users from accessing the social media platform. During this incident, users who tried to open the app received an error message that read “Couldn’t load feed.” Refreshing the homepage and profiles didn’t seem to help resolve the issue. The disruption appeared to last about 75 minutes.

During this Instagram outage, ThousandEyes observed 5xx errors, which point to server-side problems such as server instability, potentially following multiple resets and/or authentication mismatches.

However, the outage was more likely caused by a problem with a single point of aggregation within the application. The network appeared to be functioning and packets were being forwarded, but the service was still unavailable for users. The outage’s global impact underscores the widespread disruption application-related outages can cause.
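One way to picture the distinction between a network problem and an application problem is to look at how far a test request gets before it fails. The following is a rough sketch, not the ThousandEyes method or API: the `ProbeResult` type, its fields, and the category labels are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ProbeResult:
    tcp_connected: bool          # did the transport connection succeed?
    http_status: Optional[int]   # None if no HTTP response was received


def classify(probe: ProbeResult) -> str:
    """Roughly attribute a probe outcome to the network or the application."""
    if not probe.tcp_connected:
        return "network"              # the server could not be reached at all
    if probe.http_status is None:
        return "no-response"          # connected, but no HTTP reply arrived
    if 500 <= probe.http_status <= 599:
        return "application-server"   # reachable, but the server returned 5xx
    if 400 <= probe.http_status <= 499:
        return "application-client"   # request rejected, for example 403
    return "ok"


# Packets are forwarded and the connection succeeds, yet the service fails.
print(classify(ProbeResult(tcp_connected=True, http_status=503)))
```

Under these assumptions, a 5xx response over a working connection is labeled as an application-side failure, which matches the pattern described for this incident.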

Screenshot from ThousandEyes platform showing Instagram accessibility impacted globally.
Figure 4. Instagram accessibility impacted globally.

Okta Disruption

Another 2023 application-related disruption occurred on March 12 at Okta. In some geographies, users experienced problems accessing their corporate applications through Okta’s single sign-on (SSO) service.

Users could still sign in and access their Okta dashboard, but a subset of the application’s icons that usually displayed on the Okta dashboard didn’t render properly, making it difficult for users to access some of the apps they typically used during their workday.

The problems first showed up as 504 gateway timeout errors in one “cell” (Okta groups its public-facing infrastructure as a series of cells, isolated from one another). Okta resolved these issues after 30 minutes. However, the same cell—and others—then started responding to user authentication requests with 403 forbidden errors.

Okta reported that a bug in their internal tooling protracted the 403 issue, causing network rules implemented as part of a fix to be “incorrectly set to block requests,” manifesting as 403s on the front end. Overall, this 403 problem lasted for an hour, according to Okta’s post-incident report, consistent with ThousandEyes’ observations.
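An incident that moves from one error type to another, as this one did from 504s to 403s, can be summarized by collapsing a stream of response codes into contiguous phases. The sketch below assumes samples arrive as `(minute, status)` pairs in time order. The function name `status_phases` and the sample values are hypothetical and do not reproduce the actual incident timeline.

```python
from itertools import groupby


def status_phases(samples):
    """Collapse ordered (minute, status) samples into (status, start, end) phases."""
    phases = []
    for status, run in groupby(samples, key=lambda sample: sample[1]):
        run = list(run)
        # First and last timestamps of this unbroken run of one status code.
        phases.append((status, run[0][0], run[-1][0]))
    return phases


# Made-up samples: healthy, then timeouts, then forbidden, then healthy again.
samples = [(0, 200), (5, 504), (10, 504), (15, 403), (20, 403), (25, 200)]

for status, start, end in status_phases(samples):
    print(f"HTTP {status}: minute {start} to minute {end}")
```

Grouping by consecutive status code keeps a recurrence of the same error as a separate phase, which is useful when a fix for one problem triggers another.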

Figure 5. HTTP 403 forbidden errors in response to user authentication requests.

This Okta incident underscores an important lesson application-related disruptions often illustrate: If even one element of a complex service delivery chain fails, the whole application can be rendered unusable. In Okta’s case, the application itself was actually available and accessible, but due to the visual issues with the icons on the Okta dashboard, users still had trouble using Okta as normal.

When working to deliver positive user experiences, it’s vital to consider all dependencies. It’s not enough to focus on a single application. Companies often rely on several interconnected applications to deliver one service. All of these apps—both internal and third-party applications—must be functioning and working together smoothly to provide a positive user experience.
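The point about interconnected dependencies can be sketched as a small graph walk: given a map of which services depend on which, find everything affected when one component fails. This is an illustrative sketch only. The function `impacted_services` and every service name in the example map are hypothetical.

```python
def impacted_services(depends_on, failed):
    """Return every service that directly or transitively depends on `failed`."""
    impacted = set()
    changed = True
    while changed:                      # repeat until no new service is added
        changed = False
        for service, deps in depends_on.items():
            if service in impacted:
                continue
            if any(dep == failed or dep in impacted for dep in deps):
                impacted.add(service)
                changed = True
    return impacted


# Hypothetical dependency map: each service lists what it relies on.
depends_on = {
    "dashboard": ["sso", "icon-cdn"],
    "sso": ["auth-db"],
    "payroll-app": ["sso"],
    "wiki": [],
}

print(sorted(impacted_services(depends_on, "auth-db")))
# prints ['dashboard', 'payroll-app', 'sso']
```

In this made-up map, a single failed component takes three of the four services with it, which is the kind of wide impact that a single point of aggregation can produce.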
