The Internet Report

Insights From Outages at Starlink, Schwab & Internet Archive

By Mike Hicks
13 min read

Summary

Three recent outages highlight key reminders for NetOps around backup options, the role of intelligence, and understanding your end-to-end service delivery chain.


This is The Internet Report, where we analyze recent outages and trends across the Internet, through the lens of ThousandEyes Internet and Cloud Intelligence. I’ll be here every other week, sharing the latest outage numbers and highlighting a few interesting outages. As always, you can read the full analysis below or tune in to the podcast for firsthand commentary.


Internet Outages & Trends

After an H1 trends edition, we resume with our regular programming this week, exploring three diverse outages that impacted three large brands. 

Some users of Schwab.com and its apps found themselves unable to transact or trade due to an authentication issue; a subset of Starlink users were unable to establish a connection; and the Internet Archive and the Wayback Machine were intermittently overwhelmed by unexpected traffic floods.

The outages raise several important themes for NetOps teams: the complexity of the end-to-end service delivery chain behind today’s digital applications, the importance of alternative connectivity options, and the role that visibility plays both in breaking down that complexity and as one layer in a layered intelligence model.

Regular readers and podcast listeners will no doubt recognize all three as common themes on The Internet Report—and ones we will continue to see arise over and over again.

Read on to learn more about these incidents and recent outage trends.


Starlink Outage

At approximately 1:25 AM (UTC) on Wednesday, May 29, Starlink experienced an outage lasting approximately 45 minutes.

Screenshot of Starlink connectivity issues
Figure 1. Connectivity issues observed for users connecting over Starlink’s network*

ThousandEyes observed network outage conditions for users connecting from the U.S., Europe, and Australia, who were unable to access the Internet through the service. Many users would have experienced the outage as DNS timeouts when attempting to reach sites and services, as the network outage prevented the reachability of public DNS resolvers.

Screenshot of DNS timeouts when accessing Facebook Messenger via Starlink
Figure 2. Starlink users experience DNS timeouts to Facebook’s Messenger service*
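
The distinction drawn above, between a DNS fault and a network outage that merely surfaces as DNS timeouts, can be sketched as a small decision rule. This is a minimal illustration only, not a description of how ThousandEyes or Starlink diagnose incidents. The `ProbeResult` fields and the `classify` function are hypothetical names, and the sketch assumes you already have two independent signals: a packet-level reachability check to a public resolver and a DNS query result.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProbeResult:
    """Hypothetical pair of signals collected from one vantage point."""
    resolver_reachable: bool  # did a packet-level probe reach the public resolver?
    dns_answered: bool        # did a DNS query return before its timeout?


def classify(probe: ProbeResult) -> str:
    """Label the most likely failure layer for a single probe result."""
    if probe.dns_answered:
        return "ok"
    if not probe.resolver_reachable:
        # The resolver cannot be reached at all, so the DNS timeout is a
        # symptom of the access network failing, not of DNS itself.
        return "network outage (DNS timeout is a symptom)"
    return "dns fault (resolver reachable but not answering)"


# Illustrative values resembling the symptom described in the text.
print(classify(ProbeResult(resolver_reachable=False, dns_answered=False)))
```

The point of separating the two signals is that a user-visible "DNS timeout" alone does not say which layer failed; the reachability check is what disambiguates it.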

ThousandEyes observed no corresponding BGP changes, and the control plane appeared stable.

Starlink acknowledged the “network outage” in a post on X and said it was “actively implementing a solution.” It later said the issue had been resolved, but didn’t share any additional information about the cause.

Given the criticality of Internet services today, it is increasingly common for users to have more than one way to connect to the Internet—whether that backup option is cellular or something else. That may not always be possible in the Starlink context, as the service is often favored by users whose alternatives are few or nonexistent, or where any alternatives that do exist are cost prohibitive or underperforming. Still, for some users with a critical need for bandwidth, the outage may reinforce the importance of maintaining diversity of Internet access—such that if one path were to go down, they would still have an active pathway out to the Internet via a different service provider.

Schwab Outage

Access to online and app-based services at financial services firm Charles Schwab, such as the thinkorswim trading app, was impacted by an authentication issue on June 11. Users reported being unable to log in or authenticate to the system. Impacted users would likely have been unable to execute trades or take other desired actions within the platform.

Screenshot of a path visualization to Schwab.com, showing the site was reachable
Figure 3. The Schwab.com domain remained reachable throughout the outage*

The firm acknowledged the issues on its homepage: “Due to a technical issue, some clients may have difficulty logging in to Schwab.com, our mobile app, and StreetSmart platforms.” It later said the issue had been resolved, although it added that “some users may experience residual issues logging in with the Schwab Mobile App.” It offered two potential workarounds here: first, to “disable and reauthenticate your biometric logins or erase and re-type your Login ID and password”—redo your two-factor authentication—and second, to “uninstall and reinstall the mobile app.”

Screenshot of suggested fixes displayed on the login screen
Figure 4. Suggested fixes were displayed on the login screen

ThousandEyes’ observations are consistent with the company’s authentication system experiencing issues. Whether or not a change was made to that system is unclear; what is clear is that during the incident, it was possible to load the homepage content, but access to online services that required a login failed at that point.

Screenshot of a transaction test and waterfall unable to complete login
Figure 5. Transaction test unable to complete login during outage*
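
The observation above (the homepage loads, but anything behind the login fails) is the kind of result a multi-step transaction test produces. The following is a minimal sketch of how such step results could be reduced to a likely failure point. The step names and the `first_failure` helper are hypothetical and are not taken from the ThousandEyes platform or from Schwab's systems.

```python
from typing import Optional, Sequence, Tuple

# Each step is (name, succeeded), listed in the order a user would hit them.
Step = Tuple[str, bool]


def first_failure(steps: Sequence[Step]) -> Optional[str]:
    """Return the name of the first failing step, or None if all passed."""
    for name, succeeded in steps:
        if not succeeded:
            return name
    return None


# Illustrative results shaped like the incident described in the text:
# the network path and homepage are fine, and the chain breaks at login.
observed = [
    ("dns_resolve", True),
    ("tcp_connect", True),
    ("load_homepage", True),
    ("login", False),
    ("view_account", False),
]

print(first_failure(observed))  # login
```

Ordering the steps along the delivery chain matters here: later steps fail as a consequence of the first break, so only the earliest failure points at the layer worth investigating.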

The disruption lasted about 30 minutes. ThousandEyes also observed a similar, shorter recurrence the following day, which may indicate that the company was making a second attempt to complete whatever work was undertaken on the central authentication system, or making an additional change stemming from the main disruption.

The outage underscores the complexity of today’s digital financial services environments and the mesh of services that power these experiences. These services are interdependent, and the failure of one can ripple outward and render the entire digital experience inaccessible or unusable. Every part of the mesh is important, and end-to-end visibility is essential to the availability and resilience of financial services.

DDoS Attack Impacts Internet Archive

The Internet Archive and the Wayback Machine were targeted with an “intermittent” distributed denial-of-service (DDoS) traffic flood over several days late last month, according to a blog post from the organization.

DDoS attacks involve overwhelming a server or network with traffic, causing significant disruptions to normal services. Signs of these attacks include sudden website slowdowns or complete unavailability. 

During the event, ThousandEyes observed characteristics indicating severe traffic loss conditions, consistent with a DDoS attack, likely preventing users from accessing the Internet Archive site.

Screenshot of a path visualization demonstrating packet loss when connecting to Internet Archive
Figure 6. ThousandEyes detected severe traffic loss conditions during the event*

As of 12:25 AM (UTC) on May 28, reachability to the Internet Archive’s site appeared to be restored. 

Screenshot of a path visualization showing normal activity connecting to Internet Archive
Figure 7. Reachability to Internet Archive restored*

The Internet Archive site is hosted out of a single location, meaning that regardless of where users are located, all traffic will be routed to that hosting environment. This type of site architecture is more likely to see network congestion and subsequent packet loss when experiencing high volumes of incoming traffic, as would be seen in a DDoS attack.

Employing traffic analytics tools can aid in detecting unusual traffic patterns that may indicate the presence of a DDoS attack. As I’ve written before, security is about layering—and that includes layered defenses and sources of intelligence. Network visibility is an augmentative view that, when combined with other signals, can assist in incident diagnosis. 

Network traffic patterns are likely to be known well in advance. Ops teams will know what normal traffic looks like, and they will also have their networks designed to scale up to meet predictable peaks in demand, such as those that may be associated with annual events (e.g. Tax Day) or similar. Where an unusual load is detected outside expectations, NetOps can explore whether it’s genuine service demand or something else. A correlation of signals will help quickly determine what’s going on.
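
One simple way to express "unusual load outside expectations" is to compare the current traffic level against a baseline drawn from comparable periods. The sketch below uses a z-score for this. It is an illustration under stated assumptions, not a method attributed to ThousandEyes or the Internet Archive: the `is_anomalous` name, the threshold of 3.0 standard deviations, and the sample request counts are all hypothetical.

```python
from statistics import mean, stdev
from typing import Sequence


def is_anomalous(baseline: Sequence[float], current: float,
                 threshold: float = 3.0) -> bool:
    """Flag `current` if it deviates from the baseline by more than
    `threshold` sample standard deviations (an assumed cutoff)."""
    if len(baseline) < 2:
        raise ValueError("baseline needs at least two samples")
    mu = mean(baseline)
    sigma = stdev(baseline)
    if sigma == 0:
        # A perfectly flat baseline: any change at all is a deviation.
        return current != mu
    return abs(current - mu) / sigma > threshold


# Made-up requests-per-second samples for the same hour on previous days.
baseline_rps = [100, 105, 98, 102, 101, 99, 103]

print(is_anomalous(baseline_rps, 104))  # False
print(is_anomalous(baseline_rps, 400))  # True
```

For predictable peaks such as Tax Day, the baseline passed in would be taken from earlier occurrences of that peak rather than from ordinary days, so that expected demand is not flagged. A flag from a check like this is only one signal and still needs to be correlated with others before concluding that traffic is malicious.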


By the Numbers

In addition to the outages highlighted above, let’s close by taking a look at some of the global trends ThousandEyes observed across ISPs, cloud service provider networks, collaboration app networks, and edge networks over the past four weeks (May 20-June 16):

  • After a brief increase in global outages in mid-May, ThousandEyes observed a return to a downward trend at the end of May. From May 20-26, there was an 11% decrease in outages compared to the previous week, with outages dropping from 227 to 202. This trend continued in the following week (May 27-June 2), with the number of outages decreasing by 14%.

  • This downward trend again briefly reversed at the start of June, with outages observed between June 3 and 9 rising from 173 to 197, a 14% increase compared to the previous week. The trend then returned to a downward trajectory the following week (June 10-16), with the numbers decreasing by 6%.

  • The United States experienced a similar pattern: a 25% decrease in outages from May 20-26, followed by a 14% decrease the next week (May 27-June 2). This was then followed by an increase of 14% from June 3-9. However, unlike the global outages, U.S. outages continued increasing the following week (June 10-16), with the numbers rising from 64 to 68, a 6% jump.

  • Only 32% of network outages occurred in the United States during the fortnight that spanned May 20 to June 2. This continues a pattern observed in the previous two fortnights (April 22 - May 19), during which U.S.-centric outages represented less than 40% of all global outages. This is the first time this year that ThousandEyes has observed this trend for three consecutive two-week periods. However, this trend was broken in the following period (June 3-16), when U.S.-centric outages represented 40% of all global outages.

  • Looking at the month-over-month trends, in May, there were a total of 882 outages observed worldwide, marking a 28% increase from the 687 outages reported in April. However, the United States experienced a decrease in outages, dropping from 299 in April to 287 in May, a 4% decrease.
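
The percentages in the bullets above follow from the usual percent-change formula, (current - previous) / previous × 100. As a quick check, the sketch below recomputes them for the pairs of outage counts that the text states explicitly. The `percent_change` helper and the dictionary labels are illustrative names only.

```python
def percent_change(previous: int, current: int) -> float:
    """Percent change from `previous` to `current` (negative for a decrease)."""
    if previous == 0:
        raise ValueError("previous must be non-zero")
    return (current - previous) / previous * 100


# Outage counts quoted in the text, as (previous, current) pairs.
reported = {
    "global, May 20-26": (227, 202),
    "global, June 3-9": (173, 197),
    "U.S., June 10-16": (64, 68),
    "global, April to May": (687, 882),
    "U.S., April to May": (299, 287),
}

for label, (previous, current) in reported.items():
    change = percent_change(previous, current)
    print(f"{label}: {previous} -> {current} = {change:+.0f}%")
```

Rounded to whole percentages, these come out at -11%, +14%, +6%, +28%, and -4%, matching the figures given in the bullets.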

Bar chart showing global and US outage trends
Figure 8. Global and U.S. network outage trends over the past eight weeks.

*All the screenshots and visualizations marked with an asterisk in this blog come from the ThousandEyes platform.
