ThousandEyes actively monitors the reachability and performance of thousands of services and networks across the global Internet, which we use to analyze outages and other incidents. The following analysis is based on our extensive monitoring, as well as ThousandEyes’ global outage detection service, Internet Insights. See how the outage unfolded in this analysis.
Outage Analysis
Updated July 30, 2025
On July 24, Starlink experienced a 2.5-hour global outage starting around 19:13 UTC. Users worldwide lost Internet access as their terminals failed to connect to a satellite and instead began a cycle of reconnection attempts. The outage affected users in every region where Starlink service is available, including North America, Europe, and Australia. To understand what caused such a widespread disruption, we examined the evidence from ThousandEyes’ extensive Internet monitoring data set.
What Happened During the Starlink Outage?
ThousandEyes monitoring observed unusual failure patterns during the Starlink outage, suggesting a system-wide issue rather than failures originating from specific locations within Starlink’s network.
Starlink’s network of satellites is referred to as a constellation. This constellation is centrally coordinated and controlled via a software-defined control plane. User terminals are constantly handed off between clusters of satellites within this constellation, a process managed by the control plane, which is also responsible for real-time traffic engineering, seamless handoffs, and load balancing across thousands of moving nodes. When this control plane fails, it can systemically compromise the constellation's ability to route traffic, creating a single point of failure with global impact—which appears to be what happened during the July 24 outage.
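To make the single-point-of-failure concern concrete, the sketch below models a centrally coordinated control plane that assigns terminals to satellites. It is a simplified illustration under assumed behavior, not Starlink's actual software; the class and method names are invented for this example.

```python
# Illustrative sketch only -- not Starlink's actual implementation. It models a
# centrally coordinated control plane that assigns user terminals to satellites;
# if the controller is unhealthy, no handoff decision can be made anywhere.
from dataclasses import dataclass, field
import random

@dataclass
class ControlPlane:
    healthy: bool = True
    assignments: dict = field(default_factory=dict)  # terminal_id -> satellite_id

    def handoff(self, terminal_id, visible_satellites):
        # A failed control plane affects every terminal at once: the terminal
        # loses its assignment and falls back into a reconnect/search loop.
        if not self.healthy or not visible_satellites:
            self.assignments.pop(terminal_id, None)
            return None
        choice = random.choice(visible_satellites)  # stand-in for real traffic engineering
        self.assignments[terminal_id] = choice
        return choice

control = ControlPlane()
print(control.handoff("terminal-1", ["sat-A", "sat-B"]))  # e.g. 'sat-B'
control.healthy = False  # simulated control plane failure
print(control.handoff("terminal-1", ["sat-A", "sat-B"]))  # None: global impact
```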
The failure of Starlink’s control plane during the incident manifested in a number of ways (a rough sketch of how these signatures could be told apart in monitoring data follows the list):
- Inability to Connect to a Satellite: Globally, a large number of terminals were unable to establish any connection at all, remaining in a continuous search for a connection, indicating a failure to associate with a satellite.
- Backbone Routing Failures: Other terminals successfully connected to Starlink's ground station infrastructure via satellite, but traffic could not be routed beyond it, suggesting issues within Starlink’s backbone network, a critical component of its service. This pattern indicated that while the physical satellite link may have been active at certain points, Starlink's ability to forward traffic within its own network was compromised.
- End-to-end Traffic Instability: At certain points during the incident, some user terminals appeared to successfully connect to a satellite and forward traffic to a destination via Starlink's network. However, these paths were highly unstable, exhibiting significant packet loss before failing completely. This suggested the data plane—the infrastructure that forwards user traffic—was sporadically functional but unreliable.
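A rough sketch of how these three signatures might be separated in path-measurement data is shown below. The field names and thresholds are illustrative assumptions, not ThousandEyes' actual data model.

```python
# Hypothetical sketch: classify a terminal's measurement into the three failure
# modes described above. Fields and thresholds are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class PathMeasurement:
    satellite_associated: bool    # did the terminal associate with a satellite?
    reached_ground_station: bool  # did hops reach Starlink ground infrastructure?
    reached_destination: bool     # did any packets reach the target service?
    packet_loss_pct: float        # end-to-end forwarding loss

def classify(m: PathMeasurement) -> str:
    if not m.satellite_associated:
        return "no satellite association (terminal stuck searching)"
    if m.reached_ground_station and not m.reached_destination:
        return "backbone routing failure (traffic stops inside Starlink's network)"
    if m.reached_destination and m.packet_loss_pct > 20:
        return "end-to-end instability (path up but heavily lossy)"
    return "healthy"

print(classify(PathMeasurement(False, False, False, 100.0)))
print(classify(PathMeasurement(True, True, False, 100.0)))
print(classify(PathMeasurement(True, True, True, 45.0)))
```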
Explore this outage further in the ThousandEyes platform (no login required).
Deducing the Cause: Hardware vs. Control Plane
The key to diagnosing the Starlink issue lies in comparing the outage characteristics against the known operational principles of a LEO constellation.
A satellite hardware failure, for instance, would be governed by the constellation's physical mechanics. Because each Starlink satellite completes an orbit every 91-95 minutes, such a failure would manifest as a rolling disruption where service would degrade and recover across regions in cycles matching this orbital period—as faulty satellites passed overhead and were replaced by functional ones.
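As a quick sanity check on that orbital period, Kepler's third law gives roughly 91 to 95 minutes for circular orbits at the altitudes typically associated with Starlink's shells (the ~340 km and ~540 km values below are assumptions for illustration), so a 2.5-hour outage spans well over a full orbit:

```python
# Back-of-the-envelope check with Kepler's third law: T = 2*pi*sqrt(a^3 / mu).
# Shell altitudes are assumed values typical of LEO constellations like Starlink.
import math

MU_EARTH = 3.986e14      # Earth's gravitational parameter, m^3/s^2
EARTH_RADIUS_M = 6.371e6
OUTAGE_MINUTES = 150     # ~2.5 hours

for altitude_km in (340, 540):
    a = EARTH_RADIUS_M + altitude_km * 1000   # semi-major axis of a circular orbit
    period_min = 2 * math.pi * math.sqrt(a**3 / MU_EARTH) / 60
    print(f"{altitude_km} km shell: ~{period_min:.0f} min per orbit, "
          f"outage spanned ~{OUTAGE_MINUTES / period_min:.1f} orbits")
# Prints roughly 91 and 95 minutes -- well short of the 150-minute outage.
```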
Instead, ThousandEyes observed:
- A simultaneous global failure affecting all regions at once
- A service breakdown lasting 2.5 hours, far exceeding the orbital period
- No correlation between the outage timing and predictable satellite orbital paths
This evidence effectively rules out distributed hardware issues. Rather, the observed pattern of a sudden, global, and largely persistent failure of the network to direct traffic is a typical signature of control plane-related disruptions. Control plane issues can trigger a variety of traffic behaviors, including erratic failure patterns and disruptions in different parts of a network.
The recovery pattern also pointed to control plane issues. As service was restored, ThousandEyes observed a staggered, non-uniform recovery where routing paths were re-established intermittently, and terminals reconnected gradually, not following any clear regional pattern as a hardware issue might have. This behavior is consistent with a complex control plane re-initializing and re-establishing stable routing states across a dynamic, moving topology of thousands of satellites.
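One way to illustrate this reasoning: if degradations and recoveries were tied to faulty satellites passing overhead, recovery times would tend to cluster by region, whereas a re-converging control plane scatters them across all regions at once. The sketch below runs that comparison on entirely made-up recovery times, purely to show the shape of the check:

```python
# Illustrative sketch with fabricated example data: compare the spread of
# recovery times within each region against the global spread. Similar spreads
# suggest no regional (orbital) pattern, consistent with a control plane event.
from statistics import mean, stdev

# Hypothetical recovery times (minutes after 21:00 UTC) per monitoring location.
recovery_minutes = {
    "North America": [31, 38, 33, 40, 36],
    "Europe":        [32, 39, 34, 37, 41],
    "Australia":     [30, 35, 40, 33, 38],
}

for region, times in recovery_minutes.items():
    print(f"{region:13s} mean={mean(times):.1f} min  spread={stdev(times):.1f} min")

all_times = [t for times in recovery_minutes.values() for t in times]
print(f"{'Global':13s} mean={mean(all_times):.1f} min  spread={stdev(all_times):.1f} min")
```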
Insights From Official Statements
ThousandEyes’ findings align with Starlink's own public statements. The Vice President of Engineering reported that “the outage was due to failure of key internal software services that operate the core network.” Subsequently, an industry report noted that Starlink owner SpaceX informed resellers that the issue stemmed from an “upgrade procedure” involving software rollout to Starlink's “ground-based compute clusters,” which host the constellation's control plane.
What Can NetOps Teams Learn From the Starlink Outage?
Starlink’s inter-constellation communication is a closed system that, unlike autonomous networks on the ground, is not designed to directly connect or interoperate with other providers. Traffic must ultimately be routed through Starlink’s IP network on the ground before it can be handed to another network, such as a service provider or app/cloud provider. In effect, the constellation can become untethered from the Internet. This architecture has implications for enterprises seeking to incorporate LEO satellite connectivity into their network architecture, highlighting important considerations for connectivity planning and risk management. Network IT operators should keep the following considerations in mind:
- Design for Transport Diversity: Any network is subject to failure, even at global scale. True resilience requires transport diversity, combining satellite with fiber, cellular, or other connectivity types that are not subject to the same control plane (see the failover sketch after this list).
- Plan for Service-specific Failures: Traditional business continuity plans often focus on site-specific disasters (e.g., a fire or power outage), where geographic diversity offers protection. This incident highlights the need to plan for service-specific global failures, where all your locations could be impacted by a single incident. An updated plan should identify critical service dependencies and establish clear procedures for operating in a degraded or offline state.
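As a minimal sketch of the transport-diversity idea, the example below probes each uplink and fails over to the first healthy one. The interface names, probe targets, and use of a Linux-style ping are assumptions for illustration, not a recommendation of any specific product or configuration.

```python
# Minimal sketch of health-check driven failover between transports. Interface
# names, probe targets, and thresholds are illustrative assumptions; the ping
# flags assume a Linux-style ping binary.
import subprocess

TRANSPORTS = [
    {"name": "starlink", "interface": "wan0", "probe": "8.8.8.8"},  # primary: satellite
    {"name": "lte",      "interface": "wan1", "probe": "1.1.1.1"},  # backup: cellular
]

def transport_healthy(t: dict) -> bool:
    """Probe through a specific interface; a failed probe marks the transport down."""
    result = subprocess.run(
        ["ping", "-c", "3", "-W", "2", "-I", t["interface"], t["probe"]],
        capture_output=True,
    )
    return result.returncode == 0

def pick_active_transport() -> str:
    for t in TRANSPORTS:          # ordered by preference
        if transport_healthy(t):
            return t["name"]
    return "none"                 # fall back to degraded/offline procedures

print("Active transport:", pick_active_transport())
```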
Previous Updates
[July 24, 2025, 3:30 PM PDT]
ThousandEyes data indicates that Starlink began experiencing a widespread global outage around 19:15 UTC. Service began to recover at around 21:31 UTC, with most locations seeing recovery by about 21:40 UTC. As of 21:44 UTC, the incident appeared to be fully resolved. The duration of the outage was approximately 2.5 hours.
Given the global scope of the incident and the rapid onset and recovery across different locations, the disruption is likely due to a software or configuration issue.
Explore the outage in the ThousandEyes platform (no login required).
[July 24, 2025, 2 PM PDT]
ThousandEyes data indicates that Starlink is experiencing a widespread global outage that began around 19:15 UTC and is still ongoing.
