ThousandEyes actively monitors the reachability and performance of thousands of services and networks across the global Internet, which we use to analyze outages and other incidents. The following analysis of Meta's service disruption on March 5, 2024, is based on our extensive monitoring, as well as ThousandEyes’ global outage detection service, Internet Insights.
Outage Analysis
On March 5, 2024, Meta experienced an unexpected disruption that left several of its services inaccessible to users, including Facebook, Instagram, Messenger, and Threads. The issue was first observed at around 15:00 UTC (7 AM PST). Although the platforms appeared to be reachable, many users could not get past the login or authentication process.
The cause of the issue was likely in Meta's backend, because its systems remained reachable. Network paths to the services showed no significant issues that could have caused or contributed to the outage.
In some scenarios, problems within an application can make a service appear unresponsive even though it is still accessible. Users might be shown a basic landing page that lacks the expected functionality, or they might encounter various error messages or incomplete content. These symptoms are often generic and do not by themselves point to a specific cause. In this case, some Facebook users had their logins rejected, while some Instagram users were unable to refresh their feeds. Both symptoms appear to be related to authentication problems.
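To make this reasoning concrete, here is a minimal Python sketch of how the result of a single synthetic test might be sorted into the categories discussed above. It is not a ThousandEyes feature or API. The `ProbeResult` fields, the `classify` function, and the 5% packet-loss threshold are all hypothetical assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ProbeResult:
    """Hypothetical result of one synthetic test against a service."""
    path_loss_pct: float        # packet loss measured along the network path
    http_status: Optional[int]  # None if no HTTP response was received
    login_succeeded: bool       # whether a scripted login flow completed


def classify(result: ProbeResult, loss_threshold: float = 5.0) -> str:
    """Roughly attribute a failure to the network, the web tier, or the backend.

    The threshold and category names are illustrative assumptions.
    """
    if result.http_status is None:
        # No HTTP response at all: look at the network path first.
        if result.path_loss_pct >= loss_threshold:
            return "network"
        return "no-response"
    if result.http_status >= 500:
        # The server answered, but with a server-side error.
        return "web-tier"
    if not result.login_succeeded:
        # Reachable and responding, yet the login flow fails:
        # this points at a backend dependency such as authentication.
        return "backend-auth"
    return "healthy"
```

In the scenario described for this outage, a probe with a clean network path and an HTTP 200 response but a failed scripted login, such as `classify(ProbeResult(0.0, 200, False))`, would be classified as `backend-auth`. Real diagnosis would combine many such probes across locations rather than rely on a single result.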
Authentication is a crucial step in accessing a service and a fundamental dependency for the service to function. A failure at this step can affect the entire application delivery chain, causing major disruptions for users.
At around 15:17 UTC (7:17 AM PST), Meta confirmed that it was experiencing issues with its login services. The issue was likely caused by a failure in one of the dependencies that the login system relies on.
ThousandEyes observed a gradual recovery of the affected Meta services. At approximately 16:50 UTC (8:50 AM PST), services appeared to be restored for some users, and access then returned for progressively more users. Meta Communications Director Andy Stone reported at 17:19 UTC (9:19 AM PST) that the issue had been resolved. ThousandEyes confirmed that by 18:40 UTC (10:40 AM PST) users in the majority of regions were able to connect. Finally, at 19:27 UTC (11:27 AM PST), Meta officially announced that the issue was fully resolved.
Lessons Learned
In today's digital era, network and digital service providers must make frequent changes to their systems for routine maintenance, security updates, business growth, and many other reasons. However, any change to a digital system, no matter how innocuous it seems, carries risk. Even the most robust system can be disrupted by errors, and the failure of a single element in the service delivery chain is enough to impair the functionality of the entire service.
It is crucial to have a complete view of your entire digital delivery chain so that you can identify any decrease in performance or functionality. This visibility lets IT teams be notified of faults promptly and determine exactly where a fault is located. With this information, the responsible party can be identified, allowing the issue to be resolved quickly or a workaround to be put in place. This not only improves service quality but also helps set realistic expectations and establish processes that reduce the impact of current and future issues.
[Mar 5, 2024, 11:30 AM PT]
ThousandEyes observed Meta services gradually recover, with many users able to successfully access the application by approximately 16:50 UTC (8:50 AM PST). By 18:40 UTC (10:40 AM PST), the incident appeared to be resolved.
[Mar 5, 2024, 08:20 AM PT]
On March 5th, starting at approximately 15:00 UTC (7 AM PST), Meta services, including Facebook, Instagram, and others, experienced a disruption that prevented users from accessing those apps. ThousandEyes can confirm that Meta’s web servers remain reachable, with network paths clear and web servers responding to users. However, users attempting to log in are receiving error messages, suggesting that a backend service, such as authentication, is the cause of the issue. The incident is still ongoing as of 16:20 UTC (8:20 AM PST).