Twitter Outage Analysis: March 28, 2022

By Chris Villemez

Summary

On March 28, 2022, Twitter was rendered unreachable for some users around the globe due to a BGP hijack.


For about 45 minutes on March 28, 2022, beginning at approximately 12:07 UTC, Twitter was rendered unreachable for some users when JSC RTComm.RU (AS 8342), a Russian Internet and satellite communications provider, announced one of Twitter’s prefixes (104.244.42.0/24) and subsequently blackholed traffic destined to the social media service.

Figure 1. Twitter becomes unavailable for 45 minutes beginning at approximately 12:07 UTC

While the global impact of this BGP hijack was limited and most Twitter users outside of Europe experienced no disruption, some users were unable to reach Twitter for nearly 45 minutes, until RTComm withdrew the erroneous route. For the impacted users, ThousandEyes monitoring observed timeouts while attempting to establish a TCP connection to the Twitter service. This amounted to a 100% failure rate for those users while the erroneous route was in use.

Outage Analysis

ThousandEyes observed the entire incident, which began when JSC RTComm.RU (AS 8342) improperly announced the 104.244.42.0/24 IP prefix owned by Twitter. Because RTComm is not the prefix owner, this is a prefix mis-origination, often referred to as a route hijack. Note that “hijack” here is a formal designation that does not imply malicious intent; it simply means originating a prefix without authorization. In Figure 2, we see JSC RTComm announcing Twitter’s prefix to its BGP peer, MTS PJSC (AS 8359), which in turn propagated the route to its peers.

Figure 2. Start of BGP advertisement of 104.244.42.0/24 by RTComm (AS 8342)
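The mis-origination described above can be illustrated with a minimal sketch. It assumes a monitoring system that already knows the expected origin AS for a prefix (Twitter's AS 13414, per this article) and compares it with the right-most AS in an observed AS path. The function names and the upstream AS 64500 (a documentation-range ASN) are hypothetical, and this is not how ThousandEyes itself detects such events.

```python
# Expected origin ASNs per prefix. Twitter's AS 13414 comes from the article;
# the table structure itself is an illustrative assumption.
EXPECTED_ORIGINS = {"104.244.42.0/24": {13414}}


def origin_as(as_path):
    """Return the origin AS, i.e. the last (right-most) AS in the path."""
    return as_path[-1]


def is_misoriginated(prefix, as_path):
    """True if the path's origin is not an expected origin for the prefix."""
    expected = EXPECTED_ORIGINS.get(prefix)
    if expected is None:
        return False  # nothing recorded for this prefix, so nothing to flag
    return origin_as(as_path) not in expected


# Path as seen during the incident: via MTS PJSC (AS 8359) to RTComm (AS 8342).
hijack_path = [8359, 8342]
# Hypothetical legitimate path through a made-up upstream, AS 64500.
legit_path = [64500, 13414]

print(is_misoriginated("104.244.42.0/24", hijack_path))  # True
print(is_misoriginated("104.244.42.0/24", legit_path))   # False
```

A check like this only flags the wrong origin; it says nothing about intent, which matches the formal sense of “hijack” used here.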

For some users, traffic destined for Twitter was rerouted and immediately began to fail. Figures 3 and 4 show the abrupt impact on network traffic and Twitter site access.

Figure 3. Traffic destined for twitter.com experiences 100% packet loss

Since Twitter’s service is not located within RTComm’s network, any Twitter traffic routed to RTComm would have failed.

Figure 4. Twitter site fails to load for users impacted by erroneous BGP announcement

The global impact of the erroneous BGP announcement by RTComm varied, with some ISPs accepting, installing, and propagating the route, while other ISPs continued to use the previous, valid routes. Figure 5 shows a small number of geographically widespread locations unable to reach Twitter. In contrast, the bulk of the world’s Twitter users could still successfully reach the service and showed a healthy “green” status.

Figure 5. The March 28, 2022, Twitter outage affected a small number of geographic locations

In Figure 6, we can see that the path changed for Toronto, Canada; Leipzig, Germany; and Columbus, OH, resulting in 100% packet loss once traffic began routing through MTS PJSC (AS 8359) to reach Twitter. Other, non-impacted locations continued along successful paths.

Figure 6. Some global users are rerouted, leading to traffic loss at RTComm’s network edge, while many other locations are not impacted

Interestingly, some locations recovered before RTComm withdrew the problematic route. For example, the Columbus, OH, location, routing through eNET Inc. (AS 10297), stopped using the invalid path and fully recovered well before other impacted locations, which continued to see failures until RTComm withdrew the route.

Figure 7. Columbus, Ohio, path restores before other locations

After nearly 45 minutes, RTComm withdrew the announcement, as seen in Figure 8, where the erroneous BGP path advertised by RTComm is shown as inactive.

Figure 8. RTComm withdraws route advertisement

By approximately 12:51 UTC, impacted users were again able to reach Twitter’s service, which suggests that the prefix withdrawal had by then fully propagated across global Internet routing tables.

Figure 9. Access to Twitter’s service is restored for impacted users after RTComm’s BGP announcement is withdrawn

Failed Traffic Manipulation

We know that the March 28th Twitter event was caused by RTComm announcing themselves as the origin for Twitter’s prefix, then withdrawing it. While we don’t know what led to the announcement, it’s important to understand that accidental misconfiguration of BGP is not uncommon, and given the ISP’s withdrawal of the route, it’s likely that RTComm did not intend to cause a globally impacting disruption to Twitter’s service. That said, localized manipulation of BGP has been used by ISPs in certain regions to block traffic based on local access policies. 

If RTComm’s intent was to block Twitter for its Russian users, it could have advertised itself as a termination point for Twitter traffic to local peers only. Correctly implemented, that would have enabled targeted traffic blocking without affecting global routing tables or causing broad service disruption for users across many regions.

Whether for censorship purposes or simply an error, RTComm’s BGP manipulation had a broad impact—such is the power of BGP and the potential for something to go wrong on a global scale. To understand how this incident could have happened, it’s important to understand the role of BGP communities and the concept of blackhole routing.

Traffic Diversion and BGP Communities

BGP provides all of the granular controls needed to steer or divert traffic. In this case, if RTComm had intended to redirect traffic destined to this prefix but not necessarily propagate that change to the broader Internet community, how would that be done? The key is BGP communities: optional tags sent with route advertisements to influence routing policy or provide extra information about the route.

A similar traffic diversion concept exists in RFC 7999: BLACKHOLE Community, which deals with the idea of blackhole routing as a mechanism to filter Distributed Denial of Service (DDoS) traffic. In blackhole routing, an IP address or prefix under attack is announced to a local ISP using a special well-known BGP community, which instructs the ISP to discard all traffic to this destination. This protects the destination network resources from being impacted by the flood of DDoS traffic and drops (blackholes) the traffic closer to the source of attack.

There is another extremely important step when initiating such a traffic diversion: engineers should also attach one of two well-known communities, NO_EXPORT or NO_ADVERTISE, to ensure that the “blackhole” route is not propagated beyond the local ISP.

If RTComm neglected to apply the NO_EXPORT or the NO_ADVERTISE community, then peers of RTComm that received the rogue prefix announcement would have in turn sent the route update to their peers, potentially propagating this update across the Internet. We do see that this advertisement did indeed propagate beyond RTComm’s peers.
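The effect of a missing NO_EXPORT or NO_ADVERTISE community can be sketched with a toy AS-level model. The community values below are the standard well-known ones (NO_EXPORT 65535:65281, NO_ADVERTISE 65535:65282, BLACKHOLE 65535:666). Everything else is an assumption for illustration: the `propagate` function, the topology, and the downstream ASNs 64500 to 64502 (documentation-range numbers) are hypothetical, and the model ignores real-world import and export policy. Whether RTComm actually used or omitted these communities is not known.

```python
from collections import deque

# Well-known BGP communities, written as (high, low) 16-bit pairs.
NO_EXPORT = (65535, 65281)
NO_ADVERTISE = (65535, 65282)
BLACKHOLE = (65535, 666)


def propagate(origin, neighbors, communities):
    """Return the set of ASes that learn a route announced by `origin`.

    Simplified model: an AS that receives the route re-advertises it to its
    own neighbors unless the route carries NO_EXPORT or NO_ADVERTISE.
    """
    learned = set()
    queue = deque(neighbors.get(origin, []))
    while queue:
        asn = queue.popleft()
        if asn in learned or asn == origin:
            continue
        learned.add(asn)
        if NO_EXPORT in communities or NO_ADVERTISE in communities:
            continue  # the receiving AS keeps the route to itself
        queue.extend(neighbors.get(asn, []))
    return learned


# RTComm (AS 8342) announces to MTS PJSC (AS 8359); ASes 64500-64502 are
# made-up networks downstream of MTS.
topology = {8342: [8359], 8359: [64500, 64501], 64500: [64502]}

contained = propagate(8342, topology, {BLACKHOLE, NO_EXPORT})
leaked = propagate(8342, topology, {BLACKHOLE})
print(sorted(contained))  # [8359]
print(sorted(leaked))     # [8359, 64500, 64501, 64502]
```

In this sketch the only difference between a contained blackhole and one that spreads is a single community on the announcement.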

The question then becomes: why didn’t the entire world experience this same failure?

Thwarting BGP Hijacks

Preparation and detection are the best defense against BGP threats. Organizations have essentially two approaches to dealing with route leaks and hijacks:

  1. Proactive: put monitoring in place for rapid detection, and safeguard your BGP with security mechanisms such as RPKI.
  2. Reactive: either de-aggregate the prefix, advertising more-specific prefixes so that longest-prefix matching steers traffic away from the hijacker, or reach out to other service providers for help removing the illegitimate route.

In the Twitter scenario, the hijacked prefix was already a /24, so announcing more-specific routes would not have worked: most networks will not accept IPv4 prefixes longer than /24. The best avenue upon detection in this scenario is to quickly engage the ISPs for help in dropping or filtering the invalid advertisements. Fully ending such an event often depends on the operators of the routers that accepted the route.
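Why de-aggregation was not available here can be shown with a short sketch using Python's standard `ipaddress` module. The /24 acceptance limit is modeled as a constant, following the convention mentioned above. The function names and the 10.0.0.0/23 example prefix are hypothetical.

```python
import ipaddress

# Longest IPv4 prefix most networks accept, per the convention noted above.
MAX_ACCEPTED_PREFIXLEN = 24


def deaggregate(prefix, max_len=MAX_ACCEPTED_PREFIXLEN):
    """Return the two more-specific halves of `prefix`, or [] if the halves
    would be longer than what peers typically accept."""
    net = ipaddress.ip_network(prefix)
    if net.prefixlen >= max_len:
        return []
    return [str(subnet) for subnet in net.subnets(prefixlen_diff=1)]


def best_match(routes, address):
    """Longest-prefix match: the most specific route covering `address`."""
    addr = ipaddress.ip_address(address)
    covering = [ipaddress.ip_network(r) for r in routes
                if addr in ipaddress.ip_network(r)]
    if not covering:
        return None
    return str(max(covering, key=lambda net: net.prefixlen))


# A hypothetical /23 can be split into /24s that win longest-prefix match.
print(deaggregate("10.0.0.0/23"))  # ['10.0.0.0/24', '10.0.1.0/24']
print(best_match(["10.0.0.0/23", "10.0.0.0/24"], "10.0.0.5"))  # 10.0.0.0/24

# The hijacked prefix is already a /24, so nothing more specific is usable.
print(deaggregate("104.244.42.0/24"))  # []
```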

The Benefits of RPKI

While there are various mechanisms providers can use in BGP configurations to secure themselves, RPKI (Resource Public Key Infrastructure) is a good option available to organizations today. It is a cryptographic framework in which prefix holders publish signed Route Origin Authorizations (ROAs) affirming which ASNs are authorized to originate a particular IP prefix or set of prefixes. ISPs that validate routes against these ROAs would reject improper advertisements.

We can see below that Twitter has adopted RPKI and that its 104.244.42.0/24 prefix is digitally signed and ready for RPKI.

Figure 10. Twitter’s /24 prefix is ready for RPKI

Any ISP that properly validates received routes against ROAs would have known to reject the route advertised by JSC RTComm, as RTComm is not the authorized origin for this prefix. However, not all ISPs have adopted RPKI. In the path shown in Figure 11, RPKI validation was not in use, and we can see that eNET (AS 10297) accepted the invalid path through MTS PJSC (AS 8359).

Figure 11. eNET (AS 10297) accepts the invalid path through MTS PJSC (AS 8359)
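The origin check that a validating ISP performs can be sketched roughly as follows, in the spirit of the route origin validation procedure in RFC 6811. The prefix and Twitter's AS 13414 come from this article; the ROA's maximum length of 24 is an assumption made for illustration, as are the `validate` function and the in-memory ROA list. A real deployment would obtain ROAs from an RPKI validator rather than hard-coding them.

```python
import ipaddress

# One ROA as (prefix, max_length, origin_asn). The max_length of 24 is an
# assumption; the article only states that the prefix is signed.
ROAS = [("104.244.42.0/24", 24, 13414)]


def validate(prefix, origin_asn, roas=ROAS):
    """Classify an announcement as 'valid', 'invalid', or 'not-found'."""
    route = ipaddress.ip_network(prefix)
    covered = False
    for roa_prefix, max_len, roa_asn in roas:
        roa_net = ipaddress.ip_network(roa_prefix)
        # A ROA covers the route if the route sits inside the ROA prefix.
        if route.version != roa_net.version or not route.subnet_of(roa_net):
            continue
        covered = True
        if route.prefixlen <= max_len and origin_asn == roa_asn:
            return "valid"
    # Covered by some ROA but matched by none: invalid. No covering ROA at
    # all: not-found.
    return "invalid" if covered else "not-found"


print(validate("104.244.42.0/24", 13414))  # valid
print(validate("104.244.42.0/24", 8342))   # invalid
print(validate("198.51.100.0/24", 8342))   # not-found
```

A network that drops "invalid" routes would have discarded RTComm's announcement, while one that performs no validation treats all three cases alike.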

Although there has been progress, the Internet still lacks a critical mass of RPKI adoption to completely stop route leaks and hijacks. 

BGP Peer Density

However, even without the protections of RPKI, Twitter’s considerable peering density would have offered significant protection against rogue advertisements, since routers forward on the most specific matching prefix and, among routes to the same prefix, BGP generally prefers the shortest AS path.

As seen in Figure 12, Twitter (AS 13414) peers directly with many major ISPs and transit providers. This makes for a lot of very short paths to Twitter.

Figure 12. Twitter AS 13414 is well-connected with major ISP and transit providers

JSC RTComm, on the other hand, largely peers with local Russian service providers, so it had a smaller pool of peers to which it could send this prefix announcement, compared with the reach and extent of Twitter. Twitter’s peer density likely helped reduce the acceptance of the illegitimate advertisement and, ultimately, the scope of impact.
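The role of AS-path length can be illustrated with a deliberately reduced sketch. It assumes that every other attribute is equal and that path length alone decides, whereas real BGP implementations evaluate attributes such as local preference before AS-path length. The `best_path` function and ASNs 64500 and 64501 (documentation-range numbers) are hypothetical.

```python
def best_path(candidates):
    """Pick the candidate AS path with the fewest hops.

    Ties keep the first candidate listed. This is a simplification: real
    routers compare other attributes before AS-path length.
    """
    return min(candidates, key=len)


hijacked = [8359, 8342]  # via MTS PJSC to RTComm, as seen in the incident

# A network that peers directly with Twitter (AS 13414) has a one-hop path,
# so the two-hop hijacked route loses.
direct = [13414]
print(best_path([direct, hijacked]))  # [13414]

# A hypothetical network whose legitimate path crosses two made-up transit
# ASes sees the hijacked route as shorter and would prefer it.
distant = [64500, 64501, 13414]
print(best_path([distant, hijacked]))  # [8359, 8342]
```

Under these assumptions, dense direct peering keeps legitimate paths short enough to win for most networks, which is consistent with the limited scope of impact described above.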


Lessons and Takeaways

  • BGP hijacks and route leaks are common. BGP hijacks, whether malicious or accidental, are a constant creator of Internet churn and chaos, showing their capability to break digital services suddenly and drastically. 

  • BGP well-known communities exist for a reason. Adhering to best practices with regard to BGP communities is critical, especially when blackholing traffic (e.g., for DDoS mitigation).

  • RPKI can increase the security of BGP routes. It cannot be stated firmly enough that RPKI offers significant protection against BGP hijackings and leaks. However, until RPKI is fully adopted, some level of route insecurity will remain.

  • Visibility matters. Rapid detection is key to quickly responding to BGP leaks and hijacks. Understanding your own BGP peering relationships and end-to-end Internet routing is necessary to not just gauge the health and security of your routing but also to ensure that the resiliencies put in place are operating as expected. 
