There was quite a bit of chaos on the Internet today, including major fiber cuts in California. To add to the confusion, between 5:24pm and around 6:10pm Pacific on June 30th, social media and outage reports indicated issues with Amazon, AWS and a variety of services that run on AWS. In our office, we realized that HipChat (our internal messaging system) and Okta (our SSO provider) were not working, and neither was our corporate website, which is hosted on AWS EC2 and fronted by AWS CloudFront.
Given the known fiber cuts, there was some speculation that the Amazon issues were related to them, so we decided to run our own investigation while the disruption was still going on. We received internal alerts that many customers were being impacted by packet loss in two specific networks. After a few minutes of intense analysis, we found that the root cause was not the fiber cuts but a route leak from Axcelx (AS33083), a data center provider in Boston. All of Amazon’s prefixes originating in AS14618 were affected to some degree.
Figure 1 shows routes under normal conditions from our cloud agents in Dallas and New York to Tinder hosted in Amazon’s data center. Expected ISPs consistently seen in the path to Amazon during normal operations are Level 3 and Zayo.
During the outage though, as seen in Figures 2 and 3, the network view shows loss at Hibernia and Axcelx, two networks that were never in the path before; definitely suspicious.
So we looked at the BGP data to see whether anything had changed in the control plane. Not surprisingly, as seen in Figure 4, there was significant BGP activity, and Hibernia (AS5580) and Axcelx (AS33083) suddenly appeared in the BGP paths.
The forwarding loss, combined with the sudden appearance of these two ASNs in the BGP paths, strongly suggested a BGP route leak by Axcelx. The raw BGP data shows one of the exact BGP updates that resulted in this leak. Reading its AS path from right to left, the route originates in AS14618 and passes through Axcelx (AS33083, prepended four times) and then Hibernia (AS5580) before reaching the monitoring peer:
TIME: 07/01/15 00:24:49
TYPE: BGP4MP/MESSAGE/Update
FROM: 220.127.116.11 AS45896
TO: 18.104.22.168 AS6447
ORIGIN: IGP
ASPATH: 45896 5580 33083 33083 33083 33083 7224 16509 14618
NEXT_HOP: 22.214.171.124
ANNOUNCE 126.96.36.199/15 188.8.131.52/15 184.108.40.206/16 220.127.116.11/16 18.104.22.168/17 22.214.171.124/17 126.96.36.199/15 188.8.131.52/14 184.108.40.206/14 220.127.116.11/15 18.104.22.168/16 22.214.171.124/16 126.96.36.199/16 188.8.131.52/16 184.108.40.206/15 220.127.116.11/17 18.104.22.168/14 22.214.171.124/16 126.96.36.199/14 188.8.131.52/14 184.108.40.206/15 220.127.116.11/15 18.104.22.168/15 22.214.171.124/15 126.96.36.199/15 188.8.131.52/16 184.108.40.206/15 220.127.116.11/15 18.104.22.168/16 22.214.171.124/16 126.96.36.199/16 188.8.131.52/15 184.108.40.206/15 220.127.116.11/15 18.104.22.168/18 22.214.171.124/18 126.96.36.199/17 188.8.131.52/16 184.108.40.206/21 220.127.116.11/15 18.104.22.168/18 22.214.171.124/19 126.96.36.199/17 188.8.131.52/16 184.108.40.206/18 220.127.116.11/18 18.104.22.168/17 22.214.171.124/16 126.96.36.199/17 188.8.131.52/17 184.108.40.206/16 220.127.116.11/19 18.104.22.168/19 22.214.171.124/17 126.96.36.199/16 188.8.131.52/18 184.108.40.206/19 220.127.116.11/21 18.104.22.168/21
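For readers who want to pick an update like this apart programmatically, here is a minimal Python sketch. It is an illustration, not the tooling we used. It assumes the record is laid out as `KEY: value` pairs as above, that the timestamp is in UTC with an `MM/DD/YY` date, and that Pacific time was UTC-7 on that day. The helper names (`parse_fields`, `collapse_prepends`) and the `UNEXPECTED` table are hypothetical, and the excerpt leaves out the address and prefix fields. The sketch converts the timestamp to Pacific time, collapses the repeated (prepended) ASNs, and flags the two networks that were not in the path before the event.

```python
import re
from datetime import datetime, timedelta, timezone
from itertools import groupby

# Excerpt of the update above (address and prefix fields omitted for brevity).
RECORD = (
    "TIME: 07/01/15 00:24:49 TYPE: BGP4MP/MESSAGE/Update "
    "ORIGIN: IGP "
    "ASPATH: 45896 5580 33083 33083 33083 33083 7224 16509 14618"
)

# ASNs named in this post as new to the path (hypothetical lookup table).
UNEXPECTED = {5580: "Hibernia", 33083: "Axcelx"}


def parse_fields(record):
    """Split 'KEY: value KEY: value ...' text into a dict keyed by field name."""
    parts = re.split(r"\b([A-Z_]+): ", record)
    # parts[0] is the empty text before the first key; keys and values alternate after it.
    return {key: value.strip() for key, value in zip(parts[1::2], parts[2::2])}


def collapse_prepends(path):
    """Collapse consecutive repeats of an ASN into (asn, repeat_count) pairs."""
    return [(asn, len(list(run))) for asn, run in groupby(path)]


fields = parse_fields(RECORD)

# Assumption: the dump timestamp is UTC; Pacific Daylight Time is UTC-7.
utc_time = datetime.strptime(fields["TIME"], "%m/%d/%y %H:%M:%S").replace(
    tzinfo=timezone.utc
)
local_time = utc_time.astimezone(timezone(timedelta(hours=-7)))

as_path = [int(asn) for asn in fields["ASPATH"].split()]
hops = collapse_prepends(as_path)
origin_as = as_path[-1]  # the rightmost ASN originated the prefixes
flagged = [asn for asn, _ in hops if asn in UNEXPECTED]

print("Local time:", local_time.strftime("%Y-%m-%d %H:%M:%S"))
print("Origin AS:", origin_as)
for asn, count in hops:
    note = f" <- unexpected ({UNEXPECTED[asn]})" if asn in UNEXPECTED else ""
    print(f"AS{asn} x{count}{note}")
```

Under those assumptions, the converted timestamp falls at 5:24pm Pacific on June 30th, matching the start of the disruption described at the top of this post.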
To interact with this data before, during and after the outage, check out these interactive links:
- https://ytffi.share.thousandeyes.com for disruption to Netflix
- https://zqmybl.share.thousandeyes.com for disruption to Tinder
- https://gqnhtvik.share.thousandeyes.com for disruption to Amazon’s ecommerce site
In each link, the “jump to” option lets you view the event through different lenses: HTTP, network, BGP, etc.
All in all, the route leak affected a wide range of services including consumer internet sites like Yelp, Netflix and Match; SaaS services such as HipChat and Jobvite; and financial firms such as Experian and Zions Bank.