Last month I presented "Dissecting Significant Outages from 2014" to several hundred networking experts at UKNOF30 in London. The talk highlighted ways to draw insights from active monitoring in order to diagnose and mitigate network outages and threats. We covered the Craigslist DNS Hijack, the Indosat BGP Hijack/Leak, Country Financial's BGP Prepending and the HSBC America DDoS Mitigation.
Yahoo! Email Outage in Europe
I also presented an event that resonated with everyone in the audience: the slow performance of Yahoo! mail services across Europe in November 2014. Yahoo! and its partner Sky and British Telecom email services were severely impacted by a cable cut in the Irish Sea. The issue affected most European countries and was fully resolved only after 11 days! You can follow the timeline of this event in this interactive data set.
On November 20th at 06:45 UTC, ThousandEyes observed a significant increase in the latency to reach the Yahoo! data center in Dublin, Yahoo!'s primary European data center at the time. Yahoo!, Sky and British Telecom email servers became very slow or entirely unavailable. During the first 3 hours, availability of Yahoo!'s main site fell below 25%, while response time (time to first byte) exceeded 1 second, double its normal value.
Latency from locations across Ireland, the UK and Continental Europe went from an average of 35 milliseconds to more than 110 milliseconds.
Under normal conditions, traffic from Europe and the UK flowed to Yahoo!'s data center in Dublin ("ir" in the hostname) through two undersea cables: one from the UK to Ireland and one from Amsterdam to Ireland. Latencies on the UK-to-Ireland cable averaged 20 to 25ms.
Finding the Culprit
Between 06:00 and 06:15 UTC on November 20th, a Cogent cable repair ship in the Irish Sea accidentally cut a key submarine cable while trying to fix another damaged cable, causing widespread internet outages in Ireland. Latencies on the UK-Ireland cable rose from 25ms to more than 90ms. End-to-end packet loss, however, showed only a small spike, peaking at 4% around the time of the outage.
Within 30 minutes of the fiber cut, traffic was already being rerouted to a US data center in Lockport, New York, near Buffalo (hence the "bf" hostname). The extra round trip across the Atlantic explains the dramatic increase in latency from Europe (a New York to London round trip typically takes about 80ms). In the Path Visualization view, the node upstream of the data center is marked in red because of packet loss. This was probably caused by the unexpected volume of traffic from Europe that still needed to be load-balanced. As a result, US users also began experiencing poor performance when accessing their email.
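As a rough back-of-the-envelope check, the averages quoted in this post are consistent with the rerouting adding roughly one transatlantic round trip to each request. The sketch below uses only those figures; the function names and the 10ms tolerance are illustrative assumptions, and it deliberately ignores the fact that the UK-Ireland leg drops out of the rerouted path.

```python
# Back-of-the-envelope check using the averages quoted in this post (ms).
BASELINE_AVG_MS = 35        # Europe -> Dublin, before the cable cut
REROUTED_AVG_MS = 110       # Europe -> Lockport, NY, after rerouting (lower bound)
TRANSATLANTIC_RTT_MS = 80   # typical New York <-> London round trip


def added_latency(before_ms: float, after_ms: float) -> float:
    """Extra round-trip time introduced by the rerouting."""
    return after_ms - before_ms


def explained_by_detour(extra_ms: float, detour_rtt_ms: float,
                        tolerance_ms: float = 10) -> bool:
    """True if the extra latency is within tolerance of one added detour RTT.

    The 10ms default tolerance is an arbitrary, illustrative assumption.
    """
    return abs(extra_ms - detour_rtt_ms) <= tolerance_ms


if __name__ == "__main__":
    extra = added_latency(BASELINE_AVG_MS, REROUTED_AVG_MS)
    print(f"Added latency: {extra}ms")
    print("Consistent with one transatlantic round trip:",
          explained_by_detour(extra, TRANSATLANTIC_RTT_MS))
```

With the post's numbers, the added latency is about 75ms, close to the ~80ms New York to London round trip, which supports the explanation that the detour to Lockport, not congestion alone, drove the jump.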
The performance degradation and the rerouting of traffic across the Atlantic lasted for more than 2 weeks, prompting many Yahoo! email users to switch to other email providers, and widespread complaints appeared on Twitter and various tech forums. Since the outage, Yahoo! has begun using additional data centers in Europe to serve email traffic, including one near Geneva.