AWS Outage Analysis: December 15, 2021

By Angelique Medina

Summary

On December 15, 2021, AWS experienced an incident that affected the reachability of multiple applications. The following is an analysis of the outage, which will be updated periodically as we have more information to share.


[Dec 15, 12:00 pm PT]

Both today’s incident and last week’s outage involved the AWS network, where traffic loss prevented services from being reached. In the December 7th outage, the traffic loss was caused by congestion and occurred between the border of AWS’ main network and its internal management network. Today’s incident, however, occurred within the main network, where traffic from sources both inside and outside AWS was dropped.

Historically, many incidents with wide impact on the network involve significant impairment of the control plane, preventing network devices from routing traffic correctly or at all. However, in these incidents, that does not appear to be the case, as destinations were still reachable from a routing standpoint, with some traffic successfully reaching the intended destinations.

In the December 7th incident, AWS devices performing critical functions involving traffic forwarding and network address translation (NAT) were overwhelmed. In today’s incident, a large increase in traffic loss similarly suggests that some network functions in the data path, such as routing or NAT, were unable to operate at normal capacity for an as-yet-unknown reason, preventing full reachability of apps and services.
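The distinction drawn above, between a control-plane failure (no usable route) and a data-path problem (a route exists, but traffic is dropped along it), can be illustrated with a minimal sketch. This is not ThousandEyes' methodology: the `ProbeResult` type, the `classify` function, and the 5% loss threshold are all hypothetical, chosen only to make the reasoning concrete.

```python
from dataclasses import dataclass


@dataclass
class ProbeResult:
    """One probe batch toward a destination (hypothetical structure)."""
    route_present: bool  # assumption: a route to the destination is known
    sent: int            # packets sent
    received: int        # packets that got a reply


def loss_pct(result: ProbeResult) -> float:
    """Return packet loss as a percentage of packets sent."""
    if result.sent <= 0:
        raise ValueError("sent must be positive")
    return 100.0 * (result.sent - result.received) / result.sent


def classify(result: ProbeResult, loss_threshold: float = 5.0) -> str:
    """Rough heuristic; the threshold is an illustrative assumption."""
    if not result.route_present:
        # No route at all points toward a control-plane problem.
        return "control-plane"
    if loss_pct(result) >= loss_threshold:
        # Route exists, but traffic is being dropped in the data path.
        return "data-path"
    return "healthy"


# A destination that is routable but losing 40% of traffic.
sample = ProbeResult(route_present=True, sent=100, received=60)
print(classify(sample), loss_pct(sample))
```

Under these assumptions, a destination that remains routable while a large share of packets is lost would be labeled a data-path issue, which matches the pattern described for both incidents.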


[Dec 15, 9:35 am PT]

ThousandEyes can further confirm that us-west-2 was also impacted by the reported network issues, with many users experiencing packet loss when connecting to that region.

Figure-3-AWS-Outage-December-15.png

Figure-4-AWS-Outage-December-15.png


[Dec 15, 9:15 am PT]

ThousandEyes observed packet loss within AWS’ network during the incident, and Amazon has confirmed that network connectivity issues were responsible for the disruption.

Figure-2-AWS-Outage-December-15.png


[Dec 15, 8:45 am PT]

An AWS incident impacting us-west-1 that started at approximately 7:15 am PT affected the reachability of multiple applications, such as Okta, Workday, and Slack. AWS reports that the issue is now being remediated.

Live link to view the AWS outage in ThousandEyes here. No login required.

Figure-1-AWS-Outage-December-15.png
