This is The Internet Report, where we uncover what’s working and what’s breaking on the Internet—and why. If you’re an AWS customer or rely on services that use AWS, you might have noticed the major, hours-long outage last week. On November 25th, at approximately 5:15 am PST, users of Kinesis, a real-time processor of streaming data, began to experience service interruptions. The issue was not network-related, and AWS later issued a detailed incident post-mortem analysis identifying an existing operating system configuration issue that was triggered by a maintenance event involving added server capacity. Over the course of the day, Amazon attempted several mitigation measures, but the outage was not completely resolved until approximately 10:23 pm PST.
What was notable about this outage was its blast radius, which extended far beyond AWS’s direct customers. Several AWS services that use Kinesis, including Cognito and CloudWatch, were affected, as were users of any applications consuming those services (e.g., Ring, iRobot, Adobe). This is a good reminder of the risk of hidden service dependencies, as well as the need for visibility so you can understand what has gone wrong and communicate it to customers. Catch this week’s episode to hear more about this outage.
Finally, don’t forget to leave a comment here or on Twitter, tagging @ThousandEyes and using the hashtag #TheInternetReport.