Happy Halloween week! Today we’re bringing you the first in a series of spooky stories of Internet and IT operations gone awry. To start us off, we’re revisiting the horror that was the Dyn outage two years ago.
On October 21, 2016, Internet users around the globe were cut off from some of the most widely used digital brands, including PayPal, Spotify, and GitHub. Redditors were Reddit-less. PagerDuty (ironically, meant to alert on outages) became unavailable. Crocheters everywhere mourned the loss of Etsy. Even (horror!) Twitter became unreachable.
The cause of this digital death was not some simultaneous application failure or public cloud outage. It was an often overlooked aspect of Internet connectivity: the DNS. Major digital brands, often distinguished by their technical sophistication, had failed to implement redundancy for this critical service. All the affected brands were using Dyn as the sole provider of their authoritative DNS records, the records that map a domain name to an IP address and that users must retrieve before they can connect to a website by name.
When Dyn came under attack throughout that day from a massive flood of traffic generated by a botnet of compromised IoT devices, its service went down, at one point for multiple (hair-raising) hours. The geographic scope and scale of the attack were unprecedented, with the number of malicious endpoints estimated to be more than 100,000. And the attack kept coming...and coming...and coming, stretching its impact to the far corners of the Internet.
Dyn worked diligently to restore service (and was eventually successful), but DNS-based attacks are particularly challenging to thwart, as malicious traffic can be difficult to distinguish from legitimate traffic. In fact, during DDoS attacks, the “friendly fire” of a large number of legitimate DNS resolvers repeatedly retrying queries to refresh their caches can have a compounding effect. From the authoritative nameserver’s standpoint, both traffic sources (malicious and legitimate) are overwhelming its systems and preventing a return to a normal service state.
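To make the "friendly fire" effect concrete, here is a deliberately simplified back-of-the-envelope model in Python. It is a sketch, not a description of how Dyn's systems or any particular resolver behaved: the function name `offered_load`, the fixed retry count, and all the numbers are hypothetical, and real resolvers use timeouts, backoff, and serve-stale logic that this ignores.

```python
def offered_load(resolvers: int, queries_per_resolver: float,
                 retries: int, answered_fraction: float) -> float:
    """Estimate legitimate queries per second hitting the authoritative servers.

    Assumption (hypothetical model): every query that goes unanswered is
    re-sent `retries` more times, while an answered query costs one query.
    """
    if not 0.0 <= answered_fraction <= 1.0:
        raise ValueError("answered_fraction must be between 0 and 1")
    base = resolvers * queries_per_resolver
    unanswered = base * (1.0 - answered_fraction)
    # Each unanswered query is repeated, adding to the normal baseline load.
    return base + unanswered * retries


# Illustrative numbers only: 10,000 resolvers sending 1 query/second each,
# retrying 3 times when no answer arrives.
healthy = offered_load(10_000, 1.0, retries=3, answered_fraction=1.0)
degraded = offered_load(10_000, 1.0, retries=3, answered_fraction=0.25)
```

Under these made-up inputs, legitimate load more than triples once only a quarter of queries are being answered, which is the compounding effect described above: the worse the service gets, the more legitimate traffic it attracts.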
Two years have passed since the Dyn outage, but not much has changed. A recent study conducted by ThousandEyes found that two-thirds of Fortune 50 enterprises and 48% of SaaS providers rely on a single source to serve their authoritative DNS records. Pretty spooky.
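If you want a rough idea of whether your own domain falls into that single-source group, one approach is to look at the NS hostnames your zone delegates to and see how many distinct providers they belong to. The Python sketch below assumes you have already collected the NS hostnames (for example, from a DNS lookup tool). The function names and the `.example` hostnames are hypothetical, and grouping by the last two labels is a naive heuristic: it misjudges suffixes such as `co.uk` and providers that spread their nameservers across several domains.

```python
from collections import defaultdict


def group_nameservers_by_provider(nameservers):
    """Group NS hostnames by their last two labels.

    The last two labels are only a rough stand-in for "provider".
    """
    groups = defaultdict(list)
    for ns in nameservers:
        labels = ns.rstrip(".").lower().split(".")
        provider = ".".join(labels[-2:])
        groups[provider].append(ns)
    return dict(groups)


def has_single_dns_provider(nameservers):
    """Return True when every NS hostname maps to the same provider group."""
    if not nameservers:
        raise ValueError("no nameservers supplied")
    return len(group_nameservers_by_provider(nameservers)) == 1


# Hypothetical delegation: two nameservers at one provider, one at another.
ns_records = [
    "ns1.provider-a.example.",
    "ns2.provider-a.example.",
    "ns1.provider-b.example.",
]
single_provider = has_single_dns_provider(ns_records)
```

Several nameservers at the same provider still count as one provider here, which is the point: the affected brands had multiple Dyn nameservers, but no second provider to fall back on.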
What lessons should be learned from the Dyn outage?
- 1. The Internet is fragile.
It’s made up of many different networks and systems, with varying degrees of interdependence. Issues experienced by one network or system can have a cascading impact on other parts of the Internet. Providers multiple degrees away from you (e.g. your provider’s provider’s provider) can have an impact on your availability and performance. The surface area of the Internet is also increasing, due to the expansion of Internet-connected devices (IoT), which means more devices, more connectivity, more interdependence.
- 2. Web infrastructure redundancy is not enough.
Most enterprises and SaaS providers understand the necessity of infrastructure redundancy and planning for failure; however, most organizations don’t apply this same practice to their DNS. Often, DNS infrastructure is a single point of failure, with an organization either hosting its DNS internally or using a single external DNS provider. In either scenario, if the DNS nameservers become unavailable, it doesn’t matter that a web application or service is functioning: from a user’s standpoint, it will be unavailable.
- 3. Architect your DNS for resilience.
Your DNS should not be a single point of failure for users connecting to your site. Implementing a primary/secondary DNS deployment is relatively straightforward and cost-effective. Companies like Sumo Logic, which had the foresight to have a secondary DNS provider already set up prior to the Dyn attack, were able to remain available to their users.
- 4. Use data to drive provider decisions and accountability.
Depending on where your users are located, you may have more or fewer choices of high-performing providers. Before you sign on with a provider (or providers), be sure you know what to expect in terms of performance. Monitor your DNS availability, integrity, and performance, and use that data to ensure that your providers are delivering for you.
- 5. Don’t overlook DNS in your planning, operations, and troubleshooting.
When trying to get to the root cause of poor application performance, whether it’s an app you’re consuming or delivering, be sure to include DNS in your checklist of items to look at. It just may be the difference between good and poor performance.
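As a starting point for the monitoring and troubleshooting advice in lessons 4 and 5, a single DNS check can record the three things worth tracking: whether the name resolved (availability), whether the answer is one you expect (integrity), and how long it took (performance). The Python sketch below is a minimal illustration and not a substitute for a real monitoring product. `check_dns` and its parameters are hypothetical names; it uses the standard library's `socket.gethostbyname`, which goes through the local system resolver and its caches, so it measures what a client on that machine sees rather than querying your authoritative servers directly.

```python
import socket
import time


def check_dns(hostname, expected=None, resolve=socket.gethostbyname,
              clock=time.perf_counter):
    """Resolve `hostname` once and report availability, integrity, latency.

    `expected` is an optional set of addresses the answer may legitimately be.
    `resolve` and `clock` are injectable so the check can be exercised offline.
    """
    start = clock()
    try:
        address = resolve(hostname)
        error = None
    except OSError as exc:  # socket.gaierror is a subclass of OSError
        address = None
        error = str(exc)
    elapsed_ms = (clock() - start) * 1000.0

    available = address is not None
    # Integrity is only meaningful when an answer came back.
    intact = available and (expected is None or address in expected)
    return {
        "hostname": hostname,
        "available": available,
        "intact": intact,
        "address": address,
        "elapsed_ms": elapsed_ms,
        "error": error,
    }
```

Calling something like `check_dns("www.example.com", expected={"192.0.2.10"})` on a schedule, from several locations, and keeping the results gives you a baseline to hold providers to. The hostname and address here are placeholders.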
The Dyn outage may have brought many parts of the Internet to a halt, but it doesn’t have to happen to you. Avoid the horror. Dig deeper into our DNS research by downloading our 2018 Global DNS Performance Report. And don’t forget to subscribe to our blog so you can get tomorrow’s spooky tale of Internet woe.