The potential for a network outage exists in any Internet-connected system. Outages are fundamentally unpredictable and can disrupt connectivity anytime, anywhere. An outage also doesn't have to hit you directly to have an impact. For example, an upstream provider could experience an outage, and you might feel the ripple effect.
Outages can disrupt business continuity and directly affect user experience. For example, when customers find your website or application unavailable and see no update on your status page, many will give up and look elsewhere. That is why you need to take precautions to minimize the impact of any outage and to identify its root cause before it is trending on social media.
Planning before an outage can help you ensure a quick recovery and considerably reduce costly downtime. But to prepare for outages effectively, you need to know the common warning signs of each type of outage. Several factors can cause network outages, including server failures, power outages and natural disasters. DNS, CDN and public cloud outages, such as the AWS outage last December, can also have an impact. And sometimes, all it takes is a misconfigured server at your local Internet Service Provider (ISP). So, create an incident response plan that outlines the actions you will take before, during and after an outage for each of these potential connectivity issues.
Another crucial component of outage planning is knowing whom to involve and in what capacity. For best results, use a responsibility assignment matrix, also known as a RACI matrix. It describes the roles that should be involved in completing a deliverable or business process. The acronym RACI is derived from the four key responsibilities most typically used: responsible, accountable, consulted and informed. A RACI matrix clarifies roles and responsibilities so that nothing falls through the cracks, which is essential when getting your systems back online.
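To make the idea concrete, here is a minimal sketch of a RACI matrix as a plain Python dictionary, with a small check that every task has someone responsible and exactly one accountable role. The tasks and role names are hypothetical placeholders, and the "exactly one accountable" rule is a common RACI convention assumed here, not something prescribed by this article.

```python
from collections import Counter

# Hypothetical tasks and roles for illustration only; substitute your own.
RACI = {
    "Confirm outage scope": {
        "Network engineer": "R",
        "Incident manager": "A",
        "ISP account contact": "C",
        "Customer support": "I",
    },
    "Update status page": {
        "Customer support": "R",
        "Incident manager": "A",
        "Network engineer": "C",
    },
}

VALID_CODES = {"R", "A", "C", "I"}


def validate_raci(matrix):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    for task, assignments in matrix.items():
        counts = Counter(assignments.values())
        unknown = set(counts) - VALID_CODES
        if unknown:
            problems.append(f"{task}: unknown codes {sorted(unknown)}")
        if counts["R"] < 1:
            problems.append(f"{task}: no one is responsible")
        # Assumed convention: exactly one accountable role per task.
        if counts["A"] != 1:
            problems.append(
                f"{task}: expected exactly one accountable role, found {counts['A']}"
            )
    return problems


def who_to_notify(matrix, task):
    """Roles that are consulted or informed for a given task."""
    return sorted(
        role for role, code in matrix[task].items() if code in {"C", "I"}
    )
```

Running `validate_raci(RACI)` before an incident, rather than during one, is one way to catch a task that nobody owns.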
The cause of a network outage can lie in any of the interdependent and interconnected networks or services involved, which makes outage root cause analysis (RCA) tricky. Causes can be as simple as a severed fiber optic cable or as sophisticated as a distributed denial-of-service (DDoS) attack. To help you narrow down an outage's cause effectively, use our Internet Outage Survival Cheat Sheet (the infographic is embedded below). This cheat sheet outlines outage symptoms, key stakeholders and actions to take that will help your network team save the day.
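As a rough illustration of narrowing down a cause, the sketch below checks one host in two steps: first DNS resolution, then a TCP connection. It only separates "the name does not resolve" from "the address resolves but cannot be reached"; it does not identify upstream, CDN or DDoS problems. The function name `triage` and its result labels are hypothetical, and the default port of 443 is an assumption.

```python
import socket


def triage(host, port=443, timeout=3.0,
           resolve=socket.getaddrinfo, connect=socket.create_connection):
    """Classify a connectivity check as dns_failure, unreachable or reachable.

    `resolve` and `connect` are injectable so the logic can be exercised
    without touching the network.
    """
    try:
        infos = resolve(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror:
        # The name could not be resolved: look at DNS first.
        return "dns_failure"

    # Use the first address returned; a fuller check would try them all.
    ip = infos[0][4][0]
    try:
        conn = connect((ip, port), timeout=timeout)
    except OSError:
        # Covers timeouts, refused connections and unreachable networks.
        return "unreachable"
    conn.close()
    return "reachable"
```

For example, `triage("www.example.com")` run from several vantage points can hint at whether a problem is local to one network or more widespread.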