Delivering Reliable Digital Experience to Clients Worldwide
Modern financial institutions have fairly sophisticated technology stacks dedicated to monitoring and optimizing internal networks, and for years, these were sufficient for their business needs. Today, however, clients expect to access their finances anytime, anywhere. As a result, visibility within the boundaries of the corporate network is no longer enough to ensure a reliable digital experience.
While businesses are moving to the cloud en masse, many financial institutions still rely on their tried-and-true data centers to deliver critical services. These data centers, often distributed across geographic regions, are connected through multiple ISPs. Because these ISPs are critical links between the data centers, they must provide reliable connectivity and availability; otherwise, employee and client experience suffers.
Limited Visibility Outside of the Corporate Network
Despite their critical reliance on ISPs, most firms cannot see beyond their corporate perimeter to identify when an ISP-related issue is occurring. When something goes wrong, that blind spot leads to ambiguity and a disrupted user experience.
At this leading global financial services firm, the Network Instrumentation Engineering team is responsible for the monitoring, alerting, metrics and configuration management tools that support the enterprise's network group. The team makes sure the network group has the monitoring it needs for smooth operations and that the group's vendors, including ISPs, are held accountable to their SLAs.
Outage Exposes the Need for Visibility
Limited visibility into ISP availability and performance became a serious problem during an outage at one of the firm's ISPs, when its network team spent more than 8 hours trying to triage the situation and pinpoint the source of the problem. "We really had no visibility into anything outside of our walls, so it made it difficult to be able to find out where the problem really lied." In light of this incident, the firm realized it needed visibility into external networks to reduce the mean time to recovery (MTTR) for outages like that one.
Monitoring Connectivity to Data Centers
To provide continuous service to its clients, employees and physical branches, the firm needs to monitor connectivity to its data centers from an external vantage point, a capability it recognized it lacked. "We simply didn't have that kind of visibility for outside networks that we don't control." Using ThousandEyes Cloud Agents, the firm now monitors the reachability of its major data centers in large metropolitan areas around the world, such as New York, London and Tokyo, to ensure employees and clients can access its services without disruption.
Gaining Visibility into DDoS Mitigation Effectiveness
Increased visibility into connectivity and routing at its data centers also gives this financial institution better insight into the performance of its DDoS mitigation vendor. “Previously, we didn't have any visibility into when our DDoS mitigation service was engaged, or how successful it was when active.” Using ThousandEyes, the firm can see when traffic is routed to their ASN and when a switchover takes place. This helps it verify that mitigation is working as expected and that the vendor is meeting its SLAs.
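The case study does not describe how this detection works internally. As a rough, hypothetical illustration of the idea, the Python sketch below assumes BGP-based diversion, in which the mitigation vendor's ASN appears in the AS path of the firm's protected prefix while scrubbing is active, plus a feed of timestamped AS paths for that prefix. The `RouteObservation` type, the `find_switchovers` helper and the ASNs (taken from the RFC 5398 documentation range) are invented for this example and are not part of any ThousandEyes API.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical ASNs from the documentation range (RFC 5398); not real networks.
FIRM_ASN = 64496
MITIGATION_VENDOR_ASN = 64500
TRANSIT_ASN = 64510


@dataclass
class RouteObservation:
    """One AS path seen for the firm's prefix at a given time (hypothetical feed)."""
    timestamp: str       # assumed to be sorted ascending
    as_path: List[int]   # leftmost = nearest the vantage point, rightmost = origin


def find_switchovers(observations: List[RouteObservation],
                     vendor_asn: int = MITIGATION_VENDOR_ASN) -> List[Tuple[str, str]]:
    """Return (timestamp, event) pairs where the path starts or stops
    transiting the mitigation vendor's ASN."""
    events: List[Tuple[str, str]] = []
    via_vendor_prev: Optional[bool] = None
    for obs in observations:
        via_vendor = vendor_asn in obs.as_path
        # Only report a change relative to the previous observation.
        if via_vendor_prev is not None and via_vendor != via_vendor_prev:
            events.append((obs.timestamp,
                           "mitigation engaged" if via_vendor else "mitigation released"))
        via_vendor_prev = via_vendor
    return events


if __name__ == "__main__":
    feed = [
        RouteObservation("09:00", [TRANSIT_ASN, FIRM_ASN]),
        RouteObservation("09:05", [TRANSIT_ASN, MITIGATION_VENDOR_ASN, FIRM_ASN]),
        RouteObservation("09:40", [TRANSIT_ASN, MITIGATION_VENDOR_ASN, FIRM_ASN]),
        RouteObservation("10:10", [TRANSIT_ASN, FIRM_ASN]),
    ]
    for ts, event in find_switchovers(feed):
        print(ts, event)
```

Run as a script, it reports the two transitions in the sample feed: mitigation engaging at 09:05 and releasing at 10:10. Comparing such timestamps with the start and end of an attack is one way to check a vendor's behavior against its SLA.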
Figure 1: The broader impact of the outage is seen through data from a set of relevant carriers
Monitoring the Internet for Outage Threats
Monitoring the reachability of its data centers alerts the firm when service provider availability or performance becomes an issue, but much of what happens across the global Internet, outside its four walls, also influences what goes on inside. With ThousandEyes Internet Insights, the team can see outages across the Internet and correlate them with its own incidents to understand the scope of an event.
Understanding Outage Scale and Impact
One example came during a recent Cloudflare disruption caused by a service provider routing error. The team happened to be monitoring a fixed income circuit with ThousandEyes when it received an alert reporting packet loss (Figure 2). When they looked at the macro view of that test in Internet Insights, they could see that the loss was part of a much broader issue. "Our team saw that it wasn't just this circuit that was having issues, it was much larger. The route leak affecting Cloudflare was creating issues across the country. It was affecting Verizon, Level 3 and Google."
This insight helped them to understand the scope and scale of an issue that impacted their clients. “When you start to hear reports of an outage, you're pretty much in the dark. But if you can tie it to another issue that's happening and its cascading impact across different service providers, it gives you an indication of what's wrong. Then you can communicate that out to your stakeholders and even your customers and employees.”
In another situation, a client of the firm was experiencing connectivity issues, yet nothing on the firm's systems indicated a problem. Using Internet Insights, the team went back and matched the times the client reported against the outages Internet Insights had recorded, and found that they coincided with a Verizon outage in the New York metro area. The team then used that data to escalate to Verizon and address the client's issue. According to the team, "We'd never been able to have that kind of visibility or be able to corroborate what the client was saying to an ISP before, and from the perspective of the client, they see us as being proactive in helping them resolve an issue."
The team also uses Internet Insights to check the state of the Internet before the stock markets open each day. "There's a list of things we check, and one of them is the Internet Insights, to make sure everything looks clear." In addition, the firm plans to use the visual dashboards in Internet Insights for its next-generation NOC, where subject matter experts (SMEs) from functional groups ranging from security to web operations sit together to triage situations that may have an impact across silos.
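Matching client-reported times against known outages is essentially an interval-overlap check. The minimal Python sketch below assumes the client's problem windows and a list of outage events have already been exported into simple records; the `Window` and `Outage` types, the `correlate` helper, the five-minute slack for imprecise reports and the provider names are all hypothetical, not Internet Insights output or API.

```python
from datetime import datetime, timedelta
from typing import List, NamedTuple, Tuple


class Window(NamedTuple):
    start: datetime
    end: datetime


class Outage(NamedTuple):
    provider: str   # hypothetical provider label
    region: str
    window: Window


def overlaps(a: Window, b: Window, slack: timedelta = timedelta(0)) -> bool:
    """True if window `a`, widened by `slack` on both sides, intersects `b`."""
    return a.start - slack <= b.end and b.start <= a.end + slack


def correlate(reports: List[Window], outages: List[Outage],
              slack: timedelta = timedelta(minutes=5)) -> List[Tuple[int, Outage]]:
    """Rank outages by how many client-reported windows they overlap,
    dropping outages that match none."""
    scored = []
    for outage in outages:
        hits = sum(overlaps(r, outage.window, slack) for r in reports)
        if hits:
            scored.append((hits, outage))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored


if __name__ == "__main__":
    t = datetime(2021, 1, 1, 9, 0)  # arbitrary reference time for the sample data
    reports = [
        Window(t, t + timedelta(minutes=20)),
        Window(t + timedelta(hours=2), t + timedelta(hours=2, minutes=10)),
    ]
    outages = [
        Outage("ISP-A", "New York metro", Window(t - timedelta(minutes=10), t + timedelta(hours=3))),
        Outage("ISP-B", "London", Window(t + timedelta(hours=5), t + timedelta(hours=6))),
    ]
    for hits, o in correlate(reports, outages):
        print(f"{o.provider} ({o.region}): matches {hits} of {len(reports)} reports")
```

With the sample data, only ISP-A overlaps both reported windows, so the script prints `ISP-A (New York metro): matches 2 of 2 reports`. Ranking by the number of matched reports, rather than requiring a match on every report, tolerates a client who misremembers one of the times.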
Figure 2: Internet Insights captures significant packet loss throughout the edges of the Verizon backbone (AS 701)
New Awareness with Internet Insights
ThousandEyes Internet Insights gives this financial services firm a level of Internet awareness it never had before. "Internet Insights allows us to be able to understand the scope of an issue, because with all the metrics and the alerting that goes on, without Internet Insights, you don't have an idea of whether it is only impacting you or if there is a broader scope. Having collective intelligence provides that, because now I'm able to understand the bigger picture issue that's underpinning the problem I may have experienced or maybe didn't experience."
Specifically, the visibility provided through Internet Insights enables the firm to:
- Accelerate Outage Troubleshooting
Internet Insights helps reduce the mean time to identify (MTTI) issues. Once an issue is identified, Internet Insights provides the information needed to escalate it successfully to the service provider. "In one case, where a client was having issues with a major broadband provider, that typically would have led to hours of troubleshooting. We would not have known to call the broadband provider, and even if we did, it would have been a finger-pointing game to get it resolved. With Internet Insights, finding and getting evidence of the issue in the broadband provider’s network took us minutes."
- Better Manage What They Cannot Control
Between cloud providers, SaaS applications and other third-party services, the firm relies on a growing list of Internet-connected services that it does not own or directly manage. "When it comes to the Internet, you don't have any sense that you can have control. You're not ever going to have direct control but knowing how these services behave and knowing how to isolate when there is an issue is extremely important." Internet Insights gives the firm visibility into what it does not control.
- Leverage the Credibility of Collective Intelligence
Raising issues with ISPs without clear and compelling evidence is a challenge for most organizations, even those with the clout of a major enterprise. Internet Insights shows outage impact and scope, drawing on the collective visibility of many enterprises to give ISPs authoritative information. "This data is irrefutable. From the ISP perspective, I’d think they can now look at it, and if it's something systemic, they can address it or at least they can recover their services quickly and provide better service overall."
Figure 3: Verizon nodes (AS 701) that are reporting forwarding loss
Conclusion
Delivering services over the Internet relies on an increasingly complex ecosystem of external, third-party providers. While businesses cannot fully control many aspects of this ecosystem, awareness of outage events, big and small, across the global Internet can help you understand the environment you're in and respond with precision when an outage does affect you. "If you're providing services today and you rely on external parties to enable you to provide those services, you need Internet Insights." By combining the macro, global views of Internet Insights with views of its own service-delivery paths, this financial institution now has visibility into outage events that it never had before.
Figure 4: Impacted networks due to loss in Verizon (AS 701) stemming from the route leak