The Internet is central to how modern businesses deliver their products and services to their end users at scale. Yet, the unpredictable and complex nature of the Internet can leave organizations (and their data) vulnerable. An accidental route leak or a malicious DDoS attack can negatively impact service availability and reliability, rattling customer confidence and diminishing their loyalty. In today’s business environment, it is crucial to take a proactive approach to Internet traffic monitoring.
Matthew Wilson, Director of Network Engineering at Neustar, recently spoke at ThousandEyes Connect in New York City to share how his team is providing superior customer experiences while also safeguarding their infrastructure from nefarious Internet activity. As a leading technology service provider, Neustar connects its clients with trusted information to help them make complex decisions roughly 20 billion times a day. In recent years, Neustar launched a mission to become the largest DDoS mitigation platform available, and today that platform comprises 14 nodes globally with a capacity greater than 11.8 Tbps.
A Complex Problem Needs an Intelligent Solution
Providing superior customer experiences is at the heart of Neustar's business. As the number of DDoS attacks has increased sharply in recent years, Neustar realized that it could help its customers deal with this sophisticated type of threat using its unique combination of technology and expertise. Wilson recalls, “Every time some new attack came around, customers would come to us and ask, ‘How are you going to handle this attack?’” Neustar wanted to have the best answer.
Spurred in large part by this growing customer demand, Neustar built a powerful network that can handle large-scale DDoS attacks and protect its customers’ services from this malicious activity. The complex nature of this network, however, meant that Wilson’s team had to manage traffic through many different mechanisms to deliver a consistently available end-user experience. For instance, to provide DDoS services, traffic is redirected in one of two ways. In the first (a BGP swing), it is routed through the Neustar platform, where it is scrubbed and the clean traffic is sent on to the customer. In the second, it is sent through a DNS redirect. This second method is used for individual sites or for services hosted in a cloud infrastructure where you can only have CNAMEs and not individual IPs.
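As a rough illustration, the choice between the two redirection methods described above can be sketched in a few lines of Python. This is a minimal sketch and not Neustar's implementation: the `ProtectedService` fields and the `choose_redirection` function are hypothetical names, and the rule encodes only the criteria stated in the article.

```python
from dataclasses import dataclass
from enum import Enum


class Redirection(Enum):
    BGP_SWING = "bgp_swing"        # traffic routed through the scrubbing platform
    DNS_REDIRECT = "dns_redirect"  # traffic steered by a DNS change instead


@dataclass(frozen=True)
class ProtectedService:
    # Hypothetical description of a customer service (illustration only).
    name: str
    individual_site: bool  # a single site rather than a whole network
    cname_only: bool       # cloud-hosted: reachable by CNAME, no IPs of its own


def choose_redirection(service: ProtectedService) -> Redirection:
    """Use a DNS redirect for single sites or CNAME-only hosting, else a BGP swing."""
    if service.individual_site or service.cname_only:
        return Redirection.DNS_REDIRECT
    return Redirection.BGP_SWING
```

For example, a storefront hosted in a public cloud with `cname_only=True` would be steered by DNS, while a customer network with its own addresses would be handled by a BGP swing.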
Monitoring a Large-Scale DDoS Mitigation Network
The complexity of this network created a certain level of risk when it comes to ensuring both network and platform availability. This meant that Neustar needed a way to monitor its infrastructure to make sure each area was not only available but also performing consistently from the customer’s standpoint. Wilson adds, “Whether it’s the proxy servers on the DNS side of things or BGP endpoints where we are doing BGP sessions with our customers, we need to ensure that these things are up and available all the time.”
Monitoring Neustar’s DDoS mitigation network is especially important to ensure reliability. Every node carries a combination of unicast and anycast routes. However, peering issues affecting the anycast routes can sometimes cause traffic to be routed to undesirable locations. To manage this, the team uses ThousandEyes Cloud Agents as external vantage points to see whether any particular anycast node is up or down and to visualize the path that traffic is taking. “As you use this, you'll start to see individual hops that might be having problems or individual hops that changed a path, as well as the up/down on our side,” adds Wilson.
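The kind of check described above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the ThousandEyes API: the `Probe` record, the node names, and both functions are hypothetical, and the probe results are assumed to have been collected elsewhere from external vantage points.

```python
from dataclasses import dataclass
from itertools import zip_longest
from typing import Optional


@dataclass(frozen=True)
class Probe:
    # Hypothetical result from one external vantage point.
    agent: str                    # where the test ran, e.g. "new-york"
    expected_node: str            # anycast node this agent should reach
    observed_node: Optional[str]  # node that answered, or None if nothing did


def review_probes(probes):
    """Return (agent, finding) pairs for unreachable or misrouted probes."""
    findings = []
    for probe in probes:
        if probe.observed_node is None:
            findings.append((probe.agent, "unreachable"))
        elif probe.observed_node != probe.expected_node:
            findings.append((probe.agent, "misrouted to " + probe.observed_node))
    return findings


def path_changes(previous, current):
    """Return (hop_index, old_hop, new_hop) for every position that differs."""
    changes = []
    for index, (old, new) in enumerate(zip_longest(previous, current)):
        if old != new:
            changes.append((index, old, new))
    return changes
```

Comparing hop lists position by position is a deliberate simplification: it flags a changed hop or a path that grew or shrank, but it does not try to realign paths after an inserted hop.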
If a problem occurs, the team receives timely alerts that are integrated into their common alerting platform. From there, the Neustar Security Operations Center (SOC) can see the message, diagnose the issue and take corrective action to resolve the problem. These alerts have helped them improve operational efficiency, reduce response times and, most significantly, protect their customer experience.
Helping Customers Mitigate DDoS Attacks
Not only does the team do this across Neustar’s network, but they also offer a similar DDoS monitoring service to a select number of customers. “We'll put their individual service/applications IPs in and monitor from end to end,” says Wilson. While Neustar may not have direct control over the application itself, they can monitor it to ensure server availability and responsiveness. This can help the team quickly understand whether the system is set up properly and if DDoS mitigation is working effectively. Wilson adds, “While a customer is under mitigation and, on the rare occasion, experiences performance issues, we know very quickly what's going on.”
Even with sophisticated technology in place, DDoS attacks can be difficult to resolve because there is no one-size-fits-all mitigation technique. “There are human beings on the other side of the attack, and they are constantly tweaking and tuning it. So we have to be able to do the same thing,” says Wilson. Visualizing the path helps them to troubleshoot these cases, and they rely on ThousandEyes to look for problems and identify signs that may warrant a specific mitigation technique or a combination of techniques.
Creating an Extremely Happy Customer
Neustar wanted to make sure that they saw the external experience from their customers’ perspectives in order to deliver a positive customer experience. A good example of this is when a customer contacted the SOC about intermittent packet loss. The customer knew there was a problem but didn’t have visibility into where in the network it was occurring. Simply put: it was a mystery. Even when pinging the customer’s backend, the Neustar team saw that the traffic appeared to be flowing just fine.
Diving into this issue further, the Neustar team was running ThousandEyes tests against the customer’s backend within 15 minutes. With a clear line of sight into the endpoint paths, they were able to rule out the GRE tunnels as the source of the problem. In addition, HTTP appeared to be going through, and TCP connections were working as intended. The mystery continued.
A breakthrough came when they isolated an upstream link that was intermittently blocking ICMP. As it turned out, the customer had prioritization set to block ICMP when the network got saturated. According to Wilson, “we were able to tell them very quickly (within about 15 minutes) that it was their upstreams getting congested, and those are blocking ICMP traffic. They're not blocking the rest of the traffic, and actually, the service is just fine.”
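The reasoning in this diagnosis, ICMP loss alongside healthy TCP and HTTP, can be written down as a small decision rule. The sketch below is illustrative only: the function name, the result labels, and the 5% loss threshold are assumptions, not values from Neustar or ThousandEyes.

```python
def interpret_probes(icmp_loss_pct, tcp_ok, http_ok, loss_threshold_pct=5.0):
    """Classify a target from ICMP loss plus TCP and HTTP test outcomes.

    loss_threshold_pct is an arbitrary illustrative cutoff.
    """
    icmp_lossy = icmp_loss_pct >= loss_threshold_pct
    service_ok = tcp_ok and http_ok

    if icmp_lossy and service_ok:
        # Only ICMP is affected: consistent with ICMP being dropped or
        # deprioritized on a congested link while the service stays up.
        return "icmp-only"
    if icmp_lossy:
        # Service traffic is failing too, so the loss is not ICMP-specific.
        return "path-or-service-loss"
    if not service_ok:
        # No ICMP loss, yet TCP or HTTP fails: look at the host or application.
        return "service-fault"
    return "healthy"
```

In the case above, intermittent ICMP loss with working TCP connections and HTTP requests would land in the `"icmp-only"` branch, which points at ICMP handling on the path and not at the service itself.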
To resolve this issue, Neustar was able to work cooperatively with the customer, using ThousandEyes to visualize the problem and share graphs that showed exactly where it was. The result? “We created an extremely happy customer out of this,” says Wilson.
Understanding Traffic and Behavior Outside of Its Network
To deliver on its mission of providing customer-centric services, Neustar needed to have visibility into traffic and behavior outside of its network. As Wilson states, this is for a number of reasons, notably:
- To identify inefficient routing. In a network as large and complex as the one Neustar built, routing can be intricate, as traffic comes into the network and is sent back out to customers in a variety of ways. Neustar needed a way to confirm that routing stays consistent, to avoid a situation in which traffic out of New York hits a Singapore node when it should be hitting a US node. In addition, because upstream ISPs often make routing updates and add peers, routes can move around. In such cases, inefficient routing can reduce service speeds and negatively impact the customer experience.
- To monitor route withdrawals. Periodically, Neustar will request to withdraw a route because a customer is on an on-demand service. Frequently, part of the traffic still comes through after a route is pulled because BGP reconvergence can take time. This can be especially difficult to detect: in theory the route has been withdrawn, but BGP routing tables take time to refresh, and traffic still flows through the previous AS path. External visibility is therefore extremely important for knowing where customer traffic is actually going.
- To detect malicious BGP threats. Finally, Neustar wanted a mechanism by which they could distinguish malicious BGP hijacks from the intentional hijacking done by their own team. This was a particularly acute concern for its financial services customers. Of course, during the normal course of business, they are effectively the ones managing traffic and moving it around, and they needed a way to reassure customers that the traffic patterns were normal. Wilson adds, “We want to know we're doing it and be able to show our customers we're the ones doing it, not somebody else.”
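The last two points in the list can be illustrated with a simple classification of what external BGP monitors report for a customer prefix. This is a minimal sketch under several assumptions: the `RouteObservation` record and the labels are hypothetical, the mitigation provider is assumed to originate the customer's prefix from its own AS during a swing, and the AS numbers and prefix come from ranges reserved for documentation.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class RouteObservation:
    # Hypothetical record of what one external BGP monitor currently sees.
    monitor: str
    prefix: str
    as_path: Tuple[int, ...]  # last element is the origin AS


def classify_origin(obs, customer_asn, mitigation_asn, mitigation_active):
    """Label one observation by who originates the prefix.

    Assumes the mitigation provider announces the prefix from its own AS
    while a swing is active (an assumption made for this sketch).
    """
    origin = obs.as_path[-1]
    if origin == customer_asn:
        return "customer-origin"
    if origin == mitigation_asn:
        if mitigation_active:
            return "expected-mitigation"
        # The route was withdrawn, but this monitor still sees it.
        return "stale-after-withdrawal"
    return "possible-hijack"


def monitors_needing_attention(observations, customer_asn, mitigation_asn,
                               mitigation_active):
    """Return (monitor, label) for observations that are stale or suspicious."""
    flagged = []
    for obs in observations:
        label = classify_origin(obs, customer_asn, mitigation_asn,
                                mitigation_active)
        if label in ("stale-after-withdrawal", "possible-hijack"):
            flagged.append((obs.monitor, label))
    return flagged
```

Run shortly after a withdrawal with `mitigation_active=False`, a check like this would list the monitors where reconvergence has not finished, while an origin AS that matches neither party would be surfaced for investigation.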
A Network where Everyone Wins
Businesses trust Neustar to keep its own service online, to keep their services online, and to give them the best performance possible. Using ThousandEyes has helped Neustar keep this promise to its customers. According to Wilson, “ThousandEyes is one of the more critical pieces that Neustar uses to ensure we are meeting our commitments to our customers and that we're able to provide a good service so that everybody wins and everyone's happy.”