The Many Reasons for Traffic Engineering
Traffic engineering is a crucial responsibility for every network engineer and operator, playing a vital role in maintaining efficient and reliable network performance. While the frequency of this task can vary based on the company or specific role, it remains a common and essential activity due to the constantly evolving conditions within our networks and across the Internet.
One of the primary objectives of traffic engineering is to enhance performance. Network engineers constantly strive to optimize routing to ensure traffic traverses the most efficient paths. This relentless pursuit of performance is not just a technical endeavor but a commitment to improving customer experience, regardless of the industry.
Sometimes, however, traffic engineering is driven by factors unrelated to performance. For instance, a peering team might request changes to comply with contractual obligations or to shift traffic onto a cheaper path through an alternative transit provider.
Regardless of experience, traffic engineering is challenging. Network operators must account for many operational and architectural scenarios that can undermine the outcome of a traffic engineering change: topology changes, one-off situations, and unexpected configuration changes, to name a few. After all, many significant outages have been caused by traffic engineering gone wrong.
Despite our expertise with the BGP Best Path Selection Algorithm and its decision-making process, and our extensive experience with AS_PATH prepending, BGP communities, and LOCAL_PREF manipulation, unexpected events still occur. Changes outside our administrative control, or within our own environment, can disrupt traffic engineering efforts.
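To make the decision process concrete, here is a minimal Python sketch of a simplified best path comparison. It is an assumption-laden subset, not any vendor's implementation: it covers only LOCAL_PREF, AS_PATH length, origin, MED, eBGP versus iBGP, and router ID, and it compares MED across all neighbors (as if an always-compare-MED option were enabled), whereas standard BGP compares MED only between routes from the same neighboring AS. The `Route` class, next hops, and documentation ASNs (64500 to 64503) are hypothetical.

```python
from dataclasses import dataclass, field

# Origin codes in order of preference: IGP < EGP < INCOMPLETE.
ORIGIN_RANK = {"IGP": 0, "EGP": 1, "INCOMPLETE": 2}


@dataclass
class Route:
    next_hop: str
    local_pref: int = 100
    as_path: list[int] = field(default_factory=list)
    origin: str = "IGP"
    med: int = 0
    ebgp: bool = True
    router_id: str = "0.0.0.0"


def preference_key(route: Route) -> tuple:
    """Sort key for a simplified decision process: smaller tuple = more preferred."""
    return (
        -route.local_pref,          # 1. highest LOCAL_PREF wins
        len(route.as_path),         # 2. shortest AS_PATH wins
        ORIGIN_RANK[route.origin],  # 3. lowest origin code wins
        route.med,                  # 4. lowest MED (compared across all neighbors here)
        0 if route.ebgp else 1,     # 5. eBGP-learned preferred over iBGP-learned
        tuple(int(octet) for octet in route.router_id.split(".")),  # 6. lowest router ID
    )


def best_path(routes: list[Route]) -> Route:
    return min(routes, key=preference_key)


routes = [
    Route("10.0.0.1", as_path=[64500, 64501]),
    Route("10.0.0.2", as_path=[64502, 64502, 64502, 64501]),  # prepended path
    Route("10.0.0.3", local_pref=200, as_path=[64503, 64500, 64501]),
]
print(best_path(routes).next_hop)  # 10.0.0.3: LOCAL_PREF beats a shorter AS_PATH
```

The example also hints at why prepending is a weak lever: a remote AS that has already expressed a preference through LOCAL_PREF never gets as far as comparing AS_PATH length.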
When such events occur, we learn valuable lessons through processes like root cause analysis and by answering the "5 whys." Often, these lessons lead to updates in our Method of Procedure (MOP) documents to include additional verification steps.
Given the intricate nature of traffic engineering, network engineers and operators must approach it cautiously. This involves thorough verification of routing and forwarding tables, continuous monitoring of critical metrics through dashboards, and the use of third-party tools for validation.
Driving Operational Excellence With BGP Traffic Engineering
At ThousandEyes, we pride ourselves on providing high-quality signals to network professionals. Our near real-time BGP monitoring and alerting capabilities support operations by allowing engineers to validate traffic engineering changes promptly. We provide comprehensive monitoring of both ingress and egress traffic, with insights from both the BGP control plane and the data plane, to ensure no aspect of your network goes unmonitored.
In the following example, we create a test to demonstrate how networking professionals can use ThousandEyes to verify the effects of their BGP traffic engineering. Everything is done within a single platform, eliminating the need to search the Internet for “looking glasses” that actually work and return results in a reasonable time. This saves you time and effort, allowing you to focus on what's important. On top of that, ThousandEyes provides near real-time feedback not only from the control plane but also from the data plane, the same data plane over which your customer and production traffic is routed.
As shown in Figure 1 below, an Agent-to-Agent test shows bidirectional path visualization between the "te-research-00" agent in AS 210312 and an agent deployed in Oracle's Cloud in Frankfurt, AS 31898. Agent-to-Agent tests are invaluable for visualizing both forward and reverse paths. Because Internet routing is often asymmetric, visibility into the reverse path makes a significant difference and enables more efficient root cause analysis.
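One way to spot asymmetry programmatically, sketched in Python under the assumption that both directions have already been reduced to AS-level paths, is to flip the reverse path and compare it with the forward path. The transit ASNs 64500 and 64501 are hypothetical documentation ASNs, not the networks shown in Figure 1.

```python
def compare_directions(forward: list[int], reverse: list[int]) -> dict:
    """Compare AS-level forward and reverse paths between two agents.

    `forward` runs source -> target and `reverse` runs target -> source,
    so the reverse path is flipped before the comparison.
    """
    reverse_flipped = list(reversed(reverse))
    return {
        "symmetric": forward == reverse_flipped,
        "forward_only": sorted(set(forward) - set(reverse)),
        "reverse_only": sorted(set(reverse) - set(forward)),
    }


# Hypothetical AS-level paths; 64500 and 64501 stand in for unnamed transit networks.
forward = [210312, 64500, 31898]  # te-research-00 -> Oracle Frankfurt
reverse = [31898, 64501, 210312]  # Oracle Frankfurt -> te-research-00
print(compare_directions(forward, reverse))
# {'symmetric': False, 'forward_only': [64500], 'reverse_only': [64501]}
```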
TCP traffic used for ThousandEyes testing is routed over the same data plane as your production or customer traffic. Therefore, any ThousandEyes-observed events, such as spikes in latency or packet loss, likely affected your production or customer traffic as well.
BGP Route Visualization shows prefix propagation from the perspective of hundreds of BGP monitors deployed worldwide, and reports metrics such as Reachability, Path Changes, and Updates.
Continuing with our example, ThousandEyes proactively detected that the IP address of agent "te-research-00" falls within the 193.5.19.0/24 prefix and began monitoring the relevant BGP metrics, as shown in Figure 2.
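The article does not spell out how these metrics are computed, so the following Python sketch is only one plausible interpretation: Reachability as the percentage of monitors holding a route to the prefix, and Path Changes as the number of monitors whose AS path differs between two snapshots. The monitor names and AS paths are hypothetical, and the Updates metric is not modeled.

```python
from typing import Optional

# monitor name -> AS path the monitor currently has for the prefix (None = no route)
Snapshot = dict[str, Optional[tuple[int, ...]]]


def reachability(snapshot: Snapshot) -> float:
    """Percentage of monitors that currently hold a route to the prefix."""
    if not snapshot:
        return 0.0
    reachable = sum(1 for path in snapshot.values() if path is not None)
    return 100.0 * reachable / len(snapshot)


def path_changes(before: Snapshot, after: Snapshot) -> int:
    """Number of monitors whose AS path differs between two snapshots."""
    monitors = before.keys() | after.keys()
    return sum(1 for m in monitors if before.get(m) != after.get(m))


# Hypothetical snapshots from three monitors, before and after an ingress change.
before = {
    "monitor-a": (64500, 25091, 210312),
    "monitor-b": (64501, 25091, 210312),
    "monitor-c": None,
}
after = {
    "monitor-a": (64500, 34549, 210312),
    "monitor-b": (64501, 25091, 210312),
    "monitor-c": (64502, 34549, 210312),
}
print(round(reachability(before), 1), reachability(after), path_changes(before, after))
# 66.7 100.0 2
```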
Visualizing Ingress Traffic Engineering
Following our previous example, the network operations team performed ingress traffic engineering to influence how traffic enters the network. There are multiple ways to do this, but the most common methods involve prefix aggregation/deaggregation, AS_PATH prepending, and BGP communities provided by transit providers.
As you can see from the timeline in Figure 3, on July 24 at 05:07 CST, our network operations team proactively performed an ingress traffic engineering change using BGP communities. Our goal was to remove AS 25091 from the path, and we successfully rerouted the traffic through AS 34549. Withdrawals are shown as striped red lines, while solid red lines represent the path traffic took after the change.
In the “BGP Path Changes” view, users can check detailed timestamps by selecting one of the BGP monitors on the left-hand side and choosing “View details of path changes.”
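Deaggregation works because routers forward on the longest matching prefix. The Python sketch below models a remote router's table in which a hypothetical 10.20.0.0/23 is announced via one transit provider and its two /24 more-specifics via another; the prefix and provider labels are placeholders. The /23 is deliberate: announcements longer than /24 are widely filtered on the Internet, so splitting a /24 such as 193.5.19.0/24 this way would generally not propagate.

```python
import ipaddress

AGGREGATE = ipaddress.ip_network("10.20.0.0/23")  # hypothetical placeholder prefix

# A remote router's view: the aggregate via transit A, the /24s via transit B.
rib = {AGGREGATE: "via transit A"}
rib.update({sub: "via transit B" for sub in AGGREGATE.subnets(prefixlen_diff=1)})


def lookup(destination: str) -> str:
    """Longest prefix match: the most specific covering route wins."""
    addr = ipaddress.ip_address(destination)
    matches = [net for net in rib if addr in net]
    best = max(matches, key=lambda net: net.prefixlen)
    return f"{best} {rib[best]}"


print(lookup("10.20.0.10"))  # 10.20.0.0/24 via transit B
print(lookup("10.20.1.10"))  # 10.20.1.0/24 via transit B
```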
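The check the team is effectively making can be sketched in Python: after the change, no monitor should still see AS 25091, and every monitor should see AS 34549. The per-monitor AS paths below are hypothetical (6450x are documentation ASNs); only ASNs 210312, 25091, and 34549 and the monitor name England-68 come from the article.

```python
# Hypothetical post-change AS paths per BGP monitor (origin AS 210312 last).
observed = {
    "England-68": [64500, 34549, 210312],
    "monitor-b": [64501, 34549, 210312],
    "monitor-c": [64502, 25091, 210312],
}


def verify_ingress_change(paths, removed_asn, expected_asn):
    """Return monitors that still see `removed_asn` and monitors missing `expected_asn`."""
    still_via_removed = sorted(m for m, p in paths.items() if removed_asn in p)
    missing_expected = sorted(m for m, p in paths.items() if expected_asn not in p)
    return still_via_removed, missing_expected


print(verify_ingress_change(observed, removed_asn=25091, expected_asn=34549))
# (['monitor-c'], ['monitor-c'])
```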
As shown in Figure 4, at 05:07:19 CST, BGP Monitor England-68 observed a path change that no longer included AS 25091. Instead, the path now included AS 34549.
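Pinpointing when a specific monitor stopped seeing an AS amounts to scanning its update history in time order. This Python sketch assumes a simple hypothetical log of (timestamp, AS path) pairs; only the 05:07:19 change time comes from Figure 4, and the other entries and upstream ASN are invented for illustration.

```python
from datetime import datetime

# Hypothetical update log for one monitor: (timestamp in CST, AS path).
updates = [
    (datetime(2024, 7, 24, 5, 0, 2), [64500, 25091, 210312]),
    (datetime(2024, 7, 24, 5, 7, 19), [64500, 34549, 210312]),
    (datetime(2024, 7, 24, 5, 9, 40), [64500, 34549, 210312]),
]


def first_path_without(updates, asn):
    """Return the first (timestamp, path) that drops `asn` after it was present, else None."""
    seen = False
    for ts, path in sorted(updates, key=lambda u: u[0]):
        if asn in path:
            seen = True
        elif seen:
            return ts, path
    return None


ts, path = first_path_without(updates, 25091)
print(ts.strftime("%H:%M:%S"), path)  # 05:07:19 [64500, 34549, 210312]
```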
If we navigate to the “Agent-to-Agent” view, we can see that the IP addresses in the reverse direction changed, and grouping the hops by network shows that AS 25091 was removed from the path entirely, as shown in Figure 5 and Figure 6.
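Grouping by network amounts to collapsing consecutive hops that belong to the same AS. The Python sketch below assumes each hop has already been mapped to an ASN (for example, from the origin AS of its covering BGP prefix); the hop addresses are hypothetical documentation addresses, and real traceroutes also contain unresponsive hops and IXP addresses that this sketch ignores.

```python
from itertools import groupby

# Hypothetical reverse-direction hops (Oracle -> te-research-00) as (IP, ASN).
# Addresses are documentation ranges; only the ASNs come from the article.
reverse_hops = [
    ("192.0.2.1", 31898),
    ("192.0.2.9", 31898),
    ("198.51.100.4", 34549),
    ("198.51.100.17", 34549),
    ("203.0.113.2", 210312),
]


def group_by_network(hops):
    """Collapse consecutive hops in the same AS into a single AS-level step."""
    return [asn for asn, _ in groupby(hops, key=lambda hop: hop[1])]


as_level = group_by_network(reverse_hops)
print(as_level)           # [31898, 34549, 210312]
print(25091 in as_level)  # False
```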
With near real-time BGP monitoring and alerting, we can confidently verify the effects of our traffic engineering changes. ThousandEyes allows us to do this directly, both from a control plane perspective using BGP Route Visualization and from the data plane using Path Visualization.
Visualizing Egress Traffic Engineering
ThousandEyes has long been able to show the effects of egress traffic engineering. A frequently used egress strategy is adjusting local preference (LOCAL_PREF). Unlike some other BGP attributes, local preference is not transitive across AS boundaries: it is exchanged only among iBGP peers within an AS, is never sent to eBGP neighbors, and therefore cannot be seen in external BGP feeds. In such scenarios, we must rely on data plane visibility and insights.
As shown in Figure 7 below, by navigating to the “Agent-to-Agent” view and examining Path Visualization at 07:34 CST on July 24, 2024, we can see how traffic was routed on the data plane one minute before the network operations team altered the path.
Examining the data plane path along the timeline is essential, because it shows the traffic flow before, during, and after the change, as shown in Figure 8.
One minute later, at 07:35 CST, the network operations team applied egress traffic engineering, significantly changing the path. As a result, traffic from the origin AS 210312 was routed through transit AS 8298 before reaching Oracle's AS 31898, as shown in Figure 9 and Figure 10.
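The following Python sketch illustrates both points under simplifying assumptions: egress selection that looks only at LOCAL_PREF and then AS_PATH length, and an eBGP export that prepends the local ASN and drops LOCAL_PREF. The `Route` class, the Oracle prefix placeholder 203.0.113.0/24, the alternative upstream AS 64500, and the LOCAL_PREF values are hypothetical.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Route:
    prefix: str
    as_path: tuple[int, ...]
    local_pref: Optional[int] = 100  # None once the route leaves the AS


def choose_egress(candidates):
    """Pick the exit: highest LOCAL_PREF first, then the shorter AS_PATH."""
    return min(candidates, key=lambda r: (-r.local_pref, len(r.as_path)))


def export_to_ebgp(route, local_asn):
    """What an external peer receives: local ASN prepended, LOCAL_PREF stripped."""
    return replace(route, as_path=(local_asn,) + route.as_path, local_pref=None)


# Hypothetical routes toward an Oracle prefix (203.0.113.0/24 is a placeholder).
via_other = Route("203.0.113.0/24", (64500, 31898), local_pref=120)
via_8298 = Route("203.0.113.0/24", (8298, 31898), local_pref=100)
print(choose_egress([via_other, via_8298]).as_path)  # (64500, 31898)

# Egress TE: raise LOCAL_PREF on the route learned from AS 8298.
via_8298 = replace(via_8298, local_pref=200)
print(choose_egress([via_other, via_8298]).as_path)  # (8298, 31898)

# Our own prefix as an eBGP neighbor sees it: no trace of the LOCAL_PREF we set.
own = Route("193.5.19.0/24", as_path=(), local_pref=300)
print(export_to_ebgp(own, 210312))
# Route(prefix='193.5.19.0/24', as_path=(210312,), local_pref=None)
```

Because the export step discards LOCAL_PREF, no external BGP monitor can observe the egress decision directly, which is why the data plane view matters here.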
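Reading the timeline for changes can also be sketched in Python: given AS-level forward paths sampled once per minute, report each point where consecutive samples differ. The samples are hypothetical; only the post-change path 210312, 8298, 31898 and the 07:34 to 07:35 timing come from the article, and AS 64500 stands in for the unnamed earlier upstream.

```python
# Hypothetical AS-level forward paths sampled once per minute (CST, July 24, 2024).
timeline = [
    ("07:33", (210312, 64500, 31898)),
    ("07:34", (210312, 64500, 31898)),
    ("07:35", (210312, 8298, 31898)),
    ("07:36", (210312, 8298, 31898)),
]


def path_transitions(samples):
    """Yield (time, old_path, new_path) wherever consecutive samples differ."""
    for (_, prev), (ts, cur) in zip(samples, samples[1:]):
        if prev != cur:
            yield ts, prev, cur


for ts, old, new in path_transitions(timeline):
    added = sorted(set(new) - set(old))
    removed = sorted(set(old) - set(new))
    print(f"{ts}: added {added}, removed {removed}")
# 07:35: added [8298], removed [64500]
```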
Why Does It Matter?
We all aspire to reach operational excellence. However, in increasingly complex environments, achieving operational excellence can feel daunting. The risks are high. Adverse traffic engineering outcomes frequently result in outages and route leaks and, more often than not, incur both reputational and monetary loss for organizations.
For far too long, the network engineering community has been burdened by its reliance on suboptimal tools. These tools, such as “looking glasses” scattered across the Internet and solutions that rely on control plane data alone, have often left us uncertain about the effectiveness of our traffic engineering and the health of our networks.
ThousandEyes closes that gap. With near real-time BGP monitoring and alerting, we provide unprecedented visibility into the propagation of your prefixes from the perspective of hundreds of strategically deployed monitors around the globe. While you check how things look from your own routing tables, ThousandEyes shows the effects of your traffic engineering from all of those vantage points almost instantaneously. It is like checking hundreds of looking glasses at once, only more reliable, faster, and better looking.
And we don’t stop there. Using Path Visualization, we not only show you the effects of your egress traffic engineering, as the example above demonstrates, but we do so from the perspective of the data plane, the same data plane your production and customer traffic is routed over. In this case, we visualize the effects in both the forward and reverse directions.
How many times have you called your peering partner to run MTR on the reverse path, only to find out that the issue was there all along? How long did that take? We have all been there, and we collectively deserve better.
With our recent product improvements, including near real-time BGP monitoring and alerting, along with all the benefits of Path Visualization and the high-quality signal that ThousandEyes is renowned for, we've finally got it (and it’s much better).