This week we announced a new feature to visualize reverse network paths and collect more granular metrics in each direction of the flow. This is especially useful in portions of the network with asymmetric routing. So what is asymmetric routing and how common is it?
Why A to B is not the same as B to A
Packets don’t always flow over a network symmetrically. You may have noticed that the path from client to server is often significantly different from the one from server to client. In network infrastructure, this is commonly known as asymmetric routing. Lack of symmetry in networks is by design and not a byproduct of misbehaving or misconfigured networks. When it comes to network monitoring, the emphasis has traditionally been on the forward path. The reverse path is conveniently forgotten, and while it might not be the path less traveled, it has certainly become the path less visible. In today’s post we take a deeper look at why forward and reverse paths differ and why it is critical to monitor your network from opposing vantage points that give you granular and complete visibility.
Routing between two endpoints is about finding the “best” path, where “best” is relative. The best way to go from A to B is most likely not the best way to get from B to A. This can depend on a number of factors, such as the type of traffic, the relative capacity of connected links and interfaces, load-balancing algorithms on ECMP (equal-cost multi-path) links, or different peering policies between your ISPs. In many cases, business relationships between ISPs are also reflected in different inbound and outbound paths. Routers can also change the path between two endpoints over time, much like the way Google Maps reroutes you based on traffic conditions.
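To make the idea of a direction-dependent “best” path concrete, here is a minimal sketch that runs a shortest-path search over a small directed graph. The topology, the node names (A, B, ISP1, ISP2) and the link costs are hypothetical and chosen only for illustration; real routers pick paths using routing protocols and policies, not this toy cost model.

```python
import heapq

def shortest_path(graph, src, dst):
    """Dijkstra over a directed graph given as {node: {neighbor: cost}}."""
    queue = [(0, src, [src])]
    visited = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == dst:
            return cost, path
        if node in visited:
            continue
        visited.add(node)
        for neighbor, weight in graph.get(node, {}).items():
            if neighbor not in visited:
                heapq.heappush(queue, (cost + weight, neighbor, path + [neighbor]))
    return None  # no route from src to dst

# Hypothetical topology: A and B both connect to two ISPs, but each side
# prefers a different ISP for its outbound traffic (lower cost = preferred).
graph = {
    "A": {"ISP1": 1, "ISP2": 5},
    "B": {"ISP1": 5, "ISP2": 1},
    "ISP1": {"A": 1, "B": 1},
    "ISP2": {"A": 1, "B": 1},
}

forward = shortest_path(graph, "A", "B")
reverse = shortest_path(graph, "B", "A")
print("forward:", forward)
print("reverse:", reverse)
```

In this toy setup the forward path transits ISP1 while the reverse path transits ISP2, even though both directions have the same total cost.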
Reverse Path Monitoring: Why does it matter?
ISP Monitoring
If packets can traverse different ISPs in each direction, monitoring only the forward direction can mask an ISP failure in the reverse direction. Let’s look at an example to understand this further. Figure 2 is a Path Visualization snapshot showing the bidirectional path between a Cloud Agent in the UK and an Enterprise Agent in San Francisco. As you can see, the forward and reverse paths between the agents are completely different. The direction of the arrows between the hops indicates the direction of traffic flow. If we dig a little deeper into the path traversed, the forward path from the UK to San Francisco transits the ISPs Mythic Beasts and NTT America, while the reverse path goes through Telia and Cogent. There is 50% packet loss at a forward path node within the NTT network, while there is no loss in the reverse path. If this scenario were reversed and Telia had 50% loss instead of NTT, it would have been a troubleshooting nightmare with only forward path visibility. Bidirectional visibility can provide additional transparency into any ISP network and help pinpoint issues faster.
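A quick way to see why a single round-trip measurement cannot tell these two scenarios apart is to model the loss seen by a probe that has to survive both directions. This is a simplified sketch that assumes loss in each direction is independent; the function name is made up for illustration and does not describe how any particular monitoring tool computes loss.

```python
def round_trip_loss(forward_loss, reverse_loss):
    """Loss seen by a probe whose request and reply must both arrive.

    Assumes loss in the two directions is independent.
    """
    return 1 - (1 - forward_loss) * (1 - reverse_loss)

# Scenario from the example: 50% loss on the forward path, none on the reverse.
loss_forward_fault = round_trip_loss(0.5, 0.0)

# The hypothetical opposite: the same fault sits on the reverse path instead.
loss_reverse_fault = round_trip_loss(0.0, 0.5)

print(loss_forward_fault, loss_reverse_fault)
```

Both cases yield the same round-trip loss, so only per-direction measurements reveal which path, and therefore which ISP, is at fault.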
Let’s take a look at another example where a link in the reverse path exhibits high latency. Figure 3 shows the path between an Enterprise Agent and a Cloud Agent. One of the links in the reverse path is experiencing high latency of 197ms. In scenarios like this, where the forward path has no visible issues but the reverse path does, traceroute tools fall short because they are geared towards gathering hop-by-hop metrics and network performance in only one direction. With time-sensitive applications like voice and video, it is critical to be able to pinpoint exactly where an issue is.
Detect Varying Path MTU
Consider another scenario where low path MTU on the reverse path results in dropped packets, but the forward path shows everything to be fine. We recently ran into a web application that would break in seemingly random ways. While HTTP request messages were reaching the server, the response messages from the server were taking a different path and larger packets were getting dropped due to an undersized MTU link in the reverse path.
Typically, when a packet encounters a link with an undersized MTU, the node with the low MTU interface will fragment the packet; however, in this particular case the HTTPS server was setting the ‘Don’t Fragment (DF)’ bit in the IP packet. The smaller packets made it through but the larger packets were dropped. An ICMP error message was sent to the server, but these messages never reached it because a firewall in the path was blocking ICMP messages. The HTTPS server was completely oblivious to what was going on in the network, and the net result was a poor end user experience. On the network, Murphy’s law holds strong.
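The failure mode described above can be sketched as a small simulation. This is a simplified IPv4-style model, not a packet-level implementation: the function name, the link MTU values and the packet sizes are all hypothetical, and fragmentation is modeled simply as "the packet still gets through".

```python
def send_packet(size, link_mtus, df=True, icmp_blocked=False):
    """Walk a packet across a path; return (delivered, sender_notified).

    link_mtus: MTU in bytes of each link on the path, in order.
    """
    for mtu in link_mtus:
        if size > mtu:
            if not df:
                # Without DF, the router can fragment and keep forwarding.
                continue
            # With DF set, the router drops the packet and sends an ICMP
            # error back. The sender only learns of it if ICMP gets through.
            return False, not icmp_blocked
    return True, False

# Hypothetical reverse path with one undersized link in the middle.
reverse_path = [1500, 1000, 1500]

small = send_packet(400, reverse_path, df=True, icmp_blocked=True)
large = send_packet(1400, reverse_path, df=True, icmp_blocked=True)
large_with_icmp = send_packet(1400, reverse_path, df=True, icmp_blocked=False)

print("small packet:", small)
print("large packet, ICMP blocked:", large)
print("large packet, ICMP allowed:", large_with_icmp)
```

The large packet with ICMP blocked is the problematic case: it is neither delivered nor reported, so the sender has no signal that it should send smaller packets.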
With bidirectional path visibility, the asymmetric return path with low MTU would have been identified immediately. We recreated the above scenario in our labs, starting with no undersized MTU links in either the forward or the reverse direction. We then introduced a link with low MTU in the reverse path and subsequently blocked ICMP messages. Figure 4 shows how the performance of a simulated HTTP server plummets.
Bidirectional path visibility can provide insight into the minimum path MTU in both the forward and reverse direction. In our simulated setup, we reduced the path MTU of an interface in the reverse path to 1000 bytes while sending active probing messages at 1460 bytes. Notice how in Figure 5 the low MTU link is recognized in the reverse direction, while the forward path shows a different minimum path MTU.
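The per-direction comparison can be expressed in a few lines. In this sketch the 1000-byte reverse link and the 1460-byte probe size come from the setup above; the 1500-byte values for the other links are assumptions, and the probe size is compared directly against the link MTU, ignoring header overhead.

```python
PROBE_SIZE = 1460  # bytes, from the simulated setup

# Per-link MTUs in each direction. Only the 1000-byte reverse link is
# taken from the text; the 1500-byte links are assumed for illustration.
paths = {
    "forward": [1500, 1500, 1500],
    "reverse": [1500, 1000, 1500],
}

def path_mtu(link_mtus):
    """The path MTU is the smallest MTU of any link along the path."""
    return min(link_mtus)

report = {
    direction: {
        "path_mtu": path_mtu(links),
        "probe_fits": PROBE_SIZE <= path_mtu(links),
    }
    for direction, links in paths.items()
}

for direction, result in report.items():
    print(direction, result)
```

A forward-only view of this data would show a healthy path, while the reverse entry flags the link that cannot carry the probe unfragmented.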
Superior Visibility into the Network
Asymmetric routing creates many challenges from a monitoring and troubleshooting perspective. Consider a WAN that has 5 intermediate routers between a branch office and a datacenter, with 4 ECMP links between each hop. That’s 512 unique paths that have to be monitored. Troubleshooting services over these links can be a nightmare, especially when the forward path is not the same as the reverse path and when paths can vary based on the application and traffic type. Troubleshooting performance between two locations is not only time consuming but also inconclusive without full transparency. When your business relies on a network that is out of your control, being able to pinpoint exactly where and in which direction an issue occurs is critical.
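The post does not spell out how the figure of 512 is derived. One reading that reproduces it, offered here as an assumption, is to count only the four router-to-router segments between the five routers, give each segment four ECMP choices, and count both directions separately.

```python
def unique_paths(ecmp_links_per_segment, segments):
    """Number of distinct paths when every segment offers the same
    number of parallel ECMP links and choices are independent."""
    return ecmp_links_per_segment ** segments

ROUTERS = 5
ECMP_LINKS = 4

# Assumption: only the segments between consecutive routers are counted,
# not the edge links to the branch office or the datacenter.
segments = ROUTERS - 1

per_direction = unique_paths(ECMP_LINKS, segments)
both_directions = 2 * per_direction

print(per_direction, both_directions)
```

Under a different counting convention, for example including the two edge segments, the number grows quickly (4 to the power of 6 is 4096 per direction), which only strengthens the point about monitoring scale.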
Understanding bidirectional routes within your WAN, or from your data center and branch offices to applications hosted in the cloud, can be very useful for evaluating network performance and reducing mean time to resolution. Bidirectional visibility also provides insights into traffic patterns and trends, which can be used to optimize network cost and design.
Agent-to-Agent Tests
Our new Agent-to-Agent tests enhance network visibility by providing additional insight into bidirectional routes, like the Path Visualizations that you’ve seen above. In addition, network metrics like latency, jitter, loss and throughput can be viewed either per direction or combined, to help narrow down where a fault is occurring. Try out Agent-to-Agent tests by signing up for a free trial of ThousandEyes.