We’re increasingly seeing anycast being used for the delivery of Internet services. It’s been around for a while with the DNS root, but many CDNs and even HTTP-based applications are now making use of anycast. So what is anycast and how does it differ from more common methodologies?
Three Ways of Routing and Addressing IP Packets
Every device on the Internet is identified by an IP address (addressing), which serves as the building block of communication between networks (routing). Based on the characteristics of the communication, IP addressing can be broadly divided into three categories:
- Unicast: One-to-one mapping where data is sent to a single receiver
- Multicast & Broadcast: One-to-many mapping where there can be multiple receivers
- Anycast: One-to-nearest mapping
What is Anycast?
Anycast is an addressing and routing methodology wherein multiple physical endpoints are logically denoted by a single IP address. Yes, this means that there can be multiple distributed systems with the exact same IP address. You are probably wondering how routing works if there are multiple physical destinations but only one logical address. How will the routers know where to send the packets? We will get to that very shortly, but first let’s look at an example that will clarify the concept of anycast.
Anycast in the DNS Root
One of the best-known use cases of anycast is the Domain Name System. DNS root servers are hosted as clusters of servers using anycast addressing. When data is destined for an anycast address, routing algorithms determine the “nearest” advertised location and send packets to it. In the following example, we query the F root server from two Cloud Agent locations, Chicago and Paris.
Notice that in both cases the F root server resolves to the same IP address, 192.5.5.241, but the paths taken to reach that address are completely different, with no overlap at all. The root server responding to the Paris Cloud Agent is hosted within an Equinix data center in France, while the F root server responding to the Chicago Agent is located in North America. This is further corroborated by the low link latencies between each penultimate hop and the end destination. Another interesting observation is that the anycast IP address of the F root server, 192.5.5.241, doesn’t look any different from a unicast IP address. Unlike broadcast or multicast, it is not possible to identify an anycast IP address by just looking at it.
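To make that last point concrete, here is a minimal sketch using Python's standard ipaddress module. The classify helper and its labels are our own illustration, and 192.0.2.10 is a documentation address chosen for the example. Multicast and the limited broadcast address can be recognized from the address bits alone, but nothing in the bits marks the F root address as anycast.

```python
import ipaddress

LIMITED_BROADCAST = ipaddress.ip_address("255.255.255.255")

def classify(addr: str) -> str:
    """Classify an address using only its bits (hypothetical helper)."""
    ip = ipaddress.ip_address(addr)
    if ip.is_multicast:
        return "multicast"
    if ip == LIMITED_BROADCAST:
        return "limited broadcast"
    # Unicast and anycast share the same address space, so the bits
    # alone cannot tell us which one this is.
    return "unicast or anycast (indistinguishable)"

for addr in ("224.0.0.1", "255.255.255.255", "192.5.5.241", "192.0.2.10"):
    print(addr, "->", classify(addr))
```

The F root address and the ordinary documentation address land in the same bucket, which is why path and latency data are needed to tell which physical site is answering.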
The F root example highlights one of the key advantages of using anycast. Anycast works well for globally distributed services: because traffic is normally routed to the “nearest” available node, end-to-end latency drops and reliability improves. The distributed nature of anycast also makes the network more tolerant of node failures and of external attacks like DDoS. Now, let’s discuss the routing aspect of anycast.
Anycast Routing in the Internet
In conventional networking (let’s ignore NAT for the moment), an IP address is meant to be unique. However, we have just seen that anycast is an exception to that rule and that there is no way to distinguish an anycast address from a unicast address. While this makes it seem like routing to a global anycast address could be complicated, in reality anycast routing is simple because it relies on the de facto routing protocol of the Internet, Border Gateway Protocol (BGP).
The Internet is organized into a large number of Autonomous Systems (ASes) that use BGP to exchange routing information and establish connectivity with one another. An AS is a set of IP prefixes operated under a single routing policy. BGP describes reachability across ASes through a path-vector attribute called the “AS Path” and, when other attributes such as local preference are equal, prefers the route with the shortest AS Path.
The same anycast IP address or prefix is advertised from multiple locations. As these advertisements propagate across the Internet, each BGP router learns not only its shortest path to the prefix but also alternate paths that lead to other locations. Because the shortest path usually leads to an anycast server that is relatively “close” to the origin of the data, that is where traffic is sent. If any of the anycast servers fail or become unavailable, the routes advertised from that location are withdrawn. BGP then chooses the next most preferable route and traffic simply shifts to an alternate node with the same IP address.
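The selection and failover behavior can be sketched in a few lines of Python. This is a deliberately simplified model, not a BGP implementation: it compares AS Path length only, whereas real routers evaluate several other attributes first. The prefix, AS numbers, and site labels are hypothetical and drawn from ranges reserved for documentation.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Route:
    prefix: str      # advertised anycast prefix
    as_path: tuple   # AS numbers, nearest AS first, origin AS last
    site: str        # hypothetical label for the originating anycast site

def best_route(routes):
    """Pick the route with the shortest AS Path (site name breaks ties)."""
    if not routes:
        return None
    return min(routes, key=lambda r: (len(r.as_path), r.site))

def withdraw(routes, site):
    """Return the routes that remain after one site stops advertising."""
    return [r for r in routes if r.site != site]

# Two advertisements of the same prefix, as seen by one router.
routes = [
    Route("192.0.2.0/24", (64500, 64496), "paris"),
    Route("192.0.2.0/24", (64501, 64502, 64496), "chicago"),
]

print(best_route(routes).site)                      # shortest AS Path wins
print(best_route(withdraw(routes, "paris")).site)   # failover after withdrawal
```

With both sites advertising, the two-hop path through the "paris" site is chosen; once that advertisement is withdrawn, the same destination address is served by the "chicago" site without any change on the client side.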
Anycast in CDNs
Content Delivery Networks (CDNs) play an important role in modern networks, propelled by increasing data rates in the Internet and by consumers who are less tolerant of slow download speeds. Video and voice applications are particularly sensitive to jitter and latency. A CDN is a globally distributed network of proxy servers that deliver content to end users with high availability and low latency. The goal of a CDN is to optimize delivery by serving the content from the server that is closest to the end user. This sounds very similar to anycast, where the closest anycast server is picked based on the proximity of the end user. It would seem that every CDN service provider should default to anycast but, in reality, that is not the case.
Because routes in the Internet change and the same IP address has multiple possible destinations, there is no guarantee that all packets from a source will reach the same server. Applications built on connection-oriented protocols such as HTTP over TCP rely on an established connection. If a different anycast node is picked in the middle of a conversation, the service can be disrupted. For HTTP applications that sit behind per-flow TCP load balancing, this condition can be further aggravated. This is why anycast has traditionally been recommended only for connectionless traffic such as DNS over UDP. However, the idea that anycast cannot carry TCP is a myth that has been disproved many times. Anycast works well for connection-oriented protocols too and should not be dismissed. For detailed insight into how TCP performs over anycast, take a look at this blog post recently published by LinkedIn.
CDN service providers take one of two approaches to picking the closest cache server to serve content. CDN vendors like Cloudflare, Cachefly and Edgecast prefer the anycast-based routing approach, while Akamai, Limelight and Fastly prefer DNS-based routing. DNS-based routing picks the closest cache server by making an informed decision based on where the user’s DNS server is located.
Depending on the type of CDN vendor you pick, the network topology can be significantly different. For example, take a look at Figure 3, which represents the path to Zendesk’s network from Cloud Agents located around the globe. Zendesk is hosted on CloudFlare’s anycast CDN, which is why the destination is represented by a single IP address even though it can be served from multiple locations. In such scenarios, the penultimate hop provides more information about the location from which content is being served. This is very similar to the F root DNS example we saw above in Figure 2.
If you are using a DNS-based CDN vendor, the network topology looks a little different. Based on the location of the querying Cloud Agent, DNS assigns the closest edge server, which is why you see different destination endpoints in Figure 4, which represents Nordstrom services hosted by Akamai.
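The DNS-based approach can be illustrated with a small sketch in which an authoritative DNS server answers with the edge address geographically closest to the resolver that asked. Everything here is an assumption made for illustration: the edge addresses come from documentation ranges, the coordinates are approximate, and real CDNs weigh far more than distance (measured latency, load, cost and so on).

```python
import math

# Hypothetical edge servers: address -> approximate (latitude, longitude).
EDGES = {
    "198.51.100.10": (41.88, -87.63),  # assumed edge near Chicago
    "203.0.113.10": (52.37, 4.90),     # assumed edge near Amsterdam
}

def haversine_km(a, b):
    """Great-circle distance in kilometers between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))

def answer_for_resolver(resolver_location):
    """Return the edge address closest to the resolver's estimated location."""
    return min(EDGES, key=lambda ip: haversine_km(EDGES[ip], resolver_location))

print(answer_for_resolver((48.86, 2.35)))     # resolver near Paris
print(answer_for_resolver((47.61, -122.33)))  # resolver near Seattle
```

Unlike anycast, each client here receives a different destination address, and the decision is based on the resolver's location rather than the end user's.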
Monitoring Anycast Networks
In theory, anycast networks are simple: multiple physical servers are assigned the same IP address and rely on BGP for route propagation. But designing and implementing an anycast framework is complicated, especially when the network must be fault tolerant. Even more challenging is monitoring your anycast network effectively so that you can quickly identify and pinpoint faults. If you rely on a CDN vendor to serve your content, it is critical to monitor and validate that vendor’s performance. When monitoring anycast-based CDNs, focus on measuring end-to-end latency and penultimate hop characteristics to understand which data center is serving content. Examining HTTP response headers is another way to determine where your data is being served from.
For example, CloudFlare uses a proprietary header called CF-Ray in its HTTP responses; the value is a hash followed by a code for the data center the request came through. In the case of the Zendesk example discussed before, the CF-Ray header for the Seattle region is CF-RAY: 2a21675e65fd2a3d-SEA, while that of Amsterdam is CF-RAY: 2a216896b93a0c71-AMS. You can also use the X- headers in the HTTP response to identify which location is serving your content.
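As a minimal sketch, the data center code can be pulled out of a CF-Ray value by splitting on the last hyphen. The parse_cf_ray function is our own helper, not part of any CloudFlare tooling, and it assumes the hash-then-code layout seen in the two sample values above.

```python
def parse_cf_ray(value: str):
    """Split a CF-Ray value into (ray_id, data_center_code).

    Assumes the "<hash>-<code>" layout of the sample values in this post.
    """
    ray_id, sep, code = value.strip().rpartition("-")
    if not sep or not ray_id or not code:
        raise ValueError(f"unexpected CF-Ray format: {value!r}")
    return ray_id, code.upper()

for sample in ("2a21675e65fd2a3d-SEA", "2a216896b93a0c71-AMS"):
    ray_id, code = parse_cf_ray(sample)
    print(f"{ray_id} served via {code}")
```

Collecting this code from several vantage points over time gives a quick view of which data center each region is actually hitting.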
Anycast is an interesting networking concept that is gaining acceptance among newer CDN providers. If you are interested in learning more about how our customers use ThousandEyes to monitor their anycast CDNs, take a look at Twitter's experience. Kick-start monitoring of your anycast networks with a free trial of ThousandEyes.