At ThousandEyes we deal with huge volumes of data collected from our agents across the Internet. A large number of our tests are configured to target public-facing websites owned by our customers. The performance of each site depends heavily on the provider hosting it and on how well that provider is connected to the Internet, especially for sites with a global user base. To understand how top online businesses host their principal domains, we looked at the 5,000 sites with the most traffic in the United States. At a basic level, the questions we wanted to answer were:
- How prevalent is self-hosting?
- What are the most popular cloud providers?
- How many sites are fronted by DDoS protection services?
Each domain in the list maps to an Autonomous System Number (ASN) that represents the organization physically hosting that domain. For example, pinterest.com resolves to the IP address 220.127.116.11. If we look at routing tables for announced address blocks that cover this IP address, we find that it falls under address block 18.104.22.168/15, announced by ASN 14618, which belongs to Amazon.
$ whois -h whois.cymru.com " -v 22.214.171.124 "
AS | IP | BGP Prefix | CC | Registry | Allocated | AS Name
14618 | 126.96.36.199 | 188.8.131.52/15 | US | arin | 2011-09-19 | AMAZON-AES - Amazon.com, Inc.
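Output in this format is straightforward to parse programmatically. The following is a minimal sketch, assuming the seven pipe-delimited columns shown above and a single origin ASN per row. The `CymruRecord` type and `parse_cymru_row` function are hypothetical names introduced for illustration, and the sample row uses documentation-reserved addresses and a documentation-reserved ASN rather than real data.

```python
from typing import NamedTuple


class CymruRecord(NamedTuple):
    """One data row of verbose whois.cymru.com output (hypothetical type)."""
    asn: int
    ip: str
    bgp_prefix: str
    country: str
    registry: str
    allocated: str
    as_name: str


def parse_cymru_row(row: str) -> CymruRecord:
    """Split a pipe-delimited data row into its seven fields.

    Assumes a single origin ASN in the first column; a row listing
    several ASNs would need extra handling.
    """
    # Split at most 6 times so a stray "|" in the AS name stays in the last field.
    fields = [field.strip() for field in row.split("|", 6)]
    if len(fields) != 7:
        raise ValueError(f"expected 7 fields, got {len(fields)}: {row!r}")
    asn, ip, prefix, country, registry, allocated, as_name = fields
    return CymruRecord(int(asn), ip, prefix, country, registry, allocated, as_name)


# Placeholder row built from documentation-reserved values, not real data.
sample = "64500 | 192.0.2.10 | 192.0.2.0/24 | US | arin | 2011-09-19 | EXAMPLE-AS - Example, Inc."
record = parse_cymru_row(sample)
print(record.asn, record.bgp_prefix, record.as_name)
```

The header row is skipped by the caller in this sketch; only data rows are passed to the parser.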
For each domain, we followed these steps:
- Resolve the domain to an IP address; if the bare domain can’t be resolved, retry with “www.” prepended
- Map the IP address to a BGP (Border Gateway Protocol) prefix using longest prefix matching on global routing tables, then map the prefix to the ASN that announces it and to the name of the organization that owns that ASN
- Based on the domain/provider combination, classify the domain into one of three classes:
  - self-hosted: if the ASN belongs to the same organization that owns the domain
  - cloud-hosted: if the ASN belongs to a cloud/hosting provider
  - sec-hosted: if the ASN belongs to a DDoS mitigation service
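The steps above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the tooling we actually used: the routing table is a small in-memory dictionary instead of a full global table, the function names and the `owners` and `ddos_asns` inputs are hypothetical, and any ASN that is neither the owner's nor a known DDoS mitigation service is treated as cloud-hosted by default.

```python
import ipaddress
import socket
from typing import Callable, Dict, Optional, Set, Tuple

Network = ipaddress.IPv4Network


def resolve(domain: str,
            lookup: Callable[[str], str] = socket.gethostbyname) -> Optional[str]:
    """Resolve the bare domain, retrying with a "www." prefix on failure."""
    for name in (domain, "www." + domain):
        try:
            return lookup(name)
        except OSError:  # socket.gaierror is a subclass of OSError
            continue
    return None


def longest_prefix_match(ip: str,
                         routes: Dict[Network, int]) -> Optional[Tuple[Network, int]]:
    """Return the most specific (prefix, ASN) pair covering the address."""
    addr = ipaddress.ip_address(ip)
    best: Optional[Tuple[Network, int]] = None
    for net, asn in routes.items():
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, asn)
    return best


def classify(asn: int, owner_asns: Set[int], ddos_asns: Set[int]) -> str:
    """Assign one of the three classes; the cloud-hosted default is an assumption."""
    if asn in ddos_asns:
        return "sec-hosted"
    if asn in owner_asns:
        return "self-hosted"
    return "cloud-hosted"


def classify_domain(domain: str,
                    routes: Dict[Network, int],
                    owners: Dict[str, Set[int]],
                    ddos_asns: Set[int],
                    lookup: Callable[[str], str] = socket.gethostbyname) -> Optional[str]:
    """Run all three steps; return None when the domain cannot be mapped."""
    ip = resolve(domain, lookup)
    if ip is None:
        return None
    match = longest_prefix_match(ip, routes)
    if match is None:
        return None
    return classify(match[1], owners.get(domain, set()), ddos_asns)
```

A linear scan over the routing table is fine for a toy example; a full table with hundreds of thousands of prefixes would call for a radix tree or a similar structure. The `lookup` parameter is injectable so the logic can be exercised without touching the network.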
Self-hosting vs Cloud
From the initial set of 5,000 domains, we weren’t able to map 488 sites (~10%); these unmapped sites had rankings uniformly distributed across the entire 5k population. The graph below shows the breakdown by type of hosting for the 4,512 mapped sites.
In the cloud-hosted category, we included IaaS providers like Amazon, more traditional hosting providers like GoDaddy, CDNs like Akamai, and ISPs that also provide hosting, e.g. Qwest.
The self-hosted category typically includes large corporations (large enough to run BGP ;)) with multiple address blocks and multiple data centers, e.g. bankofamerica.com and apple.com.
The sec-hosted category includes domains that are fronted by a DDoS mitigation service such as Prolexic. We only measured domains where the mitigation service was working at the DNS level, so this number is a lower bound since it does not include cases where the mitigation service works at the BGP level.
Because of our methodology, it’s also possible that some of the domains we classified as cloud-hosted are actually self-hosted: an organization may own a smaller address block that is advertised by its ISP, in which case the prefix maps to the ISP’s ASN. Our number for self-hosted is therefore a lower bound.
If we just look at cloud-hosted domains, the breakdown of providers is the following:
The usual suspects (Amazon, Rackspace) lead the charge, together hosting more than 20% of the domains, but there’s a surprisingly long tail of hosting companies that accounts for more than 60% of the domains. A related measurement from "Next Stop, the Cloud: Understanding Modern Web Service Deployment in EC2 and Azure" indicated that only 4% of the Alexa top 1M sites were using Amazon. Our measurements take into account only the principal domain, so it’s possible that companies self-host their main site and use Amazon EC2 for dev/test and other use cases.
More than 21% of the top 5k US domains are self-hosted, and that figure is biased towards sites with a large geographic footprint. Amazon is the leading cloud provider in the US, but it hosts only 12.4% of the top sites that are cloud-hosted. There’s a surprisingly long tail of small hosting providers that host the vast majority of the sites. In Silicon Valley, many startups start out with Amazon or Rackspace as their hosting provider, which creates a perception that Amazon and Rackspace together host a majority of cloud-hosted sites. Our measurements do not support this perception; in fact, Amazon hosts only about 9.6% of the top US sites (including self-hosted sites in the baseline). The web hosting market is still very fragmented, with 61.7% of the cloud-hosted sites using a large number of smaller hosting providers.
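As a sanity check, the two Amazon percentages can be related to each other. The sketch below assumes that the baseline for the 9.6% figure is the 4,512 mapped sites, which the text does not state explicitly, and the inputs are rounded, so the derived values are approximate.

```python
# Figures quoted in the article.
total_sites = 5000
unmapped_sites = 488
amazon_share_of_cloud = 0.124   # Amazon's share of cloud-hosted sites
amazon_share_of_all = 0.096     # Amazon's share with self-hosted in the baseline

# Sites that could be mapped to an ASN.
mapped_sites = total_sites - unmapped_sites
unmapped_fraction = unmapped_sites / total_sites

# If both Amazon shares count the same sites, their ratio is the fraction
# of the baseline that is cloud-hosted (roughly 77%).
implied_cloud_share = amazon_share_of_all / amazon_share_of_cloud

# Assumption: the baseline is the set of mapped sites.
implied_amazon_sites = amazon_share_of_all * mapped_sites

print(f"mapped sites: {mapped_sites}")
print(f"unmapped fraction: {unmapped_fraction:.1%}")
print(f"implied cloud-hosted share: {implied_cloud_share:.1%}")
print(f"implied Amazon-hosted sites: {implied_amazon_sites:.0f}")
```

An implied cloud-hosted share of roughly 77% is consistent with the self-hosted share of just over 21%, leaving a small remainder for the sec-hosted category.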