As we discussed in our first post about benchmarking network performance in China, half of the battle of monitoring applications in China is setting new expectations for what performance should normally look like; often metrics like packet loss and latency will be significantly higher than outside the Great Firewall.
In this post, we’ll talk about the other half: common issues that we’ve seen network teams run into when they begin monitoring and optimizing application delivery to users and customers in China. These issues stem from the way networks are architected and run in China, and include highly unreliable DNS, blocked page components, and the need to monitor and alert on the performance of ISPs and third party providers.
DNS Records
With application delivery in China, many performance issues arise from DNS problems related to the frequent packet loss, DNS tampering and hijacking that come with operating within the Great Firewall. In many situations, we have seen the Great Firewall filter out DNS UDP packets seemingly at random. Under such volatile circumstances, it is crucial to continuously monitor the DNS records that matter to you.
In general, our Cloud Agents run their own BIND servers, but the Cloud Agents located in China use local ISP caches to minimize the possibility that their DNS requests are blocked. As a result, Cloud Agents prove to be ideal vantage points to understand the experience of Chinese users accessing your services, whether hosted inside or outside China.
Use DNS Server and Trace tests to periodically and systematically check whether important DNS records are both available and have the correct mappings. These checks are particularly important in China, because DNS issues due to congestion, unreliability and censorship are all too common. Also use alerts on DNS tests to be notified about errors, inaccurate mappings and performance impacts in the DNS, Network and Routing layers.
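As a rough illustration of the kind of mapping check described above, the Python sketch below compares the addresses a resolver returns against an expected set. The hostnames, addresses and function names are hypothetical, and the standard-library resolver simply stands in for whatever vantage point you test from. It is not how ThousandEyes DNS tests are implemented.

```python
import socket
from typing import Callable, Dict, Iterable, Set


def system_resolve(hostname: str) -> Set[str]:
    """Resolve a hostname with the local system resolver."""
    infos = socket.getaddrinfo(hostname, None)
    # Each entry's sockaddr tuple starts with the IP address string.
    return {info[4][0] for info in infos}


def check_mappings(
    expected: Dict[str, Iterable[str]],
    resolve: Callable[[str], Set[str]] = system_resolve,
) -> Dict[str, str]:
    """Classify each record as 'ok', 'unavailable' or 'unexpected mapping'.

    `expected` maps a hostname to the addresses you consider correct
    (an assumption you maintain yourself).
    """
    results: Dict[str, str] = {}
    for hostname, allowed in expected.items():
        try:
            observed = resolve(hostname)
        except OSError:
            # Covers socket.gaierror: the lookup failed or timed out.
            results[hostname] = "unavailable"
            continue
        if not observed:
            results[hostname] = "unavailable"
        elif observed <= set(allowed):
            results[hostname] = "ok"
        else:
            # At least one returned address is not on the expected list,
            # which may indicate tampering or a stale record.
            results[hostname] = "unexpected mapping"
    return results
```

Passing the resolver in as a parameter keeps the comparison logic separate from the lookup, so the same check could be pointed at a local ISP cache or a specific DNS server.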
To read more about the various DNS issues that operators may encounter in China, check out our post on monitoring DNS in China.
Page Components
It’s common knowledge that domains like google.com and facebook.com are blocked and inaccessible from within China, but the repercussions from such censorship often have greater impacts than businesses might expect.
If we look at a typical object-level waterfall for the homepage of a US-based site, it’s full of objects like Google fonts and Facebook JavaScript files. These objects are also blocked by the Great Firewall, and will greatly impact the performance and total page load time of a webpage. Websites intended for users located in China will need to be sanitized of blocked objects and constantly monitored for performance impacts, as China’s censorship policies are volatile and can change from day to day. In China, it is entirely possible that a website works on one day and breaks the next.
Below, see the waterfall that resulted from the Chengdu Cloud Agent loading up the Starbucks US site, https://www.starbucks.com. This is a site clearly not meant for Chinese users — many objects related to Google APIs, Google Analytics, Google Ads and Facebook have been blocked and are unable to load, causing long wait times and major delays for the page to load.
Websites will need to be customized for delivery in China. Watch out for issues with third party objects related to Google fonts, Google APIs, Google Analytics, Google Ads, Google DoubleClick, Facebook, Adobe Typekit, Marketo and other marketing automation tools. Keep in mind that censors don’t just look at domains and providers: for example, if an image is named facebookimage.png, it may also have trouble getting through Chinese networks.
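One way to get a first look at a page before testing it from inside China is to scan its markup for objects that match the providers listed above. The Python sketch below is a minimal example under stated assumptions: the watchlist terms are a hypothetical list derived from this post, not an official or complete list of what is blocked, and a match only means the object deserves a closer look.

```python
from html.parser import HTMLParser
from typing import List

# Hypothetical watchlist based on the providers named in this post.
WATCHLIST = ("google", "doubleclick", "facebook", "typekit", "marketo")

# Tags that cause the browser to fetch an object as part of page load.
LOADING_TAGS = {"script", "img", "link", "iframe"}


class _ResourceCollector(HTMLParser):
    """Collects the URLs of objects that a page loads."""

    def __init__(self) -> None:
        super().__init__()
        self.urls: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag not in LOADING_TAGS:
            return
        for name, value in attrs:
            if name in ("src", "href") and value:
                self.urls.append(value)


def find_risky_objects(html: str) -> List[str]:
    """Return object URLs containing a watchlist term anywhere in the URL.

    Matching the whole URL, not just the hostname, also catches file
    names such as facebookimage.png.
    """
    collector = _ResourceCollector()
    collector.feed(html)
    return [
        url for url in collector.urls
        if any(term in url.lower() for term in WATCHLIST)
    ]
```

A static scan like this will miss objects that scripts inject at runtime, so it complements a real page load from a vantage point in China and does not replace one.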
In contrast, the waterfall from the same Chengdu Cloud Agent loading up the Starbucks China site, https://www.starbucks.com.cn, shows a different picture. There are fewer objects on the page, and far fewer of them fail to load. In addition, most of the objects are served up by China Telecom. Interestingly, there are still a few Google Analytics objects on the page, and these see especially long DNS times, which is very likely related to blocks on Google.
Use Page Load and Transaction tests to ensure that your webpages are loading as expected from within China. With a rapidly changing censorship regime, it’s crucial to ensure that your objects continue to load and don’t delay application delivery.
Third Party Providers
Page Load, HTTP Server and Network tests can also prove useful in benchmarking and monitoring third party providers. For example, you can employ a methodology similar to the one we used to benchmark network performance in China in order to identify the best-performing content delivery networks (CDNs). Choosing a high-performing CDN to partner with is a crucial part of application delivery strategy for a foreign company looking to enter China, as local CDNs have the knowledge to get around China’s networks and quickly deliver content to Chinese users. Popular CDNs include ChinaCache, CDNetworks and ChinaNetCenter.
Choosing a local data center provider is also critical to success, and you can similarly use a combination of ThousandEyes tests to benchmark performance across providers. Due to the difficulties in acquiring the ICP license required to host sites on servers in China, many companies choose to circumvent this process by partnering with local data center and colocation providers who already have ICP licenses. This space is dominated by the “Big Three”: China Telecom, China Unicom and China Mobile, though global carrier-neutral providers like Equinix and CenturyLink have made inroads with the help of local partnerships.
Alerting
To avoid getting inundated with notifications, you’ll need to adapt your alerts to the idiosyncrasies of application delivery in China.
First, you’ll need to set up separate alerts for agents located in China, as your expectations for metrics like packet loss, latency and response time should be very different from your expectations for the same metrics in, say, the US. As you set up Network, DNS, Web and Routing layer alerts, restrict the scope of the alert to China agents only, and fine-tune the thresholds as you baseline performance and learn what to expect. For more information, read our blog post on alerting by geography, network and device.
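The idea of region-scoped thresholds can be sketched in a few lines of Python. The threshold values and region names below are placeholders chosen for illustration, not recommendations and not measured baselines. In practice you would derive them from your own benchmarking.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Thresholds:
    max_loss_pct: float
    max_latency_ms: float


# Placeholder values only: agents in China get looser limits than the
# default because higher loss and latency are expected there.
THRESHOLDS: Dict[str, Thresholds] = {
    "china": Thresholds(max_loss_pct=10.0, max_latency_ms=400.0),
    "default": Thresholds(max_loss_pct=2.0, max_latency_ms=150.0),
}


def breaches(region: str, loss_pct: float, latency_ms: float) -> List[str]:
    """Return the metrics that exceed the thresholds for the agent's region."""
    limits = THRESHOLDS.get(region, THRESHOLDS["default"])
    problems = []
    if loss_pct > limits.max_loss_pct:
        problems.append("packet loss")
    if latency_ms > limits.max_latency_ms:
        problems.append("latency")
    return problems
```

With these placeholder limits, a measurement of 5% loss and 300 ms latency would raise nothing for an agent in China but would breach both limits for an agent elsewhere, which is the reduction in alert noise that separate alerts are meant to provide.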
Given the unique nature of networks in China, you can also consider separating alerts out by ISP. Because China’s Internet is so dominated by China Telecom and China Unicom, and because these two ISPs don’t have many peering points with each other, setting up alerts for each ISP lets you quickly determine whether a problem is ISP-specific.
To set up alerting based on network layer issues in ISPs, use the Path Trace alert. Scope the alert to the set of agents located within China, and set the alert conditions so that you are notified, for example, when high delay is observed in ASNs associated with a given ISP (in the example below, these are ASNs associated with China Unicom).
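The condition described above, high delay observed in ASNs associated with a given ISP, can be expressed as a simple filter over path trace hops. The sketch below is illustrative only. The hop format is hypothetical, the ASN sets are assumptions listing one backbone ASN commonly attributed to each ISP (verify them against current routing data before relying on them), and the delay limit is a placeholder.

```python
from typing import Dict, List, Set

# Assumed mapping for illustration: one well-known backbone ASN per ISP.
# Each ISP operates more ASNs than shown here.
ISP_ASNS: Dict[str, Set[int]] = {
    "China Unicom": {4837},
    "China Telecom": {4134},
}


def slow_hops_for_isp(
    hops: List[dict], isp: str, max_delay_ms: float
) -> List[dict]:
    """Return hops inside the ISP's ASNs whose delay exceeds the limit.

    Each hop is a dict with hypothetical keys 'asn' and 'delay_ms'.
    """
    asns = ISP_ASNS[isp]
    return [
        hop for hop in hops
        if hop["asn"] in asns and hop["delay_ms"] > max_delay_ms
    ]
```

Running the same filter once per ISP over the same set of traces gives a quick indication of whether elevated delay is confined to one provider's network.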
With all of the above alerts in place, you’ll have a systematic approach to comprehensively detect issues that arise with your applications: you’ll be able to detect when network, application and routing layer metrics are out of line with your expectations (calibrated to China’s uniquely volatile networks), and you’ll also know whether issues are specific to a single ISP.
Monitoring Points
Periodic tests and comprehensive alerting are the first half of detecting changes in performance, but the second, equally important half is to monitor from a variety of vantage points in order to see the entire picture of user experience in China. Because the ISP landscape in China is dominated by a tiny number of region-specific providers, it’s especially important to ensure that these providers are well represented by the monitoring points you choose.
Our newly expanded set of Cloud Agents in China represents both a diverse set of geographic locations and the two prevailing Chinese ISPs, China Telecom and China Unicom. To give you an understanding of the ISP coverage of our monitoring points in China, the table below shows the percentage of routes from each Cloud Agent that pass through the given ISP.
In general, China Unicom dominates northern China, while China Telecom provides service to the southern portion of the country. The fact that the vast majority of our Cloud Agents see 100% of routes passing through either China Unicom or China Telecom provides further evidence that networks in China are highly siloed and region-specific.
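For readers who want to compute a similar coverage figure from their own route data, the calculation is a straightforward ratio. The Python sketch below assumes each route is available as a list of AS numbers. That input format and the example ASN set are assumptions made for illustration, and the sketch does not reproduce how the figures in this post were produced.

```python
from typing import Iterable, List, Set


def isp_share(routes: List[Iterable[int]], isp_asns: Set[int]) -> float:
    """Percentage of routes that traverse at least one of the ISP's ASNs."""
    if not routes:
        return 0.0
    hits = sum(
        1 for route in routes
        if any(asn in isp_asns for asn in route)
    )
    return 100.0 * hits / len(routes)
```

For example, if three of an agent's four observed routes contain an ASN in the set passed as `isp_asns`, the function returns 75.0 for that agent.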
Much of the information around operating networks in China is not readily accessible, and many foreign companies that enter the market rely on trial and error when it comes to delivering applications to Chinese users. While operating in China is immensely difficult, our 14 Cloud Agents will give you an objective, outside-in perspective that will allow you to evaluate service providers and take action when issues arise.