In previous posts, we've explored many of the pieces that make up China's vast system of censorship, including the Great Firewall, a huge content-filtering machine, and the Great Cannon, a powerful new DDoS weapon. But that's just one side of the story — just as Chinese censorship has evolved over time, so too have the methods to evade it.
In an ironic twist of events, Fang Binxing, known as the father of the Great Firewall, was blocked from visiting a South Korean website during one of his talks. He was forced to use a Virtual Private Network (VPN) to access the forbidden content. But Fang isn't the only one “jumping the wall”: about 1 to 3% of China's 688 million Internet users regularly use circumvention methods to evade censorship.
Common anti-censorship tools used to fly under the radar include Tor, proxy servers, VPN services and even code words. However, the Great Firewall has continued to innovate to detect and block these circumvention methods, and as a result the two sides have evolved in tandem in what is essentially a digital war over access to information.
Proxy-Based Circumvention Methods
Most circumvention tools, including proxy servers, Tor, VPN and SSH, combine two mechanisms to bypass the Great Firewall: the encryption of traffic and the use of proxy nodes. The two are generally combined because neither is enough on its own: an encrypted tunnel like HTTPS still reveals the destination being contacted, and a simple open proxy like HTTP or SOCKS still exposes the content of the traffic to China's sophisticated system of censorship.
Proxy Servers
A few proxy services popular in China include Freegate, Ultrasurf and Psiphon, which are free and intended for non-technical users. These services use a number of proxy servers outside China to handle web requests, encrypting all HTTP traffic in Secure Sockets Layer (SSL) tunnels to and from these proxy servers. Thus both the content of all packets and their ultimate destination are obscured from censoring eyes. These tools generally cost nothing to install and operate, but they are slow relative to VPNs.
Tor
Tor's struggle against the censors perhaps best epitomizes the ongoing arms race between circumvention tools and the Great Firewall. Tor, short for “The Onion Router,” is a well-known service that uses re-routing through a series of proxy servers to achieve anonymity. Though Tor was originally designed as a low-latency anonymity network, over time it also earned a reputation for being a useful circumvention tool against Internet censorship.
To create a private network pathway with Tor, the client incrementally builds a circuit of encrypted connections through proxy nodes called relays on the Tor network, evading censorship by using a secret bridge relay to enter the network. The circuit is extended one hop at a time, and each relay along the way knows only the two nodes that are one hop before and after it. No individual relay ever knows the complete path that a packet has taken. To ensure that connections can't be traced as they pass through, a separate set of encryption keys is used for each hop along the circuit, except for the last hop to the destination server. Traffic travels over at least three relays in the Tor network before continuing on to the destination server. In short, Tor uses encryption to obscure the content of Internet communication and evade censorship, and also preserves anonymity by bouncing requests off several anonymous servers around the world.
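The layering described above can be illustrated with a toy sketch. This is not Tor's actual cryptography: the `keystream` helper, the XOR-based cipher and the three relay keys are hypothetical stand-ins, chosen only so the example runs with the Python standard library.

```python
import hashlib

def keystream(key: bytes, length: int) -> bytes:
    """Derive a toy keystream from a key (illustrative only, not secure)."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(4, "big")).digest()
        counter += 1
    return out[:length]

def xor_layer(key: bytes, data: bytes) -> bytes:
    """Add or remove one layer; XOR with the same keystream undoes itself."""
    return bytes(a ^ b for a, b in zip(data, keystream(key, len(data))))

def wrap(payload: bytes, hop_keys: list[bytes]) -> bytes:
    """Client side: apply one layer per hop, innermost (last hop) first."""
    for key in reversed(hop_keys):
        payload = xor_layer(key, payload)
    return payload

def unwrap_along_circuit(cell: bytes, hop_keys: list[bytes]) -> list[bytes]:
    """Each relay peels exactly one layer; return what each relay is left with."""
    seen = []
    for key in hop_keys:
        cell = xor_layer(key, cell)
        seen.append(cell)
    return seen

# Hypothetical per-hop keys for a three-relay circuit.
hop_keys = [b"guard-key", b"middle-key", b"exit-key"]
request = b"GET /blocked-page"
cell = wrap(request, hop_keys)
views = unwrap_along_circuit(cell, hop_keys)
```

In this sketch only the last entry of `views` equals the original request. The first two relays are left holding data that is still wrapped in the layers belonging to later hops.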
Blocking Relays and Bridges
The Great Firewall's first documented attempt to block Tor occurred in 2008, when Tor's website was blocked using keyword filtering and spoofed TCP resets. A year later, in 2009, the Great Firewall found Tor's Achilles' heel: a centralized directory server where users get the list of relays. The censors simply downloaded this list and blocked the IP address of every relay in the Tor directory. As a result, Tor lost most of its Chinese users.
But the arms race continued: after the Great Firewall blocked all of the public relays, Tor began reserving a portion of new relays as secret “bridges” that are not published in the directory. Users can use bridges to connect to the Tor network if the “main entrance” is blocked by censorship. Bridges are carefully distributed only a few at a time, through rate-limited out-of-band channels like email and HTTPS. This way, anyone can learn a few bridge addresses, while it is still difficult to learn all of them. However, even with Tor's use of secret relays, censors can take advantage of the distinctive way in which Tor uses TLS, inspecting for the “tells” that distinguish Tor from other forms of TLS.
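One way to picture "a few at a time" distribution is to hash the requester's identity into a position in the bridge pool, so that asking again from the same identity reveals nothing new. The sketch below is an assumption about how such a scheme could look, not a description of Tor's real bridge distributor. The function name, the pool of documentation-range addresses and the email identity are all hypothetical.

```python
import hashlib

def bridges_for(requester: str, pool: list[str], per_request: int = 3) -> list[str]:
    """Return a small, stable slice of the pool chosen by hashing the requester."""
    if not pool:
        return []
    digest = hashlib.sha256(requester.encode("utf-8")).digest()
    start = int.from_bytes(digest[:8], "big") % len(pool)
    count = min(per_request, len(pool))
    # Walk the pool circularly from the hashed starting position.
    return [pool[(start + i) % len(pool)] for i in range(count)]

# Hypothetical pool of 40 secret bridges (addresses from a documentation range).
pool = [f"192.0.2.{i}:443" for i in range(1, 41)]
handout = bridges_for("alice@example.com", pool)
```

A censor who wants the whole pool would need many distinct identities, which is why the channels that hand out bridges are rate-limited.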
Before 2011, the Great Firewall had used only simple techniques, namely IP blacklisting and keyword filtering, in its efforts to block Tor. In October 2011, however, it became drastically more flexible and sophisticated, beginning to detect and dynamically block private bridge nodes within minutes.
Deep packet inspection (DPI) boxes, likely located in China's border autonomous systems (ASes), inspect traffic leaving China, including traffic from Chinese users attempting to establish connections to proxy nodes like Tor bridges and relays. The DPI boxes look for suspicious traffic, such as packets embedded in TLS streams or carrying Tor-like characteristics such as distinctive cipher lists. Suspicious traffic triggers active probing by scanners that use possibly hijacked Chinese IP addresses, most of which are seemingly random and rarely reused. The scanners act like legitimate users, attempting to connect to the suspected server with a number of different protocols, including VPN, SSH and various protocols associated with Tor. If any of the connection attempts succeeds, the proxy node, identified by its IP address and associated port, is blocked within China. It remains blocked for as long as Chinese scanners can still connect to the proxy; once they can't, the block is lifted.
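The probe-then-block cycle described above can be modeled as a small state machine. This is a toy model under stated assumptions, not the Great Firewall's real logic. The `ProbeBlocklist` class, the `fake_probe` function and the `servers` table are hypothetical, and the probe itself is simulated rather than sent over the network.

```python
from typing import Callable

Endpoint = tuple[str, int]  # (IP address, port)

class ProbeBlocklist:
    """Toy model: block an endpoint only while probes to it keep succeeding."""

    def __init__(self, probe: Callable[[Endpoint, str], bool], protocols: list[str]):
        self.probe = probe          # returns True if the endpoint speaks `protocol`
        self.protocols = protocols  # protocols the scanner tries in turn
        self.blocked: set[Endpoint] = set()

    def _responds(self, endpoint: Endpoint) -> bool:
        return any(self.probe(endpoint, proto) for proto in self.protocols)

    def on_suspicious_flow(self, endpoint: Endpoint) -> bool:
        """Called when DPI flags a flow; block if any probe succeeds."""
        if self._responds(endpoint):
            self.blocked.add(endpoint)
        return endpoint in self.blocked

    def recheck(self) -> None:
        """Lift blocks on endpoints the scanners can no longer connect to."""
        self.blocked = {ep for ep in self.blocked if self._responds(ep)}

# Hypothetical world state: which protocols each server actually speaks.
servers = {("198.51.100.7", 443): {"tor"}, ("198.51.100.8", 443): {"https"}}

def fake_probe(endpoint: Endpoint, protocol: str) -> bool:
    return protocol in servers.get(endpoint, set())

gfw = ProbeBlocklist(fake_probe, ["vpn", "ssh", "tor"])
```

Calling `gfw.on_suspicious_flow(("198.51.100.7", 443))` blocks the simulated bridge, while the ordinary HTTPS server stays reachable because none of the probed protocols succeeds against it. If the bridge later stops answering, `gfw.recheck()` lifts the block.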
In sum, the Great Firewall has blocked Tor on at least three layers:
- Website: Tor's website is blocked using keyword filtering and injected TCP resets, as we explored in a previous post.
- Public Tor network: Almost all of the public Tor relays are blocked on the IP layer so that the SYN/ACK segment sent by the relay to the client is dropped.
- Bridges: Bridges are blocked in the same way as relays, where the SYN/ACK reply never reaches the client.
Further Layers of Encryption
In 2012, Tor fought back by releasing its first obfuscated protocol: obfs2, a “pluggable transport” that wraps the entire Tor TLS stream in another layer that obscures traffic traveling between the client and bridge. Obfs2 re-encrypts the entire stream with a separate key so that the entire communication looks like a benign, uniformly random byte stream in both directions. Though obfs2 had immediate success, it also had a fatal flaw: censors can detect it passively and with high confidence. Because obfs2 first sends a key and then sends ciphertext encrypted with that key, a censor can simply read the first few bytes of every TCP connection, treat them as a key, and use the key to attempt to decrypt the bytes that follow. If the decrypted output is meaningful (for example, if it's a TLS handshake), then the censor has detected obfs2 and can terminate the connection.
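The passive attack is easy to reproduce against a simplified stand-in for the protocol. The sketch below does not use the real obfs2 wire format or cipher. It assumes a toy scheme in which a 16-byte key is sent first and the rest of the stream is XORed with a keystream derived from that key. All function names are hypothetical.

```python
import hashlib

KEY_LEN = 16
TLS_HANDSHAKE_PREFIX = b"\x16\x03"  # TLS record type 22 (handshake), major version 3

def keystream(key: bytes, length: int) -> bytes:
    """Derive a toy keystream from a key (illustrative only, not secure)."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(4, "big")).digest()
        counter += 1
    return out[:length]

def xor_with_key(key: bytes, data: bytes) -> bytes:
    return bytes(a ^ b for a, b in zip(data, keystream(key, len(data))))

def toy_obfuscate(plaintext: bytes, key: bytes) -> bytes:
    """Sender: transmit the key in the clear, then the encrypted payload."""
    return key + xor_with_key(key, plaintext)

def censor_detects(stream: bytes) -> bool:
    """Censor: treat the first bytes as a key and look for a TLS handshake."""
    key, body = stream[:KEY_LEN], stream[KEY_LEN:]
    return xor_with_key(key, body).startswith(TLS_HANDSHAKE_PREFIX)

client_hello = TLS_HANDSHAKE_PREFIX + b"\x01" + b"rest of a handshake record"
wire = toy_obfuscate(client_hello, key=b"k" * KEY_LEN)
```

On the wire the stream no longer begins with a TLS record header, yet `censor_detects(wire)` still returns `True`, because everything needed to undo the obfuscation travels in the stream itself.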
The next obfuscated protocol, obfs3, was designed to address this crucial flaw. It uses a method called the Diffie-Hellman key exchange to determine the keys to be used for encryption. When obfs3 is used, a censor must either use a man-in-the-middle attack to learn the secret keys or use heuristic detection (likely combined with active probing).
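A minimal sketch of a Diffie-Hellman exchange shows why a passive observer no longer has the key. The modulus and generator here are toy parameters picked for readability, and they are assumptions rather than the values obfs3 uses. A real deployment needs a far larger, carefully chosen group.

```python
import secrets

P = 2**127 - 1  # toy prime modulus (a Mersenne prime), far too small for real use
G = 3           # toy generator, chosen for illustration

def keypair() -> tuple[int, int]:
    """Return (private exponent, public value G^private mod P)."""
    private = secrets.randbelow(P - 2) + 1
    return private, pow(G, private, P)

def shared_secret(private: int, peer_public: int) -> int:
    """Combine our private exponent with the peer's public value."""
    return pow(peer_public, private, P)

client_private, client_public = keypair()
bridge_private, bridge_public = keypair()

# Only the two public values cross the network.
client_secret = shared_secret(client_private, bridge_public)
bridge_secret = shared_secret(bridge_private, client_public)
```

Both sides arrive at the same secret, while an eavesdropper sees only `client_public` and `bridge_public`. That is why the censor is left with the two options named above: actively sitting in the middle of the exchange, or guessing from traffic patterns.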
The Great Firewall has begun active probing for both obfs2 and obfs3, and has succeeded in defeating obfs2. However, researchers have shown that in practice, obfs3 is almost always reachable for clients in China. Obfs3 is now Tor's most popular transport, while the use of obfs2 is waning, and the number of Chinese users of Tor has increased as the Chinese government tightens control over the Internet.
VPN and SSH
Virtual Private Network (VPN) and Secure Shell (SSH) services are considered the most powerful and stable tools for bypassing censorship. They work in a similar way to proxy servers, but VPN and SSH depend on a private host or an account outside China, instead of open, free proxies, to create a private encrypted channel that the Great Firewall can't see into. This channel connects users to an Internet server outside China so that browsing and downloading requests are sent to a foreign server that finds and sends back the responses to those requests. Because these services rely on private hosts, most of them are not free.
Popular VPN services' websites are blocked at the IP address and domain levels, and all domain names containing “vpn” are also blocked. In addition, the active probing mechanism used against Tor has been deployed against SSH and VPN protocols. In late 2012, the Great Firewall began using another mechanism to block VPNs: learning to identify encrypted VPN traffic and killing those connections. The Great Firewall likely uses a machine-learning algorithm to analyze Internet traffic and spot features characteristic of VPN traffic, like high numbers of connections to IP addresses outside China. This game of cat and mouse has continued in recent years as China's censorship system grows in sophistication and reach.
Content Delivery Networks
As more and more online content is cached by content delivery networks (CDNs), proposals for anti-censorship methods that leverage this trend have emerged. Recent research has explored a novel approach to circumvention: using the CDNs themselves to deliver banned content. Researchers designed a browser plugin that is able to unblock CDN-hosted content in China, a category that covers about 80% of the most popular blocked sites.
When a Chinese user requests a CDN-hosted site, the plugin routes the request directly to the CDN hosting the relevant content without calling on a DNS server. Thus traffic is immune to any DNS tampering, the main technique that the Great Firewall uses to censor CDN content. In addition, HTTPS must be used to encrypt connections and evade any keyword filtering. On the IP level, Chinese authorities are unlikely to block the CDNs themselves. Blocking the IP addresses of CDN edge servers would block access not only to specifically forbidden content, but also to every other site hosted on that CDN. The plugin is founded on the belief that this risk of collateral damage is too great for the censors to accept.
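The core idea, skipping the DNS lookup and opening an HTTPS connection straight to a known edge server, can be sketched with the Python standard library. This is not the researchers' plugin. The `EDGE_IPS` table, the hostname `blocked.example` and the documentation-range address are hypothetical, and how a real client would learn edge addresses is left out.

```python
import socket
import ssl

# Hypothetical hostname -> CDN edge IP table shipped with the client.
EDGE_IPS = {"blocked.example": "203.0.113.10"}

def build_request(hostname: str, path: str = "/") -> bytes:
    """Build a plain HTTP/1.1 GET; the Host header tells the edge which site we want."""
    return (
        f"GET {path} HTTP/1.1\r\n"
        f"Host: {hostname}\r\n"
        "Connection: close\r\n\r\n"
    ).encode("ascii")

def fetch_via_edge(hostname: str, path: str = "/", timeout: float = 10.0) -> bytes:
    """Connect to the edge IP directly (no DNS query) and fetch over TLS."""
    edge_ip = EDGE_IPS[hostname]
    context = ssl.create_default_context()
    with socket.create_connection((edge_ip, 443), timeout=timeout) as raw:
        # server_hostname keeps certificate validation tied to the real site name.
        with context.wrap_socket(raw, server_hostname=hostname) as tls:
            tls.sendall(build_request(hostname, path))
            chunks = []
            while True:
                data = tls.recv(4096)
                if not data:
                    break
                chunks.append(data)
    return b"".join(chunks)
```

Because the address comes from the local table, a tampered DNS answer never enters the picture, and the TLS layer keeps the request and response out of reach of keyword filters.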
Because the approach of retrieving blocked content directly from content publishers makes no use of third-party proxies, the plugin is able to access content with a download latency much lower than traditional proxy-based circumvention services like Tor.
Code Words
The use of code words and Internet slang has blossomed since the Great Firewall began filtering content based on specific keywords. Chinese netizens especially make use of homophones to evade detection, including using the phrase cǎo ní mǎ, which translates to “grass mud horse,” in place of cāo nǐ mā, a vulgar insult directed at the target's mother that has long been banned by the Great Firewall. As it goes with all funny things on the Internet, the “grass mud horse” has inspired a number of Internet memes since its coining.
Another interesting example is the use of the word “harmonious.” The Chinese government usually cites the goal of constructing a “harmonious society” as the primary reason for censorship. When the word “censorship” was itself censored, Chinese netizens began using the word “harmonious,” hé xié, as a sarcastic euphemism for censorship. And when “harmonious” began to be censored, netizens again switched words, this time to a homophone of “harmonious”: hé xiè, which translates to “river crab.” Since then, the story of the grass mud horse's struggle against the evil river crab has spread far and wide across the Chinese online community.
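The back-and-forth between netizens and the keyword filter can be mimicked with a few lines of code. This is a deliberately naive substring filter, assumed for illustration; the `KeywordFilter` class is hypothetical and says nothing about how the Great Firewall actually matches text.

```python
import unicodedata

def normalize(text: str) -> str:
    """Normalize accents to one Unicode form so visually equal strings compare equal."""
    return unicodedata.normalize("NFC", text).lower()

class KeywordFilter:
    """Naive filter: a post is blocked if it contains any banned term."""

    def __init__(self, banned: list[str]):
        self.banned = {normalize(term) for term in banned}

    def ban(self, term: str) -> None:
        self.banned.add(normalize(term))

    def blocks(self, post: str) -> bool:
        text = normalize(post)
        return any(term in text for term in self.banned)

# The censor starts by banning "harmonious" (hé xié).
censor = KeywordFilter(["hé xié"])
```

A post containing "hé xié" is caught, but one containing the homophone "hé xiè" passes until the censor calls `censor.ban("hé xiè")`, at which point netizens need yet another word.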
The Continuing Arms Race
The arms race between China's system of censorship and opposing circumvention tools has raged for years, and has only accelerated since Xi Jinping became president in November 2012. In the coming years, Chinese authorities will have to make difficult decisions about the balance between information control and economic credibility and growth. The Great Firewall and anti-censorship services will also need to innovate as the terrain of the Internet evolves toward the adoption of new technologies like cloud computing, IPv6 and CDNs. Who will use these trends to their advantage first?