How to read our China usage graphs
Recently somebody asked me why our usage numbers in China are so low. More precisely, his question was "How do I read this graph in any way other than 'Tor is effectively blocked in China'?" After writing up an answer for him, I realized I should share it with the rest of the Tor community too.
The correct interpretation of the graph is "obfs3 bridges have not been deployed enough to keep up with the demand in China". So it isn't that Tor is blocked — it's that we haven't done much of a deployment for obfs3 bridges or ScrambleSuit bridges, which are the latest steps in the arms race.
The short explanation is that the old vanilla SSL Tor transport doesn't work in China anymore due to their active probing infrastructure. The obfs2 transport doesn't work anymore either, for the same reason. The obfs3 transport works great for now, and thousands of people are happily using it — and some of those people aren't reflected in the graphs you see (I'll explain that more below).
The medium-length explanation is that we've been leading and coordinating the international research effort at understanding how to design and analyze transports that resist both DPI and active probing, and approximately none of these approaches have been deployed at a large scale yet. So it doesn't make sense to say that Tor is blocked in China, because it mischaracterizes Tor as a static protocol. "Tor" when it comes to censorship circumvention is a toolbox of options — some of them work in China, some don't. The ones that work (i.e. that should resist both DPI and active probing) haven't been rolled out very widely, in large part because we have funders who care about the research side but we have nobody who funds the operations, deployment, or scale-up side.
The long explanation is that it comes down to three issues:
First, there are some technical steps we haven't finished deploying in terms of collecting statistics about users of bridges + pluggable transports. The reason is that the server side of the pluggable transport needs to inform the Tor bridge what country the user was from, so the Tor bridge can include that in its (aggregated, anonymized) stats that it publishes to the metrics portal. We've now built most of the pieces, but most of the deployed bridges aren't running the new code yet. So the older bridges that are reporting their user statistics aren't seeing very many users from China, while the bridges that *aren't* reporting their user statistics, which are the ones that offer the newer pluggable transports, aren't well-represented in the graph. We have some nice volunteers looking into what fraction of deployed obfs3 bridges don't have this new 'extended ORPort' feature. But (and you might notice the trend here) we don't have any funders currently who care about counting bridge users in China.
Second, we need to get more addresses. One approach is to get them from volunteers who sign up their computer as a bridge. That provides great sustainability in terms of community involvement (we did a similar push for obfs2 bridges back when Iran was messing with SSL, and got enough to make a real difference at the time), but one address per volunteer doesn't scale very well. The intuition is that the primary resource that relays volunteer is bandwidth, whereas the primary resource that bridges volunteer is their address — and while bandwidth is an ongoing contribution, once your IP address gets blocked then your contribution has ended, at least for the country that blocked it, or until you get another address via DHCP, etc. The more scalable approaches to getting bridge addresses involve coordinating with ISPs and network operators, and/or designs like Flashproxy to make it really easy for users to sign up their address. I describe these ideas more in "approach four" and "approach five" of the Strategies for getting more bridge addresses blog post. But broad deployment of those approaches is again an operational thing, and we don't have any funded projects currently for doing it.
Third, we need ways of letting people learn about bridges and use them without getting them noticed. We used to think the arms race here was "how do you give out addresses such that the good guys can learn a few while the bad guys can't learn all of them", a la the bridges.torproject.org question. But it's increasingly clear that scanning resistance will be the name of the game in China: your transport has to not only blend in with many other flows (to drive up the number of scans they have to launch), but also when they connect to that endpoint and speak your protocol, your service needs to look unobjectionable there as well. Some combination of ScrambleSuit and FTE are great starts here, but it sure is a good thing that the research world has been working on so many prototype designs lately.
So where does that leave us? It would be neat to think about a broad deployment and operations plan here. I would want to do it in conjunction with some other groups, like Team Cymru on the technical platform side and some on-the-ground partner groups for distributing bridge addresses more effectively among social networks. We've made some good progress on the underlying technologies that would increase the success chances of such a deployment — though we've mostly been doing it using volunteers in our spare time on the side, so it's slower going than it could be. And several other groups (e.g. torservers.net) have recently gotten funding for deploying Tor bridges, so maybe we could combine well with them.
In any case it won't be a quick and simple job, since all these pieces have to come together. It's increasingly clear that just getting addresses should be the easy part of this. It's how you give them out, and what you run on the server side to defeat China's scanning, that still look like the two tough challenges for somebody trying to scale up their circumvention tool design.