Learning more about the GFW's active probing system
Roya, David, Nick, nweaver, Vern, and I just finished a research project in which we revisited the Great Firewall of China's (GFW) active probing system. This system was brought to life several years ago to reactively probe and block circumvention proxies, including Tor. You might remember an earlier blog post that gave us some first insight into how the active probing system works. Several questions, however, remained. For example, we were left wondering what the system's physical infrastructure looked like. Is the GFW using dedicated machines behind their thousands of probing IP addresses? Does the GFW even "own" all these IP addresses? Rumour had it that the GFW was hijacking IP addresses for a short period of time, but there was no conclusive proof. As a result, we teamed up and set out to answer these, and other questions.
Because this was a network measurement project, we started by compiling datasets. We created three datasets, comprising hours (a Sybil-like experiment to attract many probes), months (an experiment to measure reachability for clients in China), and even years (log files of a long-established server) worth of active probing data. Together, these datasets allow us to look at the GFW's active probing system from different angles, illuminating aspects we wouldn't be able to observe with just a single dataset. We are able to share two of our datasets, so you are very welcome to reproduce our work, or do your own analysis.
We now want to give you an overview of our most interesting findings.
- Generally, once a bridge is detected and blocked by the GFW, it remains blocked. But does this mean that the bridge is entirely unreachable? We measured the blocking effectiveness by continuously making a set of virtual private systems in China connect to a set of bridges under our control. We found that every 25 hours, for a short period of time, our Tor clients in China were able to connect to our bridges. This is illustrated in the diagram shown below. Every point represents one connection attempt, meaning that our client in China was trying to connect to our bridge outside of China. Note the curious periodic availability pattern for both Unicom and CERNET (the two ISPs in China we measured from). Sometimes, network security equipment goes into "fail open" mode while it updates its rule set, but it is not clear if this is happening here.
- We were able to find patterns in the TCP headers of active probes that suggest that all these thousands of IP addresses are, in fact, controlled by a single source. Check out the initial sequence number (ISN) pattern in the diagram below. It shows the value of ISNs (y-axis) over time (x-axis). Every point in the graph represents the SYN segment of one active probing connection. If all probing connections would have come from independent computers, we would have expected a random distribution of points. That's because ISNs are typically chosen randomly to protect against off-path attackers. Instead, we see a clear linear pattern across IP addresses. We believe that active probes derive their ISN from the current time.
- We discovered that Tor is not the only victim of active probing attacks; the GFW is targeting other circumvention systems, namely SoftEther and GoAgent. This highlights the modular nature of the active probing system. It appears to be easy for GFW engineers to add new probing modules to react to emerging, proxy-based circumvention tools.
- Back in 2012, the system worked in 15-minute-queues. These days, it seems to be able to scan bridges in real-time. On average, it takes only half a second after a bridge connection for an active probe to show up.
- Using a number of traceroute experiments, we could show that the GFW's sensor is stateful and seems unable to reassemble TCP streams.
Luckily, we now have several pluggable transports that can defend against active probing. ScrambleSuit and its successor, obfs4, defend against probing attacks by relying on a shared secret that is distributed out of band. Meek tunnels traffic over cloud infrastructure, which does not prevent active probing, but greatly increases collateral damage when blocked. While we keep developing and maintaining circumvention tools, we need to focus more on usability. A powerful and carefully-engineered circumvention tool is of little use if folks find it too hard to use. That's why projects like the UX sprint are so important.
So wait, you're wanting to run it as root? Don't do that. If the browser gets comprised, which is really not that difficult, it will be able to fuck with your system badly. A non exhaustive list of how root can take over the kernel:
- write to MSRs that allow arbitrary memory access
- load kernel modules
- exploit privileged subsystems like perf
- boot into a malicious kernel with kexec
- replace the kernel image in /boot
- issue iopems and iopls that allow arbitrary write access
- send code to the GPU, which has direct access to memory
- disable many access controls, and bypass permissions
- and much more
Judging by your understanding of command line though, I should think you would already know that running things like web browsers as root is a bad idea...