Announcing the Tor Farsi blog

We are happy to announce the launch of the Tor Farsi blog. The site was created in response to the great reception of Tor and circumvention tools amongst Iranian users. The goal of this site is to be a one-stop place to find Tor-related material in Farsi.

The Farsi team will translate white papers, summaries of select posts, and important updates relevant to Tor. We want to create a community of Farsi-speaking Tor users and empower them with information about anonymity and privacy on the Internet. We hope this community will spread this information to others to help them with their Internet anonymity and privacy needs.

New Tor Browser Bundles

The Tor Browser Bundles have been updated with a new Tor release that fixes a security-critical bug. Please see the release announcement for further details. All users should update immediately.

This Tor Browser Bundle release also contains new Firefox patches which improve privacy and unlinkability.

Tor Browser Bundle (2.2.35-1)

  • Update Tor to
  • Update NoScript to 2.2.3
  • Update Torbutton to 1.4.5
  • New Firefox patches
    • Disable SSL Session ID tracking
    • Provide an observer event to close persistent connections

Tor is released (security patches)

Tor fixes a critical heap-overflow security issue in Tor's
buffers code. Absolutely everybody should upgrade.

The bug relied on an incorrect calculation when making data continuous
in one of our IO buffers, if the first chunk of the buffer was
misaligned by just the wrong amount. The miscalculation would allow an
attacker to overflow a piece of heap-allocated memory. To mount this
attack, the attacker would need to either open a SOCKS connection to
Tor's SocksPort (usually restricted to localhost), or target a Tor
instance configured to make its connections through a SOCKS proxy
(which Tor does not do by default).
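
To illustrate the class of bug (with invented sizes and names — this is not Tor's actual buffers code), here is the shape of the miscalculation in miniature:

```python
# Illustration of the bug class behind CVE-2011-2778 (not Tor's actual
# code): a buffer chunk whose first 'drained' bytes have already been
# consumed, followed by 'datalen' bytes of live data.

CHUNK_SIZE = 64  # hypothetical backing-store size

def free_space_buggy(drained, datalen):
    # Wrong: forgets the drained prefix, so it can report more room
    # than the chunk really has left.
    return CHUNK_SIZE - datalen

def free_space_fixed(drained, datalen):
    # Right: live data occupies [drained, drained + datalen), so only
    # the tail after that region is writable.
    return CHUNK_SIZE - drained - datalen
```

With drained = 16 and datalen = 40, the buggy computation claims 24 writable bytes, but writing 24 bytes starting at offset 56 runs 16 bytes past the 64-byte chunk — a heap overflow of the kind described above.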

Good security practice requires that all heap-overflow bugs be
presumed exploitable until proven otherwise, so we are treating
this as a potential code execution attack. Please upgrade immediately!
This bug does not affect bufferevents-based builds of Tor. Special
thanks to "Vektor" for reporting this issue to us!

Tor also fixes several bugs in previous versions, including
crash bugs for unusual configurations, and a long-term bug that
would prevent Tor from starting on Windows machines with draconian
AV software.

With this release, we remind everyone that 0.2.0.x has reached its
formal end-of-life. Those Tor versions have many known flaws, and
nobody should be using them. You should upgrade -- ideally to the
0.2.2.x series. If you're using a Linux or BSD distribution whose
packages are obsolete, stop using those packages and upgrade anyway.

The Tor 0.2.1.x series is also approaching its end-of-life: it will no
longer receive support after some time in early 2012.

Changes in version - 2011-12-16

Major bugfixes:

  • Fix a heap overflow bug that could occur when trying to pull
    data into the first chunk of a buffer, when that chunk had
    already had some data drained from it. Fixes CVE-2011-2778;
    bugfix on Reported by "Vektor".
  • Initialize Libevent with the EVENT_BASE_FLAG_NOLOCK flag enabled, so
    that it doesn't attempt to allocate a socketpair. This could cause
    some problems on Windows systems with overzealous firewalls. Fix for
    bug 4457; workaround for Libevent versions 2.0.1-alpha through
  • If we mark an OR connection for close based on a cell we process,
    don't process any further cells on it. We already avoid further
    reads on marked-for-close connections, but now we also discard the
    cells we'd already read. Fixes bug 4299; bugfix on the first
    version where we might mark a connection for close based on
    processing a cell on it.
  • Correctly sanity-check that we don't underflow on a memory
    allocation (and then assert) for hidden service introduction
    point decryption. Bug discovered by Dan Rosenberg. Fixes bug 4410;
    bugfix on
  • Fix a memory leak when we check whether a hidden service
    descriptor has any usable introduction points left. Fixes bug
    4424. Bugfix on
  • Don't crash when we're running as a relay and don't have a GeoIP
    file. Bugfix on; fixes bug 4340. This backports a fix
    we've had in the 0.2.3.x branch already.
  • When running as a client, do not print a misleading (and plain
    wrong) log message that we're collecting "directory request"
    statistics: clients don't collect statistics. Also don't create a
    useless (because empty) stats file in the stats/ directory. Fixes
    bug 4353; bugfix on

Minor bugfixes:

  • Detect failure to initialize Libevent. This fix provides better
    detection for future instances of bug 4457.
  • Avoid frequent calls to the fairly expensive cull_wedged_cpuworkers
    function. This was eating up hideously large amounts of time on some
    busy servers. Fixes bug 4518; bugfix on
  • Resolve an integer overflow bug in smartlist_ensure_capacity().
    Fixes bug 4230; bugfix on an earlier Tor release. Based on
    a patch by Mansour Moufid.
  • Don't warn about unused log_mutex in log.c when building with
    --disable-threads using a recent GCC. Fixes bug 4437; bugfix on
    the release that introduced --disable-threads.
  • When configuring, starting, or stopping an NT service, stop
    immediately after the service configuration attempt has succeeded
    or failed. Fixes bug 3963; bugfix on
  • When sending a NETINFO cell, include the original address
    received for the other side, not its canonical address. Found
    by "troll_un"; fixes bug 4349; bugfix on
  • Fix a typo in a hibernation-related log message. Fixes bug 4331;
    bugfix on; found by "tmpname0901".
  • Fix a memory leak in launch_direct_bridge_descriptor_fetch() that
    occurred when a client tried to fetch a descriptor for a bridge
    in ExcludeNodes. Fixes bug 4383; bugfix on
  • Backport fixes for a pair of compilation warnings on Windows.
    Fixes bug 4521; bugfix on and on
  • If we had ever tried to call tor_addr_to_str on an address of
    unknown type, we would have done a strdup on an uninitialized
    buffer. Now we won't. Fixes bug 4529; bugfix on
    Reported by "troll_un".
  • Correctly detect and handle transient lookup failures from
    tor_addr_lookup. Fixes bug 4530; bugfix on
    Reported by "troll_un".
  • Fix null-pointer access that could occur if TLS allocation failed.
    Fixes bug 4531; bugfix on Found by "troll_un".
  • Use tor_socket_t type for listener argument to accept(). Fixes bug
    4535; bugfix on Found by "troll_un".

Minor features:

  • Add two new config options for directory authorities:
    AuthDirFastGuarantee sets a bandwidth threshold for guaranteeing the
    Fast flag, and AuthDirGuardBWGuarantee sets a bandwidth threshold
    that is always sufficient to satisfy the bandwidth requirement for
    the Guard flag. Now it will be easier for researchers to simulate
    Tor networks with different values. Resolves ticket 4484.
  • When Tor ignores a hidden service specified in its configuration,
    include the hidden service's directory in the warning message.
    Previously, we would only tell the user that some hidden service
    was ignored. Bugfix on 0.0.6; fixes bug 4426.
  • Update to the December 6 2011 Maxmind GeoLite Country database.

Packaging changes:

  • Make it easier to automate expert package builds on Windows,
    by removing an absolute path from the makensis.exe command.

Tor is out (security fix)

Tor fixes a critical heap-overflow security issue in
Tor's buffers code. Absolutely everybody should upgrade.

The bug relied on an incorrect calculation when making data continuous
in one of our IO buffers, if the first chunk of the buffer was
misaligned by just the wrong amount. The miscalculation would allow an
attacker to overflow a piece of heap-allocated memory. To mount this
attack, the attacker would need to either open a SOCKS connection to
Tor's SocksPort (usually restricted to localhost), or target a Tor
instance configured to make its connections through a SOCKS proxy
(which Tor does not do by default).

Good security practice requires that all heap-overflow bugs be
presumed exploitable until proven otherwise, so we are treating
this as a potential code execution attack. Please upgrade immediately!
This bug does not affect bufferevents-based builds of Tor. Special
thanks to "Vektor" for reporting this issue to us!

This release also contains a few minor bugfixes for recently discovered issues.

Changes in version - 2011-12-16

Major bugfixes:

  • Fix a heap overflow bug that could occur when trying to pull
    data into the first chunk of a buffer, when that chunk had
    already had some data drained from it. Fixes CVE-2011-2778;
    bugfix on Reported by "Vektor".

Minor bugfixes:

  • If we can't attach streams to a rendezvous circuit when we
    finish connecting to a hidden service, clear the rendezvous
    circuit's stream-isolation state and try to attach streams
    again. Previously, we cleared rendezvous circuits' isolation
    state either too early (if they were freshly built) or not at all
    (if they had been built earlier and were cannibalized). Bugfix on; fixes bug 4655.
  • Fix compilation of the libnatpmp helper on non-Windows. Bugfix on; fixes bug 4691. Reported by Anthony G. Basile.
  • Fix an assertion failure when a relay with accounting enabled
    starts up while dormant. Fixes bug 4702; bugfix on

Minor features:

  • Update to the December 6 2011 Maxmind GeoLite Country database.

November 2011 Progress Report

The progress report for November 2011 has been released as PDF and plaintext documents. Highlights include progress on the new Tor Check, Tor Bulk Exitlist, global media hits, the Tor Cloud launch, and three new proposals to improve Tor bridge relay functionality in difficult environments.

Tor is out

Tor introduces initial IPv6 support for bridges, adds
a "DisableNetwork" security feature that bundles can use to avoid
touching the network until bridges are configured, moves forward on
the pluggable transport design, fixes a flaw in the hidden service
design that unnecessarily prevented clients with wrong clocks from
reaching hidden services, and fixes a wide variety of other issues.

Changes in version - 2011-12-08
Major features:

  • Clients can now connect to private bridges over IPv6. Bridges
    still need at least one IPv4 address in order to connect to
    other relays. Note that we don't yet handle the case where the
    user has two bridge lines for the same bridge (one IPv4, one
    IPv6). Implements parts of proposal 186.
  • New "DisableNetwork" config option to prevent Tor from launching any
    connections or accepting any connections except on a control port.
    Bundles and controllers can set this option before letting Tor talk
    to the rest of the network, for example to prevent any connections
    to a non-bridge address. Packages like Orbot can also use this
    option to instruct Tor to save power when the network is off.
  • Clients and bridges can now be configured to use a separate
    "transport" proxy. This approach makes the censorship arms race
    easier by allowing bridges to use protocol obfuscation plugins. It
    implements the "managed proxy" part of proposal 180 (ticket 3472).
  • When using OpenSSL 1.0.0 or later, use OpenSSL's counter mode
    implementation. It makes AES_CTR about 7% faster than our old one
    (which was about 10% faster than the one OpenSSL used to provide).
    Resolves ticket 4526.
  • Add a "tor2web mode" for clients that want to connect to hidden
    services non-anonymously (and possibly more quickly). As a safety
    measure to try to keep users from turning this on without knowing
    what they are doing, tor2web mode must be explicitly enabled at
    compile time, and a copy of Tor compiled to run in tor2web mode
    cannot be used as a normal Tor client. Implements feature 2553.
  • Add experimental support for running on Windows with IOCP and no
    kernel-space socket buffers. This feature is controlled by a new
    "UserspaceIOCPBuffers" config option (off by default), which has
    no effect unless Tor has been built with support for bufferevents,
    is running on Windows, and has enabled IOCP. This may, in the long
    run, help solve or mitigate bug 98.
  • Use a more secure consensus parameter voting algorithm. Now at
    least three directory authorities or a majority of them must
    vote on a given parameter before it will be included in the
    consensus. Implements proposal 178.
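
As a sketch, the voting rule in the last bullet might look like this (the function name and arithmetic are our own reading of the summary above; proposal 178 is the authoritative specification):

```python
def param_included(votes, total_authorities):
    """Include a consensus parameter only if at least three directory
    authorities voted on it, or a majority of all authorities did (the
    majority clause matters for small test networks with fewer than
    three authorities)."""
    return votes >= 3 or votes > total_authorities // 2
```

With nine authorities, for example, two votes are no longer enough to plant a parameter in the consensus.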

Major bugfixes:

  • Hidden services now ignore the timestamps on INTRODUCE2 cells.
    They used to check that the timestamp was within 30 minutes
    of their system clock, so they could cap the size of their
    replay-detection cache, but that approach unnecessarily refused
    service to clients with wrong clocks. Bugfix on the version that
    introduced the v3 intro-point protocol (the first one to send a
    timestamp field in the INTRODUCE2 cell); fixes bug 3460.
  • Only use the EVP interface when AES acceleration is enabled,
    to avoid a 5-7% performance regression. Resolves issue 4525;
    bugfix on

Privacy/anonymity features (bridge detection):

  • Make bridge SSL certificates a bit more stealthy by using random
    serial numbers, in the same fashion as OpenSSL when generating
    self-signed certificates. Implements ticket 4584.
  • Introduce a new config option "DynamicDHGroups", enabled by
    default, which provides each bridge with a unique prime DH modulus
    to be used during SSL handshakes. This option attempts to help
    against censors who might use the Apache DH modulus as a static
    identifier for bridges. Addresses ticket 4548.

Minor features (new/different config options):

  • New configuration option "DisableDebuggerAttachment" (on by default)
    to prevent basic debugging attachment attempts by other processes.
    Supports Mac OS X and GNU/Linux. Resolves ticket 3313.
  • Allow MapAddress directives to specify matches against super-domains,
    as in "MapAddress * *".
    Implements issue 933.
  • Slightly change behavior of "list" options (that is, config
    options that can appear more than once) when they appear both in
    torrc and on the command line. Previously, the command-line options
    would be appended to the ones from torrc. Now, the command-line
    options override the torrc options entirely. This new behavior
    allows the user to override list options (like exit policies and
    ports to listen on) from the command line, rather than simply
    appending to the list.
  • You can get the old (appending) command-line behavior for "list"
    options by prefixing the option name with a "+".
  • You can remove all the values for a "list" option from the command
    line without adding any new ones by prefixing the option name
    with a "/".
  • Add experimental support for a "defaults" torrc file to be parsed
    before the regular torrc. Torrc options override the defaults file's
    options in the same way that the command line overrides the torrc.
    The SAVECONF controller command saves only those options which
    differ between the current configuration and the defaults file. HUP
    reloads both files. (Note: This is an experimental feature; its
    behavior will probably be refined in future 0.2.3.x-alpha versions
    to better meet packagers' needs.)
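
A small simulation of those merge rules (this is our own model of the behavior described above, not Tor's actual option parser):

```python
def merge_list_option(torrc_values, cmdline_args):
    """Merge a "list" option's torrc values with command-line arguments,
    given as (prefix, value) pairs where prefix is "" (override),
    "+" (append), or "/" (clear)."""
    result = list(torrc_values)
    overridden = False
    for prefix, value in cmdline_args:
        if prefix == "/":
            result = []           # remove all values, add none
            overridden = True
        elif prefix == "+":
            result.append(value)  # old behavior: append to torrc's list
        else:
            if not overridden:
                result = []       # new behavior: override torrc entirely
                overridden = True
            result.append(value)
    return result
```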

Minor features:

  • Try to make the introductory warning message that Tor prints on
    startup more useful for actually finding help and information.
    Resolves ticket 2474.
  • Running "make version" now displays the version of Tor that
    we're about to build. Idea from katmagic; resolves issue 4400.
  • Expire old or over-used hidden service introduction points.
    Required by fix for bug 3460.
  • Move the replay-detection cache for the RSA-encrypted parts of
    INTRODUCE2 cells to the introduction point data structures.
    Previously, we would use one replay-detection cache per hidden
    service. Required by fix for bug 3460.
  • Reduce the lifetime of elements of hidden services' Diffie-Hellman
    public key replay-detection cache from 60 minutes to 5 minutes. This
    replay-detection cache is now used only to detect multiple
    INTRODUCE2 cells specifying the same rendezvous point, so we can
    avoid launching multiple simultaneous attempts to connect to it.
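
The last two bullets amount to a small time-limited replay cache; a simplified sketch (our own names and structure, not Tor's implementation):

```python
# Remember each rendezvous point seen in an INTRODUCE2 cell for five
# minutes, so duplicate requests don't launch duplicate connection
# attempts.

REPLAY_LIFETIME = 5 * 60  # seconds

class ReplayCache:
    def __init__(self):
        self._seen = {}  # rendezvous-point id -> time first seen

    def seen_recently(self, rend_point, now):
        # Expire stale entries first.
        self._seen = {k: t for k, t in self._seen.items()
                      if now - t < REPLAY_LIFETIME}
        if rend_point in self._seen:
            return True
        self._seen[rend_point] = now
        return False
```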

Minor bugfixes (on Tor 0.2.2.x and earlier):

  • Resolve an integer overflow bug in smartlist_ensure_capacity().
    Fixes bug 4230; bugfix on an earlier Tor release. Based on
    a patch by Mansour Moufid.
  • Fix a minor formatting issue in one of tor-gencert's error messages.
    Fixes bug 4574.
  • Prevent a false positive from the check-spaces script, by disabling
    the "whitespace between function name and (" check for functions
    named 'op()'.
  • Fix a log message suggesting that people contact a non-existent
    email address. Fixes bug 3448.
  • Fix null-pointer access that could occur if TLS allocation failed.
    Fixes bug 4531; bugfix on Found by "troll_un".
  • Report a real bootstrap problem to the controller on router
    identity mismatch. Previously we just said "foo", which probably
    made a lot of sense at the time. Fixes bug 4169; bugfix on
  • If we had ever tried to call tor_addr_to_str() on an address of
    unknown type, we would have done a strdup() on an uninitialized
    buffer. Now we won't. Fixes bug 4529; bugfix on
    Reported by "troll_un".
  • Correctly detect and handle transient lookup failures from
    tor_addr_lookup(). Fixes bug 4530; bugfix on
    Reported by "troll_un".
  • Use tor_socket_t type for listener argument to accept(). Fixes bug
    4535; bugfix on Found by "troll_un".
  • Initialize conn->addr to a valid state in spawn_cpuworker(). Fixes
    bug 4532; found by "troll_un".

Minor bugfixes (on Tor 0.2.3.x):

  • Fix a compile warning in tor_inet_pton(). Bugfix on;
    fixes bug 4554.
  • Don't send two ESTABLISH_RENDEZVOUS cells when opening a new
    circuit for use as a hidden service client's rendezvous point.
    Fixes bugs 4641 and 4171; bugfix on Diagnosed
    with help from wanoskarnet.
  • Restore behavior of overriding SocksPort, ORPort, and similar
    options from the command line. Bugfix on

Build fixes:

  • Properly handle the case where the build-tree is not the same
    as the source tree when generating src/common/common_sha1.i,
    src/or/micro-revision.i, and src/or/or_sha1.i. Fixes bug 3953;
    bugfix on

Code simplifications, cleanups, and refactorings:

  • Remove the pure attribute from all functions that used it
    previously. In many cases we assigned it incorrectly, because the
    functions might assert or call impure functions, and we don't have
    evidence that keeping the pure attribute is worthwhile. Implements
    changes suggested in ticket 4421.
  • Remove some dead code spotted by coverity. Fixes cid 432.
    Bugfix on, closes bug 4637.

Research problem: Five ways to test bridge reachability

Once we get more (and more diverse) bridge addresses, the next research step is that we'll need to get better at telling which bridges are blocked in which jurisdictions. For example, most of the bridges we give out via https and gmail are blocked in China. But which ones exactly? How quickly do they get blocked? Do some last longer than others? Do they ever get unblocked? Is there some pattern to the blocking, either by time, by IP address or network, by user load on the bridge, or by distribution strategy? We can't evaluate new bridge distribution strategies if we can't track whether the bridges in each strategy are being blocked.

Generally speaking, bridge reachability tests break down into two approaches: passive and active. Passive tests don't involve any new connections on the part of Tor clients or bridges, whereas active tests follow the more traditional "scanning" idea. None of the reachability tests we've thought of are perfect. Instead, here we discuss how to combine imperfect tests and use feedback from the tests to balance their strengths and weaknesses.

Passive approaches

We should explore two types of passive testing approaches: reporting from bridges and reporting from clients.

Passive approach 1: reporting from bridges. Right now Tor relays and bridges publish aggregate user counts — rough number of users per country per day. In theory we can look at the user counts over time to detect statistical drops in usage for a given country. That approach has produced useful results in practice for overall connections to the public Tor relays from each country: see George Danezis's initial work on a Tor censorship detector.

But there are two stumbling blocks when trying to apply the censorship detector model to individual bridges. First, we don't have ground truth about which bridges were actually blocked or not at a given time, so we have no way to validate our models. Second, while overall usage of bridges in a given country might be high, the load on a given bridge tends to be quite low, which in turn makes it difficult to achieve statistical significance when looking at usage drops.

Ground truth needs to be learned through active tests: we train the models with usage patterns for bridges that get blocked and bridges that don't get blocked, and the model predictions should improve. The question of statistical significance can be overcome by treating the prediction as a hint: even if our models don't give us enough confidence to answer "blocked for sure" or "not blocked for sure" about a given bridge, they should be able to give us a number reflecting likelihood that the bridge is now blocked. That number should feed back into the active tests, for example so we pay more attention to bridges that are more likely to be newly blocked.
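
As a concrete (and deliberately crude) sketch of turning usage counts into such a blocking hint — the formula and the damping constant are placeholders of ours, not Danezis's detector:

```python
def blocked_likelihood(history, today):
    """history: past daily user counts for one bridge/country pair;
    today: the latest count. Returns a score in [0, 1]."""
    if not history:
        return 0.0
    baseline = sum(history) / len(history)
    if baseline <= 0:
        return 0.0
    drop = max(0.0, (baseline - today) / baseline)
    # Low-traffic bridges give weak evidence, so damp the score when
    # the baseline is tiny (the statistical-significance problem above).
    confidence = min(1.0, baseline / 20.0)
    return drop * confidence
```

A bridge whose usage collapses from a healthy baseline scores near 1; a bridge that never had measurable usage scores near 0 no matter what, which is exactly the "treat it as a hint, not a verdict" behavior described above.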

Passive approach 2: reporting from clients. In addition to the usage reporting by bridges, we should also consider reachability reporting by clients. Imagine a Tor client that has ten bridges configured. It tries to connect to each of them, and finds that two work and eight don't. This client is doing our scanning for us, if only we could safely learn about its results. The first issue that comes up is that it could mistakenly report that a bridge is blocked if that bridge is instead simply down. So we would want to compare the reports to concurrent active scans from a "more free" jurisdiction, to pick out the bridges that are up in one place yet down in another.

From there, the questions get trickier: 1) does the set of bridges that a given user reports about create a fingerprint that lets us recognize that user later? Even if the user reports about each bridge through a separate Tor circuit, we'd want to know when each scan happened, and reports with nearby timestamps can be linked to build a statistical profile. 2) What if users submit intentionally misleading reports? There's a tension between wanting to build a profile for the user (to increase our confidence in the validity of her reports) and wanting to make sure the user doesn't develop any recognizable profile. Perhaps the Nymble design family can contribute an "unlinkable reputation" trick to resolve the conflict, but as we find ourselves saying so often at Tor, more research remains.

Active approaches

The goal of active scanning is to get ground truth on whether each bridge is really blocked. There's a tradeoff here: frequent scans give us better resolution and increased confidence, but too many scan attempts draw attention to the scanners and thus to the addresses being scanned.

We should use the results of the passive and indirect scans to give hints about what addresses to do active scans on. In the steady-state, we should aim to limit our active scans to bridges that we think just went from unblocked to blocked or vice versa, and to a sample of others for spot checks to keep our models trained.

There are three pieces to active scanning: direct scans, reverse scans, and indirect scans.

Active approach 1: direct scans. Direct scans are what we traditionally think of when we think of scanning: get access to a computer in the target country, give it a list of bridges, and have it connect directly to each bridge on the list.

Before I continue though, I should take an aside to discuss types of blocking. In September 2009 when China first blocked some bridges, I spent a while probing the blocked bridges from a computer in Beijing. From what I could tell, China blocked the bridges in two ways. If the bridge had no other interesting services running (like a webserver), they just blackholed the IP address, meaning no packets to or from the IP address made it through the firewall. But if there was an interesting service, they blocked the bridge by IP and port. (I could imagine this more fine-grained blocking was done by dropping SYN packets, or by sending TCP RST packets; but I didn't get that far in my investigation.)

So there are two lessons to be learned here. First, the degree to which our active scans match real Tor client behavior could influence the accuracy of the scans. Second, some real-world adversaries are putting considerable effort — probably manual effort — into examining the bridges they find and choosing how best to filter them. After all, if they just blindly filtered IP addresses we list as bridges, we could add Baidu's address as a bridge and make them look foolish. (We tried that; it didn't work.)

These lessons leave us with two design choices to consider.

First, how much of the Tor protocol should we use when doing the scans? The spectrum ranges from a simple TCP scan (or even just a SYN scan), to a vanilla SSL handshake, to driving a real Tor client that does a genuine Tor handshake. The less realistic the handshake, the more risk that we conclude the bridge is reachable when in fact it isn't; but the more realistic the handshake, the more we stand out to an adversary watching for Tor-like traffic.
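
The cheap end of that spectrum — a plain TCP connect — might look like this (a sketch; a completed connection says "reachable at the TCP layer" and nothing about whether an SSL or Tor handshake would also succeed):

```python
import socket

def tcp_reachable(host, port, timeout=5.0):
    """Attempt a full TCP handshake with the given address and port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, timed out, unreachable, blackholed, etc.
        return False
```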

The mechanism by which the adversary is discovering bridges also impacts which reachability tests are a smart idea. For example, it appears that China may have recently started doing deep packet inspection (DPI) for Tor-like connections over the Great Firewall and then doing active follow-up SSL handshakes to confirm which addresses are Tor bridges. If that report turns out to be true, testing bridge reachability via active scans that include handshaking would be counterproductive: the act of doing the test would influence the answer. We can defeat their attack by changing our handshake so they don't recognize it anymore, or by introducing scanning-resistance measures like bridge passwords. In the shorter term, we should confirm that simple connection scanning (without a handshake) doesn't trigger any blocks, and then restrict ourselves to that type of scanning in China until we've deployed a better answer.

Second, should we scan "decoy" addresses as well, to fool an observer into thinking that we're not scanning for Tor bridges in particular, and/or to drive up the work the observer needs to do to distinguish the "real" bridges? Whether this trick is useful depends on the level of sophistication and dedication of the adversary. For example, China has already demonstrated that they check IP addresses before blocking them, and in general I worry that the more connections you make to anything, the more likely you are to attract attention for further scrutiny. How would we generate the list of decoy addresses? If we choose it randomly from the space of IP addresses, a) most of them will not respond, and b) we'll invite abuse complaints from security people looking for worms. Driving up the work factor sounds like a great feature, but it could have the side effect that it encourages the adversary to invest in an automated "is this a Tor bridge" checker, which would be an unfortunate step for them to take if they otherwise wouldn't.

Active direct scans come with a fundamental dilemma: the more we think a bridge has been blocked, the more we want to scan it; but the more likely it is to be blocked, the more the adversary might already be watching for connections to it, for example to do a "zig-zag" bridge enumeration attack. So we need to avoid scanning bridges that we think are not blocked. But we also need to explore more subtle scanning techniques such as the ones below.

Active approach 2: reverse scans. A bridge that gets duplex blackholed by a government firewall can learn that it has been filtered by trying to make a connection into the filtered country.

For example, each bridge might automatically connect to a site like Baidu periodically, and publish the results of its reachability test in its extrainfo descriptor. We could either feed this information into the models and follow up with other active probes if the bridge thinks it's been blocked; or we could use it the other way, by instructing bridges that we think have been blocked to launch a reverse scan.

We can actually take advantage of the flexibility of the Tor protocol to do scanning from each bridge to Baidu without changing the bridge code at all: we simply try to extend an ordinary circuit from the bridge to the target destination, and learn at what stage the 'extend' request failed. (We should extend the Tor control protocol to expose the type of failure to the Tor controller, but that's a simple matter of programming.)

Note that these reverse scans can tell us that a bridge has been blocked, but they can't tell us that a bridge hasn't been blocked, since it could just be blocked in a more fine-grained way.

Finally, how noticeable would these reverse scans be? That is, could the government firewall enumerate bridges by keeping an eye out for Tor-like connection attempts to Baidu? While teaching the bridges to do the scanning themselves would require more work, it would give us more control over how much the scans stick out.

Active approach 3: indirect scans. Indirect scans use other services as reflectors. For example, you can connect to an FTP server inside the target country, tell it that your address is the address of the bridge you want to scan, and then try to fetch a file. How it fails should tell you whether that FTP server could reach the bridge or not.

There are many other potential reflector protocols out there, each with its own tradeoffs. For example, can we instruct an in-country DNS resolver to recurse to the target bridge address, and distinguish between "I couldn't reach that DNS server" and "that wasn't a DNS server"? (DNS is probably not the right protocol to use inside China, given the amount of DNS mucking they are already known to do.)

Another avenue is a variant on idle scanning, which takes advantage of predictable TCP IPID patterns: send a packet directly to the bridge to learn its current IPID, then instruct some computer in-country to send a packet, and then send another packet directly to find the new IPID and learn whether or not the in-country packet arrived.
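
The inference step of that idle-scan variant is just arithmetic on the two directly observed IP IDs, assuming the classic single-global-IPID-counter behavior (an assumption many modern network stacks deliberately break by randomizing IP IDs):

```python
def in_country_packet_arrived(ipid_before, ipid_after):
    """Given the bridge host's IP ID from our probe before and after the
    in-country host was asked to send it one packet, infer whether that
    packet (and the reply it elicited) happened."""
    # Our two direct probes each elicit one reply. If only our probes
    # got through, the counter advances by 1 between them; if the
    # in-country packet also elicited a reply, it advances by 2.
    gap = (ipid_after - ipid_before) % 65536  # 16-bit counter wraps
    if gap == 1:
        return False
    if gap == 2:
        return True
    return None  # other traffic hit the host; result inconclusive
```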

What other services can be bent to our will? Can we advertise the bridge address on a bittorrent tracker that's popular in China and see whether anybody connects? Much creative research remains here.

Putting it all together

One of the puzzle pieces holding us back from rolling out the "tens of thousands of bridge addresses offered by volunteers with spare net blocks" plan is that we need better ways to get feedback on when addresses get blocked. The ideas in this blog post hopefully provide a good framework for thinking about the problem.

For the short term, we should deploy a basic TCP connection scanner from inside several censoring countries (China, Iran, and Syria come to mind). Since the "clients report" passive strategy still has some open research questions, we should get all our hints from the "bridges report" passive strategy. As we're ramping up, and especially since our current bridges are either not blocked at all (outside China), or mostly blocked (inside China), we should feel free to do more thorough active scans to get a better intuition about what scanning can teach us.

In the long term, I want to use these various building blocks in a feedback loop to identify and reward successful bridge distribution strategies, as outlined in Levchenko and McCoy's FC 2011 paper.

Specifically, we need these four building blocks:

1) A way to discover how much use a bridge is seeing from a given country. Done: see the WECSR10 paper and usage graphs.

2) A way to get fresh bridge addresses over time. The more addresses we can churn through, the more aggressive we can be in experimenting with novel distribution approaches. See the "more bridge addresses" blog post for directions here.

3) A way to discover when a bridge is blocked in a given country. That's what this blog post is about.

4) Distribution strategies that rely on different mechanisms to make enumeration difficult. Beyond our "https" and "gmail" distribution strategies, we know a variety of people in censored countries, and we can think of each of these people as a distribution channel.

We can define the efficiency of a bridge address in terms of how many people use it and how long before it gets blocked. So a bridge that gets blocked very quickly scores a low efficiency, a bridge that doesn't get blocked but doesn't see much use scores a medium efficiency, and a popular bridge that doesn't get blocked scores high. We can characterize the efficiency of a distribution channel as a function of the efficiency of the bridges it distributes. The key insight is that we then adapt how many new bridges we give to each distribution channel based on its efficiency. So channels that are working well automatically get more addresses to give out, and channels that aren't working well automatically end up with fewer addresses.
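To make the feedback loop concrete, here is a toy sketch of the scoring and allocation step. The scoring formula and the 30-day horizon are illustrative choices, not anything we have settled on:

```python
# Toy model of the feedback loop: score each bridge by usage and
# survival time, average per distribution channel, and hand out new
# bridge addresses in proportion to each channel's score.

def bridge_efficiency(users_per_day, days_until_blocked, horizon=30):
    """Popular, long-lived bridges score high; quickly blocked ones score low."""
    survival = min(days_until_blocked, horizon) / horizon
    return users_per_day * survival

def allocate(channels, new_bridges):
    """channels: {name: [(users_per_day, days_until_blocked), ...]}
    Returns how many of the new bridge addresses each channel gets."""
    scores = {
        name: sum(bridge_efficiency(u, d) for u, d in bridges) / len(bridges)
        for name, bridges in channels.items()
    }
    total = sum(scores.values()) or 1.0
    return {name: round(new_bridges * s / total) for name, s in scores.items()}
```

Under this toy model, a channel whose bridges survive a month with 100 daily users automatically receives ten times the addresses of one whose bridges are blocked within three days.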

Of course, there's more to it than that. For example, we need to consider how to handle the presence of bridge enumeration attacks that work independently of which distribution channel a given bridge address was given to. We also need to consider attacks that artificially inflate the efficiency of bridges (and thus make us overreward distribution channels), or that learn about a bridge but choose not to block it. But that, as they say, is a story for another time.

Different Ways to Use a Bridge

When some adversary prevents users from reaching the Tor network, our most popular answer is using bridge relays (or bridges for short). Those are hidden relays, not listed along with all the other relays in the networkstatus documents. Currently, we have about 600 of them, and censors have had varying success learning and blocking them — see the 10 ways to discover Tor bridges blog post for more on how discovery approaches may work. China appears to be the only place able to consistently block most bridges, whereas other places occasionally manage to block Tor's handshake and, as a byproduct, block all bridges too.

Bridge users can be broadly grouped in three camps:

  • Tor is blocked, and some way — any way — to reach the network has to be found. The adversary is not very dangerous, but very annoying.
  • Tor may or may not be blocked, but the user is trying to hide the fact that they're using Tor. The adversary may be extremely dangerous.
  • Other bridge users: Testing whether the bridge works (automated or manual), probing, people using bridges without their knowledge because they came pre-configured in their bundle.

Here we examine the first two use cases more closely. Specifically, we look at the properties a bridge must have to be useful to a user.

Bridges — building blocks

First off, it is helpful to understand some basics about bridges and how they are used by normal users.

Bridges are very similar to ordinary relays, in that they are operated by volunteers who decided to help people reach the Tor network. The difference from a normal relay is where information about the bridge is published: bridges can choose either to publish to the Bridge Authority (a special relay collecting all bridge addresses it receives) or to not publish their information anywhere. The former are called public bridges, the latter private bridges.

We don't have any information about the number of private bridges, but since the Bridge Authority collects data about the public bridges, we do know that bridges are used in the real world. See the bridge users and networksize graphs for some examples. Not having data about private bridges or their users means some of the analysis below is based on discussions with users of private bridges and our best estimates, and it can't be backed up by statistical data.

The reason we're using a Bridge Authority and collecting information about bridges is that we want to give out bridges to people who aren't in a position to learn about a private bridge themselves.

"Learning about a bridge" generally means learning about the bridge's IP address and port, so that a connection can be made. Optionally, the bridge identity fingerprint is included, too — this helps the client to verify that it is actually talking to the bridge, and not someone that is intercepting the network communication. For a private bridge, the operator has to pass on that information; public bridges wrap up some information about themselves in what is called their bridge descriptor and send that to the bridge authority. The bridge descriptor includes some statistical information, like aggregated user counts and countries of origin of traffic. Our analysis here focuses solely on the data provided by public bridges.

Once a user has learned about some bridges, she configures her Tor client to use them, typically by entering them into the appropriate field in Vidalia. Alternatively, she might use a different controller or put the data into tor's configuration file directly.
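For reference, the manual route boils down to a few lines in the torrc file (the address and fingerprint below are placeholders):

```
UseBridges 1
Bridge 203.0.113.5:443 0123456789ABCDEF0123456789ABCDEF01234567
```

With UseBridges set, Tor enters the network only through the listed Bridge lines instead of the public relays; the optional fingerprint lets the client verify it is really talking to that bridge.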

Learning from bridge descriptor fetches

We've been collecting bridge descriptor fetch statistics on the bridge authority, and are using this data to pose some questions and propose some changes. The statistics collected are how many bridge descriptors were served in total, and how many unique descriptors were served, as well as the 0, 25, 50, 75 and 100 percentiles of fetches per descriptor. Every 24 hours, the current statistics are written to disk and the counters reset. The current statistics are attached to this post, for closer inspection. We've also prepared two graphs to easily see the data at a glance:

Total bridge downloads

Bridge downloads per descriptor
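The per-descriptor numbers described above are straightforward to recompute from raw fetch counts. A minimal sketch, using the simple nearest-rank style percentile (which may differ from the authority's exact method):

```python
# Recompute the daily bridge-descriptor statistics from a list of
# per-descriptor fetch counts: total fetches, number of distinct
# descriptors served, and the 0/25/50/75/100 percentiles of fetches
# per descriptor.

def fetch_stats(fetches_per_descriptor):
    counts = sorted(fetches_per_descriptor)
    def pct(p):
        # index into the sorted counts; simple integer interpolation
        return counts[min(len(counts) - 1, p * (len(counts) - 1) // 100)]
    return {
        "total": sum(counts),
        "unique": len(counts),
        "percentiles": {p: pct(p) for p in (0, 25, 50, 75, 100)},
    }
```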

The first thing to note is that there aren't very many bridge descriptor fetches at all, which isn't a big surprise: the current Tor bundles don't fetch them when they're used in the typical way, that is, by adding some bridges via Vidalia's interface after discovering them via one of our bridge distribution channels. Over the past month, there have been between 3900 and 6600 fetches per day, with a median of 8 fetches per bridge. The most fetched descriptor is fetched up to 350 times per day, suggesting that it belongs to a bridge that was given out together with its fingerprint and is actively used by Tor clients. We have received reports that a bundle circulated with pre-configured bridges, which could account for the many fetches.

Secondly, most bridge descriptors are not fetched from the authority at all. This is a clear indication that we can improve our odds of updating bridge clients with current bridge information if we can get them to fetch it more reliably.

Improving Tor's behaviour for the two user groups

The first group ("Tor is blocked, and some way to reach the network has to be found") is mostly concerned with circumvention, without necessarily hiding their Tor use from anyone. Typically, access to the Internet is filtered, but circumventing the filter isn't too risky; people care more about access than about hiding their tracks from a data-collecting adversary. Speed, bootstrapping performance, and a setup needing little intervention or maintenance are the biggest goals.

Adding auto-discovery mechanisms for bridges that changed their IP address will help this group gain a lot more robustness when it comes to maintaining connectivity against an adversary that blocks public relays, but isn't very quick in blocking all bridges. As far as we know, this is currently true for the majority of our bridge userbase.

For the second group ("Tor may or may not be blocked, but the user is trying to hide the fact that they're using Tor"), precise control over Tor's actions is much more important than constant connectivity, and private bridges might be utilized to that end as well. A user in this group wants to keep the bridges he's using secret, and puts up with frequent manual updates to the configuration for the added safety of only connecting to a pre-specified IP address:port combination. We can't do very much for a user belonging to this group with regard to bridges, but he will benefit greatly from improvements to our general fingerprintability resistance. Options like the recently introduced DisableNetwork option (which prevents Tor from touching the network in any way until the option is changed) also help.

Another interesting point here is that we can indirectly improve the behaviour for the first group by not making it too easy to learn about bridges, because censors can use the same data to more effectively block them. This means that we shouldn't, for example, start giving out significantly more bridges to a single user.

We've written a proposal to implement some changes in Tor, to better facilitate the needs of the first group of bridge users.
