research

Call for papers: FOCI'14 workshop

The 4th USENIX Workshop on Free and Open Communications on the Internet is calling for papers. The workshop will be held on August 18, 2014 and seeks to bring together researchers and practitioners working on means to study, detect, or circumvent Internet censorship.

Past iterations of the workshop featured a number of great Tor research papers including a proposal for cloud-based onion routing, our OONI design paper, an analysis of how the GFW blocks Tor, a censorship analyzer for Tor, and a proposal for latency reduction in Tor circuits.

Given the past success of the FOCI series, we are looking forward to another round of great Tor papers.

Paper submissions are due on Tuesday, May 13, 2014, 11:59 p.m. PDT.

How to read our China usage graphs

Recently somebody asked me why our usage numbers in China are so low. More precisely, his question was "How do I read this graph in any way other than 'Tor is effectively blocked in China'?" After writing up an answer for him, I realized I should share it with the rest of the Tor community too.

The correct interpretation of the graph is "obfs3 bridges have not been deployed enough to keep up with the demand in China". So it isn't that Tor is blocked — it's that we haven't done much of a deployment for obfs3 bridges or ScrambleSuit bridges, which are the latest steps in the arms race.

The short explanation is that the old vanilla SSL Tor transport doesn't work in China anymore due to their active probing infrastructure. The obfs2 transport doesn't work anymore either, for the same reason. The obfs3 transport works great for now, and thousands of people are happily using it — and some of those people aren't reflected in the graphs you see (I'll explain that more below).

The medium-length explanation is that we've been leading and coordinating the international research effort at understanding how to design and analyze transports that resist both DPI and active probing, and approximately none of these approaches have been deployed at a large scale yet. So it doesn't make sense to say that Tor is blocked in China, because it mischaracterizes Tor as a static protocol. "Tor" when it comes to censorship circumvention is a toolbox of options — some of them work in China, some don't. The ones that work (i.e. that should resist both DPI and active probing) haven't been rolled out very widely, in large part because we have funders who care about the research side but we have nobody who funds the operations, deployment, or scale-up side.

The long explanation is that it comes down to three issues:

First, there are some technical steps we haven't finished deploying in terms of collecting statistics about users of bridges + pluggable transports. The reason is that the server side of the pluggable transport needs to inform the Tor bridge what country the user was from, so the Tor bridge can include that in its (aggregated, anonymized) stats that it publishes to the metrics portal. We've now built most of the pieces, but most of the deployed bridges aren't running the new code yet. So the older bridges that are reporting their user statistics aren't seeing very many users from China, while the bridges that *aren't* reporting their user statistics, which are the ones that offer the newer pluggable transports, aren't well-represented in the graph. We have some nice volunteers looking into what fraction of deployed obfs3 bridges don't have this new 'extended ORPort' feature. But (and you might notice the trend here) we don't have any funders currently who care about counting bridge users in China.

Second, we need to get more addresses. One approach is to get them from volunteers who sign up their computer as a bridge. That provides great sustainability in terms of community involvement (we did a similar push for obfs2 bridges back when Iran was messing with SSL, and got enough to make a real difference at the time), but one address per volunteer doesn't scale very well. The intuition is that the primary resource that relays volunteer is bandwidth, whereas the primary resource that bridges volunteer is their address — and while bandwidth is an ongoing contribution, once your IP address gets blocked then your contribution has ended, at least for the country that blocked it, or until you get another address via DHCP, etc. The more scalable approaches to getting bridge addresses involve coordinating with ISPs and network operators, and/or designs like Flashproxy to make it really easy for users to sign up their address. I describe these ideas more in "approach four" and "approach five" of the Strategies for getting more bridge addresses blog post. But broad deployment of those approaches is again an operational thing, and we don't have any funded projects currently for doing it.

Third, we need ways of letting people learn about bridges and use them without getting them noticed. We used to think the arms race here was "how do you give out addresses such that the good guys can learn a few while the bad guys can't learn all of them", a la the bridges.torproject.org question. But it's increasingly clear that scanning resistance will be the name of the game in China: your transport has to not only blend in with many other flows (to drive up the number of scans they have to launch), but also when they connect to that endpoint and speak your protocol, your service needs to look unobjectionable there as well. Some combination of ScrambleSuit and FTE are great starts here, but it sure is a good thing that the research world has been working on so many prototype designs lately.

So where does that leave us? It would be neat to think about a broad deployment and operations plan here. I would want to do it in conjunction with some other groups, like Team Cymru on the technical platform side and some on-the-ground partner groups for distributing bridge addresses more effectively among social networks. We've made some good progress on the underlying technologies that would increase the success chances of such a deployment — though we've mostly been doing it using volunteers in our spare time on the side, so it's slower going than it could be. And several other groups (e.g. torservers.net) have recently gotten funding for deploying Tor bridges, so maybe we could combine well with them.

In any case it won't be a quick and simple job, since all these pieces have to come together. It's increasingly clear that just getting addresses should be the easy part of this. It's how you give them out, and what you run on the server side to defeat China's scanning, that still look like the two tough challenges for somebody trying to scale up their circumvention tool design.

New Tor Denial of Service Attacks and Defenses

New work on denial of service in Tor will be presented at NDSS '14 on Tuesday, February 25th, 2014:


The Sniper Attack: Anonymously Deanonymizing and Disabling the Tor Network
by Rob Jansen, Florian Tschorsch, Aaron Johnson, and Björn Scheuermann
To appear at the 21st Symposium on Network and Distributed System Security

A copy of the published paper is now available, as are my slides explaining the attacks in both PPTX and PDF formats.

We found a new vulnerability in the design of Tor's flow control algorithm that can be exploited to remotely crash Tor relays. The attack is extremely low-resource: an adversary's bandwidth may be traded for a target relay's memory (RAM) at an amplification rate of one to two orders of magnitude. Ironically, the adversary can use Tor to protect its identity while attacking Tor, without significantly reducing the effectiveness of the attack.

We studied relay availability under the attack using Shadow, a discrete-event network simulator that runs the real Tor software in a safe, private testing environment, and found that we could disable both the fastest guard relay and the fastest exit relay in 1-18 minutes each (depending on relay RAM capacity). We also found that the entire group of the top 20 exit relays, representing roughly 35% of Tor bandwidth capacity at the time of the analysis, could be disabled in 29 minutes to 3 hours and 50 minutes. Finally, we analyzed how the attack could potentially be used to deanonymize hidden services, and found that it would take between 4 and 278 hours before the attack would succeed (again depending on relay RAM capacity, as well as the bandwidth resources used to launch the attack).

Due to our devastating findings, we also designed three defenses that mitigate our attacks, one of which provably renders the attack ineffective. Defenses have been implemented and deployed into the Tor software to ensure that the Tor network is no longer vulnerable as of Tor version 0.2.4.18-rc and later. Some of that work can be found in Trac tickets #9063, #9072, #9093, and #10169.

In the remainder of this post I will detail the attacks and defenses we analyzed, noting again that this information is presented more completely (and more elegantly) in our paper.


The Tor Network Infrastructure

The Tor network is a distributed system made up of thousands of computers running the Tor software that contribute their bandwidth, memory, and computational resources for the greater good. These machines are called Tor relays, because their main task is to forward or relay network traffic to another entity after performing some cryptographic operations. When a Tor user wants to download some data using Tor, the user's Tor client software will choose three relays from those available (an entry, middle, and exit), form a path or circuit between these relays, and then instruct the third relay (the exit) to fetch the data and send it back through the circuit. The data will get transferred from its source to the exit, from the exit to the middle, and from the middle to the entry before finally making its way to the client.

Flow Control

The client may request the exit to fetch large amounts of data, and so Tor uses a window-based flow control scheme in order to limit the amount of data each relay needs to buffer in memory at once. When a circuit is created, the exit will initialize its circuit package counter to 1000 cells, indicating that it is willing to send 1000 cells into the circuit. The exit decrements the package counter by one for every data cell it sends into the circuit (to the middle relay), and stops sending data when the package counter reaches 0. The client at the other end of the circuit keeps a delivery counter, and initializes it to 0 upon circuit creation. The client increments the delivery counter by 1 for every data cell it receives on that circuit. When the client's delivery counter reaches 100, it sends a special Tor control cell, called a SENDME cell, to the exit to signal that it received 100 cells. Upon receiving the SENDME, the exit adds 100 to its package counter and continues sending data into the circuit.

This flow control scheme limits the amount of outstanding data that may be in flight at any time (between the exit and the client) to 1000 cells, or about 500 KiB, per circuit. The same mechanism is used when data is flowing in the opposite direction (up from the client, through the entry and middle, and to the exit).
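
To make the mechanics concrete, here is a minimal Python sketch of the scheme described above. It is illustrative only, not Tor's actual C implementation, and the class and constant names are ours: a packaging end with a 1000-cell window, a delivery end that acknowledges every 100 cells with a SENDME, and a loop showing that a well-behaved circuit never has more than 1000 unacknowledged cells in flight.

    # Minimal sketch (not Tor's implementation) of the window-based flow
    # control described above.
    CIRCUIT_WINDOW_START = 1000   # initial package window (cells)
    SENDME_INCREMENT = 100        # cells acknowledged per SENDME

    class PackagingEnd:
        """The end that packages data into cells (e.g. the exit relay)."""
        def __init__(self):
            self.package_window = CIRCUIT_WINDOW_START

        def can_send(self):
            return self.package_window > 0

        def send_cell(self):
            assert self.can_send()
            self.package_window -= 1

        def handle_sendme(self):
            self.package_window += SENDME_INCREMENT

    class DeliveryEnd:
        """The end that receives cells (e.g. the client)."""
        def __init__(self):
            self.delivered = 0

        def receive_cell(self):
            self.delivered += 1
            # Every 100 delivered cells, acknowledge them with a SENDME.
            if self.delivered % SENDME_INCREMENT == 0:
                return "SENDME"
            return None

    # Simulate a well-behaved circuit: the client reads every cell, so the
    # exit never has more than 1000 unacknowledged cells outstanding.
    exit_end, client = PackagingEnd(), DeliveryEnd()
    for _ in range(2500):
        if not exit_end.can_send():
            break
        exit_end.send_cell()
        if client.receive_cell() == "SENDME":
            exit_end.handle_sendme()
    print("package window after 2500 cells:", exit_end.package_window)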

The Sniper Attack

The new Denial of Service (DoS) attack, which we call "The Sniper Attack", exploits the flow control algorithm to remotely crash a victim Tor relay by depleting its memory resources. The paper presents three attacks that rely on the following two techniques:

  1. the attacker stops reading from the TCP connection containing the attack circuit, which causes the TCP window on the victim's outgoing connection to close and the victim to buffer up to 1000 cells; and
  2. the attacker causes cells to be continuously sent to the victim (exceeding the 1000 cell limit and consuming the victim's memory resources) either by ignoring the package window at the packaging end of the circuit, or by continuously sending SENDMEs from the delivery end to the packaging end even though no cells have been read by the delivery end.



I'll outline the attacks here, but remember that the paper and slides provide more details and useful illustrations of each of the three versions of the attack.

Basic Version 1 (attacking an entry relay)

In basic version 1, the adversary controls the client and the exit relay, and chooses a victim for the entry relay position. The adversary builds a circuit through the victim to her own exit, and then the exit continuously generates and sends arbitrary data through the circuit toward the client while ignoring the package window limit. The client stops reading from the TCP connection to the entry relay, and the entry relay buffers all data being sent by the exit relay until it is killed by its OS out-of-memory killer.

Basic Version 2 (attacking an exit relay)

In basic version 2, the adversary controls the client and an Internet destination server (e.g. website), and chooses a victim for the exit relay position. The adversary builds a circuit through the victim exit relay, and then the client continuously generates and sends arbitrary data through the circuit toward the exit relay while ignoring the package window limit. The destination server stops reading from the TCP connection to the exit relay, and the exit relay buffers all data being sent by the client until it is killed by its OS out-of-memory killer.

Efficient Version

Both of the basic versions of the attack above require the adversary to generate and send data, consuming roughly the same amount of upstream bandwidth as the victim's available memory. The efficient version reduces this cost by one to two orders of magnitude.

In the efficient version, the adversary controls only a client. She creates a circuit, choosing the victim for the entry position, and then instructs the exit relay to download a large file from some external Internet server. The client stops reading on the TCP connection to the entry relay, causing it to buffer 1000 cells.

At this point, the adversary may "trick" the exit relay into sending more cells by sending it a SENDME cell, even though the client has not actually received any cells from the entry. As long as this SENDME does not increase the exit relay's package counter to greater than 1000 cells, the exit relay will continue to package data from the server and send it into the circuit toward the victim. If the SENDME does cause the exit relay's package window to exceed the 1000 cell limit, it will stop responding on that circuit. However, the entry and middle nodes will hold the circuit open until the client issues another command, meaning the memory it consumes will not be freed.

The bandwidth cost of the attack after circuit creation is simply the bandwidth cost of occasionally sending a SENDME to the exit. The memory consumption speed depends on the bandwidth and congestion of non-victim circuit relays. We describe how to parallelize the attack using multiple circuits and multiple paths with diverse relays in order to draw upon Tor's inherent resources. We found that with roughly 50 KiB/s of upstream bandwidth, an attacker could consume the victim's memory at roughly 1 MiB/s. This is highly dependent on the victim's bandwidth capabilities: relays that use token buckets to restrict bandwidth usage will of course bound the attack's consumption rate.
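
A rough back-of-the-envelope check of those numbers, using only the rates quoted in the paragraph above (the RAM sizes below are hypothetical, and actual rates depend on the victim's bandwidth and congestion):

    # Illustrative arithmetic only: amplification factor and time to exhaust
    # a relay's RAM at the rates quoted above.
    KIB, MIB, GIB = 1024, 1024**2, 1024**3

    attacker_upstream = 50 * KIB      # bytes/s of SENDME traffic
    victim_ram_growth = 1 * MIB       # bytes/s of cells queued at the victim

    amplification = victim_ram_growth / attacker_upstream
    print(f"amplification factor: ~{amplification:.0f}x")        # ~20x

    for ram in (1 * GIB, 8 * GIB):    # hypothetical victim RAM sizes
        minutes = ram / victim_ram_growth / 60
        print(f"time to fill {ram // GIB} GiB of RAM: ~{minutes:.0f} minutes")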

Rather than connecting directly to the victim, the adversary may instead launch the attack through a separate Tor circuit using a second client instance and the "Socks4Proxy" or "Socks5Proxy" option. In this case, she may benefit from the anonymity that Tor itself provides in order to evade detection. We found that there is not a significant increase in bandwidth usage when anonymizing the attack in this way.

Defenses

A simple but naive defense against the Sniper Attack is to have the guard node watch its queue length, and if it ever fills to over 1000 cells, kill the circuit. This defense does not prevent the adversary from parallelizing the attack by using multiple circuits (and then consuming 1000 cells on each), which we have shown to be extremely effective.

Another defense, called "authenticated SENDMEs", tries to protect against receiving a SENDME from a node that didn't actually receive 100 cells. In this approach, a 1 byte nonce is placed in every 100th cell by the packaging end, and that nonce must be included by the delivery end in the SENDME (otherwise the packaging end rejects the SENDME as inauthentic). As above, this does not protect against the parallel attack. It also doesn't defend against either of the basic attacks where the adversary controls the packaging end and ignores the SENDMEs anyway.

The best defense, as we suggested to the Tor developers, is to implement a custom, adaptive out-of-memory circuit killer in application space (i.e. inside Tor). The circuit killer is only activated when memory becomes scarce, and then it chooses the circuit with the oldest front-most cell in its circuit queue. This will prevent the Sniper Attack by killing off all of the attack circuits.
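
A simplified sketch of that idea in Python (not the code Tor actually ships, and the names are ours): track when each cell was queued, and when total queued memory passes a threshold, repeatedly kill the circuit whose front-most queued cell is oldest. Because an attack circuit stops reading, its front-most cell keeps aging, so it is selected first.

    # Simplified sketch of an adaptive out-of-memory circuit killer.
    class Circuit:
        def __init__(self, name):
            self.name = name
            self.queue = []                      # list of (timestamp, cell)

        def queue_cell(self, cell, timestamp):
            self.queue.append((timestamp, cell))

        def oldest_cell_time(self):
            return self.queue[0][0] if self.queue else float("inf")

    def total_queued_cells(circuits):
        return sum(len(c.queue) for c in circuits)

    def oom_handler(circuits, max_queued_cells):
        """Kill circuits with the oldest front-most cell until under the cap."""
        while circuits and total_queued_cells(circuits) > max_queued_cells:
            victim = min(circuits, key=lambda c: c.oldest_cell_time())
            print("killing circuit", victim.name,
                  "with", len(victim.queue), "queued cells")
            circuits.remove(victim)

    # An attack circuit stops reading, so its cells sit in the queue and age;
    # an honest circuit's front-most cell is always recent.
    honest, attack = Circuit("honest"), Circuit("attack")
    for t in range(1200):
        attack.queue_cell("data", timestamp=t)
    for t in range(1200, 1300):
        honest.queue_cell("data", timestamp=t)
    oom_handler([honest, attack], max_queued_cells=1000)   # kills "attack"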

With this new defense in place, the next game is for the adversary to try to cause Tor to kill an honest circuit. In order for an adversary to cause an honest circuit to get killed, it must ensure that the front-most cell on its malicious circuit queue is at least slightly "younger" than the oldest cell on any honest queue. We show that the Sniper Attack is impractical with this defense: due to fairness mechanisms in Tor, the adversary must spend an extraordinary amount of bandwidth keeping its cells young — bandwidth that would likely be better spent on a more traditional brute-force DoS attack.

Tor has implemented a version of the out-of-memory killer for circuits, and is currently working on expanding this to channel and connection buffers as well.

Hidden Service Attack and Countermeasures

The paper also shows how the Sniper Attack can be used to deanonymize hidden services:

  1. run a malicious entry guard relay;
  2. run the attack from Oakland 2013 to learn the current guard relay of the target hidden service;
  3. run the Sniper Attack on the guard from step 2, knocking it offline and causing the hidden service to choose a new guard;
  4. repeat, until the hidden service chooses the relay from step 1 as its new entry guard.



The technique to verify that the hidden service is using a malicious guard in step 4 is the same technique used in step 2.

In the paper, we compute the expected time to succeed in this attack while running malicious relays of various capacities. It takes longer to succeed against relays that have more RAM, since the attack relies on the Sniper Attack consuming enough RAM to kill the relay (which itself depends on the bandwidth capacity of the victim relay). For the malicious relay bandwidth capacities and honest relay RAM amounts used in our estimates, we found that deanonymization would involve between 18 and 132 Sniper Attacks and take between ~4 and ~278 hours.
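
A much simplified way to think about that estimate (the parameters below are purely illustrative; the paper derives its figures from measured relay bandwidths and RAM sizes): if each guard replacement independently picks the adversary's relay with probability p, the number of Sniper Attacks needed is geometric with mean 1/p, and the total time is that count multiplied by the time each attack takes.

    # Hypothetical, simplified model of the "repeat until chosen" loop above.
    def expected_attacks(p_selected):
        # Geometric distribution: mean number of guard replacements needed.
        return 1.0 / p_selected

    def expected_hours(p_selected, hours_per_attack):
        return expected_attacks(p_selected) * hours_per_attack

    # Illustrative parameter choices only, not the paper's measurements.
    for p in (0.05, 0.01):
        for h in (0.5, 2.0):
            print(f"p={p}, {h}h per attack -> "
                  f"~{expected_attacks(p):.0f} attacks, "
                  f"~{expected_hours(p, h):.0f} hours")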

This attack becomes much more difficult if the relay is rebooted soon after it crashes, and the attack is ineffective when Tor relays are properly defending against the Sniper Attack (see the "Defenses" section above).

Strategies to defend hidden services in particular go beyond those suggested here to include entry guard rate-limiting, where you stop building circuits if you notice that your new guards keep going down (failing closed), and middle guards, guard nodes for your guard nodes. Both of these strategies attempt to make it harder to coerce the hidden service into building new circuits or exposing itself to new relays, since that is precisely what is needed for deanonymization.

Open Problems

The main defense implemented in Tor will start killing circuits when memory gets low. Currently, Tor uses a configuration option (MaxMemInCellQueues) that allows a relay operator to configure when the circuit-killer should be activated. There is likely not one single value that makes sense here: if it is too high, then relays with lower memory will not be protected; if it is too low, then there may be more false positives resulting in honest circuits being killed. Can Tor determine this setting in an OS-independent way that allows relays to automatically find the right value for MaxMemInCellQueues?
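
One possible heuristic, sketched below purely as an illustration (this is not what Tor does), is to derive the threshold from total system RAM at startup, clamped to a sane range; the fraction and bounds used here are made up.

    # Hypothetical heuristic for picking a MaxMemInCellQueues value from
    # total system RAM. psutil is a third-party, cross-platform library.
    import psutil

    GIB = 1024 ** 3

    def suggest_max_mem_in_cell_queues(fraction=0.4, floor=GIB // 4, cap=8 * GIB):
        total = psutil.virtual_memory().total
        # Reserve a fraction of RAM for cell queues, clamped to [floor, cap].
        return int(min(max(total * fraction, floor), cap))

    print("suggested MaxMemInCellQueues:",
          suggest_max_mem_in_cell_queues() // (1024 ** 2), "MiB")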

The defenses against the Sniper Attack prevent the adversary from crashing the victim relay, but the adversary may still consume a large amount of the relay's bandwidth (and drive its memory use to a critical level) at relatively low cost, similar to more traditional bandwidth amplification attacks. More analysis of general bandwidth consumption attacks and defenses remains a useful research problem.

Finally, hidden services also need some love. More work is needed to redesign them in a way that does not allow a client to cause the hidden service to choose new relays on demand.

What the "Spoiled Onions" paper means for Tor users

Together with Stefan, I recently published the paper "Spoiled Onions: Exposing Malicious Tor Exit Relays". The paper discusses only our results and how we obtained them; it doesn't say much about the implications for Tor users. This blog post should fill that gap.

First, it's important to understand that 25 relays in four months isn't a lot; it is ultimately a very small fraction of the Tor network. It also doesn't mean that 25 out of 1,000 relays are malicious or misconfigured (we weren't very clear on that in the paper). We have yet to calculate the churn rate of exit relays, which is the rate at which relays join and leave the network. 1,000 is really just the approximate number of exit relays at any given point in time, so the actual number of exit relays we ended up testing over four months is certainly higher than that. As a user, that means that you will not see many malicious relays "in the wild".

Second, Tor clients select relays in their circuits based on the bandwidth they are contributing to the network. Faster relays see more traffic than slower relays, which balances the load in the Tor network. Many of the malicious exit relays contributed relatively little bandwidth to the Tor network, which makes them quite unlikely to be chosen as a relay in a circuit.

Third, even if your traffic is going through a malicious exit relay, it doesn't mean that everything is lost. Many of the attacks we discovered still caused Firefox's infamous "about:certerror" warning page. As a vigilant user, you would notice that something isn't quite right and hopefully leave the site. In addition, TorBrowser ships with HTTPS-Everywhere, which by default attempts to connect to some sites over HTTPS even though you just typed "http://". After all, as we said in the past, "Plaintext over Tor is still plaintext".

Finally, we want to point out that all of these attacks are of course not limited to the Tor network. You face the very same risks when you connect to any public WiFi network. One of the fundamental problems is the broken CA system: do you actually know all of the ~50 organisations you implicitly trust when you start Firefox, Chrome, or TorBrowser? Making the CA system more secure is a very challenging task for the entire Internet and not just the Tor network.

A Critique of Website Traffic Fingerprinting Attacks

Website traffic fingerprinting is an attack where the adversary attempts to recognize the encrypted traffic patterns of specific web pages without using any other information. In the case of Tor, this attack would take place between the user and the Guard node, or at the Guard node itself.

There are two models under which these attacks are typically studied: The "closed world" scenario, and the "open world" scenario. In the "closed world" scenario, the only traffic patterns the classifier ever sees are for web pages that it has already been trained on, and it typically must successfully label all of them. This is meant to simulate situations where users only use Tor for viewing a small set of censored web pages and nothing else. The "open world" scenario is slightly more realistic in that it attempts to examine the ability of the adversary to recognize a few censored pages out of a much larger set of uncensored pages, some of which it has never seen before.

It is important to note that in both models, these papers are reporting results on their ability to classify individual pages and not overall websites, despite often using the terms website and webpage interchangeably.

The most comprehensive study of the statistical properties of this attack against Tor was done by Panchenko et al. In their "closed world" study, they evaluated their success rates of classifying 775 web pages. In their "open world" study, they classified a handful of censored pages in 5000 page subsets of 1,000,000 total web pages.

Since then, a series of smaller scale follow-on attack papers have claimed improved success rates since Panchenko's study, and in at least two instances these papers even claimed to completely invalidate any attempt at defense based only on partial results.

While these papers may show some improvements in classifier accuracy over Panchenko's, defenses were declared "broken" and dismissed primarily by taking a number of shortcuts that ignore basic properties of machine learning theory and statistics in order to enable those claims of success.

Despite these subsequent improvements in these attacks, we are still skeptical of the efficacy of this attack in a real world scenario, and believe that only minimal defenses will be needed to ensure the attack continues to be unusable in practice.

This post will be divided into the following areas: First, we will attempt to distill the adversary model used by this body of work, and describe the adversary's goals. Then, we will review the basic properties of machine learning theory and statistics that govern classifier accuracy and real-world performance. Next, we will make these properties concrete by briefly discussing additional real-world sources of hidden complexity in the website traffic fingerprinting problem domain. We will then specifically enumerate both the useful contributions and the shortcuts taken by various work to date. Finally, we will conclude with suggestions and areas for improvement in future work.

Pinning Down the Adversary Model

Website traffic fingerprinting attack papers assume that an adversary is for some reason either unmotivated or unable to block Tor outright, but is instead very interested in detecting patterns of specific activity inside Tor-like traffic flows.

The exact motivation for this effort on behalf of the adversary is typically not specified, but there seem to be three possibilities, in order of increasing difficulty for the adversary:

  1. The adversary is interested in blocking specific censored webpage traffic patterns, while still leaving the rest of the Tor-like traffic unmolested (perhaps because Tor's packet obfuscation layer looks like something legitimate that the adversary wants to avoid blocking).
  2. The adversary is interested in identifying all of the users that visit a small, specific set of targeted pages.
  3. The adversary is interested in recognizing every single web page a user visits.


Unfortunately, it seems that machine learning complexity theory and basic statistics are heavily stacked against all three of these goals.

Theoretical Issues: Factors Affecting Classification Accuracy

Machine Learning theory tells us that the accuracy of a classifier is affected by four main factors: the size of the hypothesis space (i.e., the information-theoretic complexity of the classification categories), the accuracy of feature extraction and representation (the bias in the hypothesis space), the size of the instance space (the world size), and the number of training examples provided to the classifier.

Bounds have actually been established on the effect of each of these areas to the likelihood of achieving a given accuracy rate, and any good undergraduate course in Machine Learning will cover these topics in detail. For a concise review of these accuracy bounds, we recommend Chapters 5 and 7 of Machine Learning by Tom Mitchell.

The brief summary is this: as the number and/or complexity of classification categories increases while reliable feature information does not, the classifier eventually runs out of descriptive feature information, and either true positive accuracy goes down or the false positive rate goes up. This error is called the bias in the hypothesis space. In fact, even for unbiased hypothesis spaces, the number of training examples required to achieve a reasonable error bound is a function of the number and complexity of the classification categories.
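
To give a concrete feel for the training-examples point, a standard PAC-learning bound for a consistent learner over a finite hypothesis space H (covered in Chapter 7 of Mitchell) says that m >= (1/epsilon) * (ln|H| + ln(1/delta)) training examples suffice to get true error below epsilon with probability at least 1 - delta. The sketch below just plugs in illustrative hypothesis-space sizes to show how the required number of examples grows.

    # Standard PAC sample-complexity bound for a consistent learner over a
    # finite hypothesis space H (see Mitchell, "Machine Learning", Ch. 7):
    #   m >= (1/epsilon) * (ln|H| + ln(1/delta))
    from math import log

    def pac_sample_bound(hypothesis_space_size, epsilon=0.05, delta=0.05):
        return (log(hypothesis_space_size) + log(1 / delta)) / epsilon

    # Illustrative hypothesis-space sizes only; the point is the trend.
    for h in (10**3, 10**6, 10**9):
        print(f"|H| = {h:.0e}: need >= {pac_sample_bound(h):.0f} examples")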

It turns out that the effects of all of these factors are actually observable in the papers that managed to study a sufficient world size.

First, for the same world size and classifier technique, every work that examined both the open and closed worlds reports much higher accuracy rates for the open world than for the closed world. This is due to the higher hypothesis space complexity involved in labeling every page in the closed world, as opposed to labeling only a very small subset of censored targets in the open world. This confirms the above machine learning theory, and tells us that an adversary attempting to recognize every single web page in existence is going to have a very difficult time getting any accuracy, as compared to one who is classifying only a select interesting subset.

It is also clearly visible in Panchenko's large-scale open world study that increasing the world size contributes to a slower, but still steady, decline in open world accuracy (Figure 4 of their paper). The effect of the hypothesis space complexity (increased number of pages to classify) on the open world can also be seen in the rising false positive rates of their Figure 5 (this rise may seem small, but in the next section we will show how even tiny increases in false positives are in fact devastating to the attack).

Every other attack paper since then has neglected to use world sizes large enough to observe these effects in sufficient detail, but again, machine learning theory tells us they are still there, just beyond the published data points.

Practical Issues: False Positives Matter. A Lot.

Beyond classifier accuracy and the factors that affect it, the practical applicability of the website traffic fingerprinting attack is also affected by some basic statistical results. These statistics (which have been examined in other computer security-related application domains of machine learning) show that false positives end up destroying the effectiveness of classification unless they are vanishingly small (much less than 10^-6).

It actually turns out that false positives are especially damaging to this attack, perhaps even more so than in most other application domains. In the website traffic fingerprinting attack, false positive results are a built-in property of the world: if one or more pages' traffic patterns are similar enough to a target page's pattern to trigger a false positive, those pages will be misclassified as the target page every time their traffic pattern is present. Unlike end-to-end timing correlation, the adversary does not get to benefit from information derived from repeated visits (except in narrow, contrived scenarios that we will address in our literature review below).

This also means that small world sizes understate the false positive rate: as you increase the world size, the likelihood of including a page whose traffic pattern matches that of your target page also increases.

To demonstrate the damaging effects of false positives on the attack, we will now consider the effects of false positives on the adversary's two remaining feasible goals: flagging users who visit a specific controversial web page over Tor, and censoring only a subset of Tor-like traffic.

In the event where the adversary is attempting to recognize visits to highly sensitive material (such as a specific web page about a particular protest action) and is interested in gathering a list of suspects who visit the web page, it is easy to see that even with very high accuracy rates, the suspect list quickly grows without bound.

To see this, note that the probability of at least one false positive over N independent page loads is given by 1 - (1-fp)^N, where fp is the probability of a false positive.

Even with a false positive rate as low as 0.2% (which is typical for this literature, and again is also lower than reality due to small world sizes), after performing just N=100 different page loads, 18% of the userbase will be falsely accused of visiting targeted material at least once. After each user has performed N=1000 different page loads, 86.5% of that user base will have been falsely accused of visiting the target page at least once. In fact, many of these users will be falsely accused much more frequently than that (as per the Binomial Distribution).
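
For completeness, here is the arithmetic behind those percentages in a few lines of Python; the false positive rate and page-load counts are taken straight from the text above.

    # Probability of at least one false positive over N independent page
    # loads: 1 - (1 - fp)^N.
    def prob_at_least_one_fp(fp, n):
        return 1 - (1 - fp) ** n

    fp = 0.002   # a 0.2% false positive rate
    for n in (100, 1000):
        print(f"N = {n}: {prob_at_least_one_fp(fp, n):.1%} "
              "of users falsely flagged at least once")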

If instead of trying to enumerate specific visitors, the adversary is trying to interfere with the traffic patterns of certain web pages, the accuracy value that matters is the Bayesian Detection Rate, which is the probability that a traffic pattern was actually due to a censored/target page visit given that the detector said it was recognized. This is written as P(Censored|Classified).

Using Bayes Theorem, it is possible to convert from the true and false positive rates of P(Classified|Censored) and P(Classified|~Censored) to the Bayesian Detection Rate of P(Censored|Classified) like so:

P(Censored|Classified) =
 P(Classified|Censored)*P(Censored) /
  (P(Censored)*P(Classified|Censored) + 
   P(~Censored)*P(Classified|~Censored))

Under conditions of low censorship (0.1% -- such as when the Tor traffic successfully blends in with a large volume of innocuous Internet traffic, or when Tor is used for both censorship circumvention and general privacy), with a true positive rate of 0.6 and a false positive rate of 0.005, we have:

P(Censored|Classified) = 0.6*.001/(.001*0.6+.999*.005)
P(Censored|Classified) = 0.10
P(~Censored|Classified) = 1 - P(Censored|Classified)
P(~Censored|Classified) = 0.90

This means that when a traffic pattern is classified as a censored/target page, there is a 90% chance that the classifier is actually telling the adversary to interfere with an unrelated traffic stream, and only a 10% chance that the classifier was actually correct.
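
Here is the same computation written as a reusable Python function, so other base rates and classifier accuracies can be plugged in.

    # Bayesian Detection Rate: P(Censored|Classified) from the true positive
    # rate, false positive rate, and base rate of censored traffic.
    def bayesian_detection_rate(p_censored, true_positive, false_positive):
        p_not = 1 - p_censored
        p_classified = p_censored * true_positive + p_not * false_positive
        return (true_positive * p_censored) / p_classified

    # The numbers used in the text: 0.1% base rate, TPR 0.6, FPR 0.005.
    print(bayesian_detection_rate(0.001, 0.6, 0.005))   # ~0.107, i.e. the
                                                        # roughly 10% above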

This phenomenon was explored in detail in the Intrusion Detection System literature, and is the reason why anomaly and classification-based antivirus and IDS systems have failed to materialize in the marketplace (despite early success in academic literature).

Practical Issues: Multipliers of World Size are Common

Beyond the above issues, there are a number of additional sources of world size and complexity in the website traffic fingerprinting problem domain that are completely unaddressed by the literature.

Again, it is important to remember that despite the use of the word "website" in their titles, these attacks operate on classifying instances of traffic patterns created by single pages, and not entire sites. For some sites, there may be little difference between one page and another. For many sites, the difference between component pages is significant.

It is also important to note that the total number of pages on the web is considerably larger than the number of items indexed by Google (which is itself quite a bit larger than even the 1,000,000 page crawl used by Panchenko). In particular, the effects of dynamically generated pages, unindexed pages, rapidly updated pages, interactive pages, and authenticated webapps have been neglected in these studies, and their various possible traffic patterns contribute a very large number of additional web traffic patterns to the world, especially when the different manners in which users interact with them are taken into account.

Beyond this, each page actually also has at least 8 different common traffic patterns consisting of the combination of the following common browser configurations: Cached vs non-cached; Javascript enabled vs disabled; adblocked vs non-adblocked. In each of these combinations, different component resources are loaded for a given page. In fact, the cached vs non-cached property is not just binary: arbitrary combinations of content elements on any given page may be cached by the browser from a previous visit to a related resource. What's more, in Tor Browser, either restarting the browser or using the "New Identity" button causes all of this caching state to be reset.

In addition, similarities between non-web and web traffic also complicate the problem domain. For just one example, it is likely that an open tab with a Twitter query in it generates similar traffic patterns to a Tor-enabled XMPP or IRC client.

All of these factors increase the complexity of the hypothesis space and the instance space, which as we demonstrated above, will necessarily reduce the accuracy of the attack in both theory and practice.

Literature Review: Andriy Panchenko et al

Title: Website Fingerprinting in Onion Routing Based Anonymization Networks

As mentioned previously, Panchenko's work (actually the first work to successfully apply website traffic fingerprinting to Tor) is still the most comprehensive study to date.

Here is a brief list of its strengths:

  1. The world sizes are huge.

    5000 page subsets of 1,000,000 pages may still have representational issues compared to the real world, but this is still the largest study to date.

  2. Careful feature extraction (hypothesis space construction).

    The reason why this work succeeded where earlier works failed is because rather than simply toss raw data into a machine learning algorithm, they were extremely careful with the representation of data for their classifiers. This likely enabled both the efficiency that allowed such a large world size, and reduced the bias in the hypothesis space that would otherwise lead to low accuracy and high false positives at such large world sizes.

  3. In the "open world", they varied the type of censored target sites to evaluate accuracy.

    Instead of merely picking the target pages that were easiest to classify (such as video sites), they varied the type of target pages and explored the effects on both true and false positive accuracy. None of the other literature to date has examined this effect in detail.

  4. They are careful to tune their classifiers to minimize false positives.

    Modern classifiers typically allow you to trade off between false positives and false negatives. Given how deeply false positives impact the adversary's goals, this is very important.

  5. They demonstrate knowledge of all of the factors involved in PAC theory and practical classifier accuracy.

    All of the other papers to date simply omit one or more of the following: data on feature extraction, the contribution of individual features to accuracy, the number of training examples, the effect of target size on both the open and closed world, the effect of target page types on accuracy, and the effects of world size on accuracy.

While impressive, it does miss a few details:

  1. It fails to acknowledge that false positives are a property of specific sites.

    In reality, nearly every false positive they experienced in a given subset of 5000 pages is a false positive that will likely appear in a classifier that is run against the entire 1,000,000 page dataset. They probably should have tallied the total false positives over multiple runs for this reason, and analyzed their properties.

  2. It still fails to provide a realistic adversary model in order to justify claims that the attack actually accomplishes what the adversary wants.

Literature Review: Kevin P. Dyer et al

Title: Peek-a-Boo, I Still See You: Why Efficient Traffic Analysis Countermeasures Fail

This work attempts to evaluate several defenses against the website traffic fingerprinting attack, and also attempts to evaluate an "idealized best case" defense called BuFLO.

On the positive side, the work makes the following contributions:

  1. It creates an ideal high-overhead tunable defense (BuFLO) that can be compared against other practical defenses.
  2. It is quite clear about its definitions of accuracy and experimental setup.
  3. Source code is provided!


However, the work had the following rather serious shortcomings:

  1. Closed world only, and a very small closed world at that.

    This is the single largest issue with this attack paper. You cannot claim that all defenses are broken forever if a classifier is only able to somewhat correctly classify 500 pages or less (and only 128 pages in their defense studies!), even in a closed world.

  2. Exaggerated claims give no credit to (or detailed analysis of) relative strengths of defenses.

    Despite the fact that their exaggerated claims are made possible by their small world size of only 128 pages, the authors do not acknowledge that some lightweight defenses did in fact do far better than others in their experiments. Given that their world size was woefully small, the authors should have been substantially more humble about dismissing the effectiveness of some defenses, and should have spent more time focusing on why some defenses did better against some classifiers, what the overhead costs were, and what classifier features were most impacted by each defense.

  3. It does not give statistics on which sites suffer high rates of misclassification.

    Even in their small world size, it is likely that they stumbled upon a few pages that ended up consistently misclassified. Which were these? How often does that happen out of arbitrary 128 page subsets?

  4. Features are evaluated, but not in terms of their contribution to accuracy.

    There are several standard ways of evaluating the contribution of features to accuracy under various conditions. The authors appeared to employ none of them.

Literature Review: Xiang Cai et al

Title: Touching from a Distance: Website Fingerprinting Attacks and Defenses

This work also attempts to evaluate several additional defenses against the website traffic fingerprinting attack, proposes a Hidden Markov Model extension of the attack to classify web sites in addition to pages, and also attempts to evaluate the BuFLO defense from the "Peek-a-boo" paper, as well as a lower-overhead variant.

On the positive side, the following useful contributions were made:

  1. A low-resource "congestion sensitive" version of Dyer et al's BuFLO defense is presented.

    This defense is also tunable, and deserves a more fair evaluation than was given to it by its own creators. Hopefully they will follow up on it.

  2. An automatic feature extraction mechanism is built in to the classifier.

    The work uses an edit distance algorithm to automatically extract features (pairwise transposition, insertion, deletions and substitutions) rather than manual specification and extraction.

However, this paper also shares many of the Peek-a-boo paper's shortcomings above, and has some of its own.

  1. Minimal edit distance classifiers are ideally suited for a closed world.

    The DLSVM classifier finds the shortest edit distance between a single testing instance and the model instances from training. This is a useful heuristic for a closed world, where you know that your training instance will match *something* from your model, but how well does it fare when you have to tune a cutoff to decide when a given traffic stream's edit distance is too large to be among your censored target pages? It seems likely that it is still better than Panchenko's classifier in the open world, but this has not been proven conclusively, especially since the effect of site uniqueness on the classifier was not examined.

  2. Glosses over effects of defenses on edit distance components.

    Even though edit distance classifiers do not have explicit features, they still have implicit ones. It would be useful to study the effect of low-resource defenses on the most information-heavy components of the edit distances. Or better: to analyze how to tailor their BuFLO variant to target these components in an optimal way.

  3. Defenses were not given a fair analysis.

    In the specific case of both Tor Browser's pipeline defense and HTTPOS, an analysis of the actual prevalence of pipelining and server side support for the HTTPOS feature set was not performed. Nor were statistics on actual request combination and reordering given, nor were the effects on the feature components analyzed.

  4. Navigation-based Hidden Markov Models are user-specific, and will also suffer from Garbage-In Garbage-Out false positive properties.

    While a Hidden Markov Model would appear to provide increased accuracy by being able to utilize multiple observations, the reality of web usage is that users' navigation patterns are not uniform and will vary by individual and social group, especially for news/portal sites and for social networks. For example, people often navigate off of Twitter to view outside links, and some people share more links than others. It is quite likely that the HMM will mistake any link portal (or any repetitive background network activity) that occasionally produces a false positive match for Twitter for exactly this kind of intermittent Twitter usage.

Literature Review: Tao Wang and Ian Goldberg

Title: Improved Website Fingerprinting on Tor

This paper improves upon the work by Cai et al by statistically removing Tor flow control cells, improving the training performance of the classifier, and by performing a small-scale open world study of edit distance based classifiers. It also provides some limited analysis of difficult to classify sites.

The paper has the following issues:

  1. A broken implementation of Tor Browser's defense was studied, despite warnings to the contrary.

    Last year, we discovered serious issues with the HTTP pipeline randomization defense that were introduced during the transition from Firefox 4 to Firefox 10, and found that other issues may have been present in the Firefox 4 version as well. These issues were corrected during the transition to Firefox 17, and the pipeline randomization defense was vastly improved, but the authors still chose to evaluate the broken version, despite our offers of assistance. Moreover, like previous work, an analysis of the actual prevalence of server-side pipelining support and of request combination and reordering was not performed.

  2. Small world size.

    This paper studies a smaller closed world size than the previous edit distance based work (100 pages instead of 1000), and uses only 1000 pages for its open world study.

  3. Target site types were not varied in the open world.

    Unlike Panchenko's work, the types of sites chosen as censored targets were fixed, not varied. This makes it hard to evaluate the effects of types of sites with either very distinct or more typical traffic patterns on the accuracy of their classifier.

Source code is provided, however, so it may be possible to revisit and correct some of these issues, at least.

Concluding Remarks: Suggestions for Future Work

This post is not meant to dismiss the website traffic fingerprinting attack entirely. It is merely meant to point out that defense work has not been as conclusively studied as these papers have claimed, and that defenses are actually easier than is presently assumed by the current body of literature.

We believe that the theoretical and practical issues enumerated in this post demonstrate that defenses do not need to be terribly heavy-weight to be effective. It is likely that in practice, relatively simple defenses will still substantially increase the false positive rate of this attack when it is performed against large numbers of web pages (and non-web background traffic), against large volumes of Tor-like traffic, against large numbers of users, or any combination thereof.

We therefore encourage a re-evaluation of existing defenses such as HTTPOS, SPDY and pipeline randomization, Guard node adaptive padding, and Traffic Morphing, as well as the development of additional defenses.

It is possible that some types of pages (especially those on video sites, file locker sites, and sites with very large content elements) may make natural false positives rare, especially when classified among smaller, more typical pages. For instance, it is unlikely to be possible to generate false positives when the classifier needs only to distinguish between pages from Wikipedia versus Youtube, but it is likely that generic large content and video downloads can be obfuscated such that it is hard to recognize the specific video or download site with minimal relative overhead.

As in the work done by Panchenko, we also suggest evaluating each of your classifier's explicit or implicit features for their relative contribution to overall accuracy (especially among similar classes of sites) as this will help guide the padding and obfuscation behaviors of any defenses you or others might devise. The high-information component features or edit distances can be analyzed and used as input to a statistically adaptive padding defense, for example.
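
As one concrete (and hedged) way to do this kind of feature evaluation, the sketch below uses scikit-learn's permutation importance to estimate how much accuracy each feature contributes. The feature names and random data are hypothetical stand-ins for real traffic traces, not anything from the papers above.

    # Sketch: measuring each feature's contribution to accuracy via
    # permutation importance. Data and feature names are placeholders.
    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.inspection import permutation_importance
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(0)
    feature_names = ["total_bytes", "num_incoming_bursts", "packet_size_entropy"]
    X = rng.normal(size=(500, len(feature_names)))   # placeholder traces
    y = rng.integers(0, 2, size=500)                 # placeholder page labels

    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
    clf = RandomForestClassifier(random_state=0).fit(X_train, y_train)

    result = permutation_importance(clf, X_test, y_test, n_repeats=10,
                                    random_state=0)
    for name, importance in zip(feature_names, result.importances_mean):
        print(f"{name}: accuracy drop when permuted = {importance:+.3f}")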

In any event, please do contact us if you're interested in studying these or other defenses in Tor Browser or Tor itself.

If you are not interested in helping improve the defenses of Tor, do not have time to perform such evaluations in a thorough manner, or have already previously published a website traffic fingerprinting attack paper yourself, we request that you publish or at least provide us with the source code and the data sets involved in your attack, so that we or others can reproduce your results and evaluate potential defenses as well.

Concluding Remarks: Dual Purpose Defenses

It turns out that some defenses against the website traffic fingerprinting attack are also useful against the end-to-end correlation attack. In particular, defenses that increase the rate of false positives for website traffic fingerprinting using padding and other one-ended schemes are very likely to increase the rate of false positives for end-to-end correlation, especially under situations where limited record-keeping capacity, sample-based analysis, or link-level encryption reduce connection and inter-packet timing information.

If a defense introduces false positives among many different web pages in website traffic fingerprinting by manipulating the traffic patterns at the entrance of the Tor network but not the exit, it is very likely to introduce false positives during correlation between the entrance and exit traffic of simultaneous downloads of those same pages. As in website traffic fingerprinting, it turns out that even a small number of false positives will frustrate a dragnet adversary attempting end-to-end correlation.

End-to-end correlation is a much more difficult problem, and we may not ever solve the problem of repeated observations. However, we likely can increase the number and duration of successful observations that correlation attacks will require to build high confidence, especially if the user base grows very large, and if the web moves to higher adoption rates of TLS (so that HTTP identifiers are not available to provide long-term linkability at exit nodes).

Promising defenses include Adaptive Padding, Traffic Morphing, and various transformation and prediction proxies at the exit node (which could also help performance while we're at it).

Improving Tor's anonymity by changing guard parameters

There are tensions in the Tor protocol design between the anonymity provided by entry guards and the performance improvements from better load balancing. This blog post walks through the research questions I raised in 2011, then summarizes answers from three recent papers written by researchers in the Tor community, and finishes by explaining what Tor design changes we need to make to provide better anonymity, and what we'll be trading off.

Part one: The research questions

In Tor, each client selects a few relays at random, and chooses only from those relays when making the first hop of each circuit. This entry guard design helps in three ways:

First, entry guards protect against the "predecessor attack": if Alice (the user) instead chose new relays for each circuit, eventually an attacker who runs a few relays would be her first and last hop. With entry guards, the risk of end-to-end correlation for any given circuit is the same, but the cumulative risk for all her circuits over time is capped.

Second, they help to protect against the "denial of service as denial of anonymity" attack, where an attacker who runs quite a few relays fails any circuit that he's a part of and that he can't win against, forcing Alice to generate more circuits and thus increasing the overall chance that the attacker wins. Entry guards greatly reduce the risk, since Alice will never choose outside of a few nodes for her first hop.

Third, entry guards raise the startup cost to an adversary who runs relays in order to trace users. Without entry guards, the attacker can sign up some relays and immediately start having chances to observe Alice's circuits. With them, new adversarial relays won't have the Guard flag so won't be chosen as the first hop of any circuit; and even once they earn the Guard flag, users who have already chosen guards won't switch away from their current guards for quite a while.
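
To make the intuition behind these points concrete, here is a toy Monte Carlo sketch (hypothetical parameters, uniform relay selection, no bandwidth weighting, and no guard rotation) comparing the chance that a client is ever exposed to an adversary-controlled entry relay with and without fixed guards.

    # Toy simulation: probability that at least one circuit starts at an
    # adversary relay, with fresh entries per circuit vs. three fixed guards.
    import random

    N_RELAYS = 1000          # relays eligible for the entry position
    N_ADVERSARY = 20         # adversary-controlled relays among them
    N_CIRCUITS = 500         # circuits built over the observation period
    TRIALS = 5000

    def is_adversary(relay):
        return relay < N_ADVERSARY

    def compromised_without_guards():
        return any(is_adversary(random.randrange(N_RELAYS))
                   for _ in range(N_CIRCUITS))

    def compromised_with_guards(num_guards=3):
        guards = random.sample(range(N_RELAYS), num_guards)
        # Every circuit starts at one of the fixed guards, so the client is
        # only exposed if an adversary relay made it into the guard set.
        return any(is_adversary(g) for g in guards)

    no_guard = sum(compromised_without_guards() for _ in range(TRIALS)) / TRIALS
    with_guard = sum(compromised_with_guards() for _ in range(TRIALS)) / TRIALS
    print(f"fresh entry per circuit: {no_guard:.1%} of clients ever exposed")
    print(f"three fixed guards:      {with_guard:.1%} of clients ever exposed")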

In August 2011, I posted these four open research questions around guard rotation parameters:

  1. Natural churn: For an adversary that controls a given number of relays, if the user only replaces her guards when the current ones become unavailable, how long will it take until she's picked an adversary's guard?
  2. Artificial churn: How much more risk does she introduce by intentionally switching to new guards before she has to, to load balance better?
  3. Number of guards: What are the tradeoffs in performance and anonymity from picking three guards vs two or one? By default Tor picks three guards, since if we picked only one then some clients would pick a slow one and be sad forever. On the other hand, picking only one makes users safer.
  4. Better Guard flag assignment: If we give the Guard flag to more or different relays, how much does it change all these answers?

For reference, Tor 0.2.3's entry guard behavior is "choose three guards, adding another one if two of those three go down but going back to the original ones if they come back up, and also throw out (aka rotate) a guard 4-8 weeks after you chose it." I'll discuss in "Part three" of this post what changes we should make to improve this policy.

Part two: Recent research papers

Tariq Elahi, a grad student in Ian Goldberg's group in Waterloo, began to answer the above research questions in his paper Changing of the Guards: A Framework for Understanding and Improving Entry Guard Selection in Tor (published at WPES 2012). His paper used eight months of real-world historical Tor network data (from April 2011 to December 2011) and simulated various guard rotation policies to see which approaches protect users better.

Tariq's paper considered a quite small adversary: he let all the clients pick honest guards, and then added one new small guard to the 800 or so existing guards. The question is then what fraction of clients use this new guard over time. Here's a graph from the paper, showing (assuming all users pick three guards) the vulnerability due to natural churn ("without guard rotation") vs natural churn plus also intentional guard rotation:

Vulnerability from natural vs intentional guard rotation

In this graph their tiny guard node, in the "without guard rotation" scenario, ends up getting used by about 3% of the clients in the first few months, and gets up to 10% by the eight-month mark. The more risky scenario — which Tor uses today — sees the risk shoot up to 14% in the first few months. (Note that the y-axis in the graph only goes up to 16%, mostly because the attacking guard is so small.)

The second paper to raise the issue is from Alex Biryukov, Ivan Pustogarov, and Ralf-Philipp Weinmann in Luxembourg. Their paper Trawling for Tor Hidden Services: Detection, Measurement, Deanonymization (published at Oakland 2013) mostly focuses on other attacks (like how to censor or track popularity of hidden services), but their Section VI.C. talks about the "run a relay and wait until the client picks you as her guard" attack. In this case they run the numbers for a much larger adversary: if they run 13.8% of the Tor network for eight months there's more than a 90% chance of a given hidden service using their guard sometime during that period. That's a huge fraction of the network, but it's also a huge chance of success. And since hidden services in this case are basically the same as Tor clients (they choose guards and build circuits the same way), it's reasonable to conclude that their attack works against normal clients too so long as the clients use Tor often enough during that time.

I should clarify three points here.

First clarifying point: Tariq's paper makes two simplifying assumptions when calling an attack successful if the adversary's relay *ever* gets into the user's guard set. 1) He assumes that the adversary is also either watching the user's destination (e.g. the website she's going to), or he's running enough exit relays that he'll for sure be able to see the corresponding flow out of the Tor network. 2) He assumes that the end-to-end correlation attack (matching up the incoming flow to the outgoing flow) is instantaneous and perfect. Alex's paper argues pretty convincingly that these two assumptions are easier to make in the case of attacking a hidden service (since the adversary can dictate how often the hidden service makes a new circuit, as well as what the traffic pattern looks like), and the paper I describe next addresses the first assumption, but the second one ("how successful is the correlation attack at scale?" or maybe better, "how do the false positives in the correlation attack compare to the false negatives?") remains an open research question.

Researchers generally agree that given a handful of traffic flows, it's easy to match them up. But what about the millions of traffic flows we have now? What levels of false positives (algorithm says "match!" when it's wrong) are acceptable to this attacker? Are there some simple, not too burdensome, tricks we can do to drive up the false positive rates, even if we all agree that those tricks wouldn't work in the "just looking at a handful of flows" case?

More precisely, it's possible that correlation attacks don't scale well because as the number of Tor clients grows, the chance that the exit stream actually came from a different Tor client (not the one you're watching) grows. So the confidence in your match needs to grow along with that or your false positive rate will explode. The people who say that correlation attacks don't scale use phrases like "say your correlation attack is 99.9% accurate" when arguing it. The folks who think it does scale use phrases like "I can easily make my correlation attack arbitrarily accurate." My hope is that the reality is somewhere in between — correlation attacks in the current Tor network can probably be made plenty accurate, but perhaps with some simple design changes we can improve the situation. In any case, I'm not going to try to tackle that research question here, except to point out that 1) it's actually unclear in practice whether you're done with the attack if you get your relay into the user's guard set, or if you are now faced with a challenging flow correlation problem that could produce false positives, and 2) the goal of the entry guard design is to make this issue moot: it sure would be nice to have a design where it's hard for adversaries to get into a position to see both sides, since it would make it irrelevant how good they are at traffic correlation.
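
To make the base-rate intuition concrete, here is a toy calculation (the accuracy numbers are illustrative rather than taken from any paper, and flows are treated as independent):

    def correlation_matches(n_flows, true_matches, tpr, fpr):
        """Expected number of flagged flow pairs, and what fraction of the
        flags are real, in a toy base-rate model."""
        true_pos = true_matches * tpr
        false_pos = (n_flows - true_matches) * fpr
        precision = true_pos / (true_pos + false_pos)
        return true_pos + false_pos, precision

    # A "99.9% accurate" attack hunting for one real match among a million flows:
    flagged, precision = correlation_matches(1_000_000, 1, tpr=0.999, fpr=0.001)
    print(round(flagged), round(precision, 4))   # ~1001 flags, only ~0.1% of them real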

Second clarifying point: it's about the probabilities, and that's intentional. Some people might be scared by phrases like "there's an x% chance over y months to be able to get an attacker's relay into the user's guard set." After all, they reason, shouldn't Tor provide absolute anonymity rather than probabilistic anonymity? This point is even trickier in the face of centralized anonymity services that promise "100% guaranteed" anonymity, when what they really mean is "we could watch everything you do, and we might sell or give up your data in some cases, and even if we don't there's still just one point on the network where an eavesdropper can learn everything." Tor's path selection strategy distributes trust over multiple relays to avoid this centralization. The trouble here isn't that there's a chance for the adversary to win — the trouble is that our current parameters make that chance bigger than it needs to be.

To make it even clearer: the entry guard design is doing its job here, just not well enough. Specifically, *without* using the entry guard design, an adversary who runs some relays would very quickly find himself as the first hop of one of the user's circuits.

Third clarifying point: we're considering an attacker who wants to learn if the user *ever* goes to a given destination. There are plenty of reasonable other things an attacker might be trying to learn, like building a profile of many or all of the user's destinations, but in this case Tariq's paper counts a successful attack as one that confirms (subject to the above assumptions) that the user visited a given destination once.

And that brings us to the third paper, by Aaron Johnson et al: Users Get Routed: Traffic Correlation on Tor by Realistic Adversaries (upcoming at CCS 2013). This paper ties together two previous series of research papers: the first is "what if the attacker runs a relay?" which is what the above two papers talked about, and the second is "what if the attacker can watch part of the Internet?"

The first part of the paper should sound pretty familiar by now: they simulated running a few entry guards that together make up 10% of the guard capacity in the Tor network, and they showed that (again using historical Tor network data, but this time from October 2012 to March 2013) the chance that the user has made a circuit using the adversary's relays is more than 80% by the six month mark.

In this case their simulation includes the adversary running a fast exit relay too, and the user performs a set of sessions over time. They observe that the user's traffic passes over pretty much all the exit relays (which makes sense since Tor doesn't use an "exit guard" design). Or summarizing at an even higher level, the conclusion is that so long as the user uses Tor enough, this paper confirms the findings in the earlier two papers.

Where it gets interesting is when they explain that "the adversary could run a relay" is not the only risk to worry about. They build on the series of papers started by "Location Diversity in Anonymity Networks" (WPES 2004), "AS-awareness in Tor path selection" (CCS 2009), and most recently "An Empirical Evaluation of Relay Selection in Tor" (NDSS 2013). These papers look at the chance that traffic from a given Tor circuit will traverse a given set of Internet links.

Their point, which like all good ideas is obvious in retrospect, is that rather than running a guard relay and waiting for the user to switch to it, the attacker should instead monitor as many Internet links as he can, and wait for the user to use a guard such that traffic between the user and the guard passes over one of the links the adversary is watching.

This part of the paper raises as many questions as it answers. In particular, all the users they considered are in or near Germany. There are also quite a few Tor relays in Germany. How much of their results here can be explained by peculiarities of Internet connectivity in Germany? Are their results predictive in any way about how users on other continents would fare? Or said another way, how can we learn whether their conclusion shouldn't instead be "German Tor users are screwed, because look how Germany's Internet topology is set up"? Secondly, their scenario has the adversary control the Autonomous System (AS) or Internet Exchange Point (IXP) that maximally deanonymizes the user (they exclude the AS that contains the user and the AS that contains her destinations). This "best possible point to attack" assumption a) doesn't consider how hard it is to compromise that particular part of the Internet, and b) seems like it will often be part of the Internet topology near the user (and thus vary greatly depending on which user you're looking at). And third, like the previous papers, they think of an AS as a single Internet location that the adversary is either monitoring or not monitoring. Some ASes, like large telecoms, are quite big and spread out.

That said, I think it's clear from this paper that there *do* exist realistic scenarios where Tor users are at high risk from an adversary watching the nearby Internet infrastructure and/or parts of the Internet backbone. Changing the guard rotation parameters as I describe in "Part three" below will help in some of these cases but probably won't help in all of them. The canonical example that I've given in talks about "a person in Syria using Tor to visit a website in Syria" remains a very serious worry.

The paper also makes me think about exit traffic patterns, and how to better protect people who use Tor for only a short period of time: many websites pull in resources from all over, especially resources from centralized ad sites. This risk (that it greatly speeds the rate at which an adversary watching a few exit points — or heck, a few ad sites — will be able to observe a given user's exit traffic) provides the most compelling reason I've heard so far to ship Tor Browser Bundle with an ad blocker — or maybe better, with something like Request Policy that doesn't even touch the sites in the first place. On the other hand, Mike Perry still doesn't want to ship an ad blocker in TBB, since he doesn't want to pick a fight with Google and give them even more of a reason to block/drop all Tor traffic. I can see that perspective too.

Part three: How to fix it

Here are five steps we should take, in rough order of how much impact I think each of them would have on the above attacks.

If you like metaphors, think of each time you pick a new guard as a coin flip (heads you get the adversary's guard, tails you're safe this time), and the ideas here aim to reduce both the number and frequency of coin flips.

Fix 1: Tor clients should use fewer guards.

The primary benefit to moving to fewer guards is that there are fewer coin flips every time you pick your guards.

But there's a second benefit as well: right now your choice of guards acts as a kind of fingerprint for you, since very few other users will have picked the same three guards you did. (This fingerprint is only usable by an attacker who can discover your guard list, but in some scenarios that's a realistic attack.) To be more concrete: if the adversary learns that you have a particular three guards, and later sees an anonymous user with exactly the same guards, how likely is it to be you? Moving to two guards helps the math a lot here, since you'll overlap with many more users when everybody is only picking two.
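
The combinatorics alone make the point (this assumes roughly 800 guards, as in the 2011 data above, and counts guard sets as if clients chose uniformly; bandwidth weighting shrinks the effective anonymity set, so only the relative difference matters):

    from math import comb

    N = 800                     # rough number of guards, for illustration
    for k in (1, 2, 3):
        print(k, comb(N, k))    # how many distinct guard sets of size k exist
    # 1 ->        800
    # 2 ->    319,600
    # 3 -> 85,013,600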

On the other hand, the main downside is increased variation in performance. Here's Figure 10 from Tariq's paper:

Average guard capacity for various numbers of guards

"Farther to the right" is better in this graph. When you pick three guards (the red line), the average speed of your guards is pretty good (and pretty predictable), since most guards are pretty fast and it's unlikely you'll pick slow ones for all three. However, when you only pick only one guard (the purple line), the odds go up a lot that you get unlucky and pick a slow one. In more concrete numbers, half of the Tor users will see up to 60% worse performance.

The fix of course is to raise the bar for becoming a guard, so every possible guard will be acceptably fast. But then we have fewer guards total, increasing the vulnerability from other attacks! Finding the right balance (as many guards as possible, but all of them fast) is going to be an ongoing challenge. See Brainstorm tradeoffs from moving to 2 (or even 1) guards (ticket 9273) for more discussion.
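
Here is a small simulation that shows the shape of that tradeoff (the capacity distribution is invented and guards are sampled uniformly rather than by bandwidth, so only the relative spread between one and three guards is meaningful):

    import random
    import statistics

    random.seed(0)
    # Invented skewed "guard capacity" distribution, standing in for the real
    # consensus weights.
    guards = [random.lognormvariate(0, 1.2) for _ in range(800)]

    def client_avg_capacity(k):
        return statistics.fmean(random.sample(guards, k))

    for k in (1, 2, 3):
        samples = sorted(client_avg_capacity(k) for _ in range(10_000))
        unlucky = samples[len(samples) // 10]    # 10th percentile client
        median = samples[len(samples) // 2]
        print(f"{k} guard(s): median {median:.2f}, unlucky 10th percentile {unlucky:.2f}")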

Switching to just one guard will also preclude deploying Conflux, a recent proposal to improve Tor performance by routing traffic over multiple paths in parallel. The Conflux design is appealing because it not only lets us make better use of lower-bandwidth relays (which we'll need to do if we want to greatly grow the size of the Tor network), but it also lets us dynamically adapt to congestion by shifting traffic to less congested routes. Maybe some sort of "guard family" idea can work, where a single coin flip chooses a pair of guards and then we split our traffic over them. But if we want to avoid doubling the exposure to a network-level adversary, we might want to make sure that these two guards are near each other on the network — I think the analysis of the network-level adversary in Aaron's paper is the strongest argument for restricting the variety of Internet paths that traffic takes between the Tor client and the Tor network.

This discussion about reducing the number of guards also relates to bridges: right now if you configure ten bridges, you round-robin over all of them. It seems wise for us to instead use only the first bridge in our bridge list, to cut down on the set of Internet-level adversaries that get to see the traffic flows going into the Tor network.

Fix 2: Tor clients should keep their guards for longer.

In addition to choosing fewer guards, we should also avoid switching guards so often. I originally picked "one or two months" for guard rotation since it seemed like a very long time. In Tor 0.2.4, we've changed it to "two or three months". But I think changing the guard rotation period to a year or more is probably much wiser, since it will slow down the curves on all the graphs in the above research papers.

I asked Aaron to make a graph comparing the success of an attacker who runs 10% of the guard capacity, in the "choose 3 guards and rotate them every 1-2 months" case and the "choose 1 guard and never rotate" case:

Attacker success over time: three guards rotated every 1-2 months vs one guard never rotated

In the "3 guard" case (the blue line), the attacker's success rate rapidly grows to about 25%, and then it steadily grows to over 80% by the six month mark. The "1 guard" case (green line), on the other hand, grows to 10% (which makes sense since the adversary runs 10% of the guards), but then it levels off and grows only slowly as a function of network churn. By the six month mark, even this very large adversary's success rate is still under 25%.

So the good news is that by choosing better guard rotation parameters, we can almost entirely resolve the vulnerabilities described in these three papers. Great!

Or to phrase it more as a research question, once we get rid of this known issue, I'm curious how the new graphs over time will look, especially when we have a more sophisticated analysis of the "network observer" adversary. I bet there are some neat other attacks that we'll need to explore and resolve, but that are being masked by the poor guard parameter issue.

However, fixing the guard rotation period issue is alas not as simple as we might hope. The fundamental problem has to do with "load balancing": allocating traffic onto the Tor network so each relay is used the right amount. If Tor clients choose a guard and stick with it for a year or more, then old guards (relays that have been around and stable for a long time) will see a lot of use, and new guards will see very little use.

I wrote a separate blog post to provide background for this issue: "The lifecycle of a new relay". Imagine if the ramp-up period in the graph from that blog post were a year long! People would set up fast relays, they would get the Guard flag, and suddenly they'd see little to no traffic for months. We'd be throwing away easily half of the capacity volunteered by relays.

One approach to resolving the conflict would be for the directory authorities to track how much of the past n months each relay has had the Guard flag, and publish a fraction in the networkstatus consensus. Then we'd teach clients to rebalance their path selection choices so a relay that's been a Guard for only half of the past year only counts 50% as a guard in terms of using that relay in other positions in circuits. See Load balance right when we have higher guard rotation periods (ticket 9321) for more discussion, and see Raise our guard rotation period (ticket 8240) for earlier discussions.
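
Here is a sketch of what that rebalancing could look like on the client side (an illustration of the proposal in ticket 9321, not something Tor currently does):

    def position_weights(consensus_weight, guard_fraction_of_year):
        """If a relay has held the Guard flag for only part of the past year,
        count only that fraction of its weight as guard capacity and keep
        using the remainder for other circuit positions."""
        guard_weight = consensus_weight * guard_fraction_of_year
        other_weight = consensus_weight * (1 - guard_fraction_of_year)
        return guard_weight, other_weight

    # A relay with weight 5000 that has been a Guard for 6 of the past 12 months:
    print(position_weights(5000, 6 / 12))   # (2500.0, 2500.0)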

Yet another challenge here is that sticking to the same guard for a year gives plenty of time for an attacker to identify the guard and attack it somehow. It's particularly easy to identify the guard(s) for hidden services currently (since as mentioned above, the adversary can control the rate at which hidden services make new circuits, simply by visiting the hidden service), but similar attacks can probably be made to work against normal Tor clients — see e.g. the http-level refresh tricks in How Much Anonymity does Network Latency Leak? This attack would effectively turn Tor into a network of one-hop proxies, to an attacker who can efficiently enumerate guards. That's not a complete attack, but it sure does make me nervous.

One possible direction for a fix is to a) isolate streams by browser tab, so all the requests from a given browser tab go to the same circuit, but different browser tabs get different circuits, and then b) stick to the same three-hop circuit (i.e. same guard, middle, and exit) for the lifetime of that session (browser tab). How to slow down guard enumeration attacks is a tough and complex topic, and it's too broad for this blog post, but I raise the issue here as a reminder of how interconnected anonymity attacks and defenses are. See Slow Guard Discovery of Hidden Services and Clients (ticket 9001) for more discussion.

Fix 3: The Tor code should better handle edge cases where you can't reach your guard briefly.

If a temporary network hiccup makes your guard unreachable, you switch to another one. But how long is it until you switch back? If the adversary's goal is to learn whether you ever go to a target website, then even a brief switch to a guard that the adversary can control or observe could be enough to mess up your anonymity.

Tor clients fetch a new networkstatus consensus every 2-4 hours, and they are willing to retry non-running guards if the new consensus says they're up again.

But I think there are a series of little bugs and edge cases where the Tor client abandons a guard more quickly than it should. For example, we mark a guard as failed if any of our circuit requests time out before finishing the handshake with the first hop. We should audit both the design and the source code with an eye towards identifying and resolving these issues.

We should also consider whether an adversary can *induce* congestion or resource exhaustion to cause a target user to switch away from her guard. Such an attack could work very nicely coupled with the guard enumeration attacks discussed above.

Most of these problems exist because in the early days we emphasized reachability ("make sure Tor works") over anonymity ("be very sure that your guard is gone before you try another one"). How should we handle this tradeoff between availability and anonymity: should you simply stop working if you've switched guards too many times recently? I imagine different users would choose different answers to that tradeoff, depending on their priorities. It sounds like we should make it easier for users to select "preserve my anonymity even if it means lower availability". But at the same time, we should remember the lessons from Anonymity Loves Company: Usability and the Network Effect about how letting users choose different settings can make them more distinguishable.

Fix 4: We need to make the network bigger.

We've been working hard in recent years to get more relay capacity. The result is a more than four-fold increase in network capacity since 2011:

Growth in total Tor network bandwidth capacity since 2011

As the network grows, an attacker with a given set of resources will have less success at the attacks described in this blog post. To put some numbers on it, while the relay adversary in Aaron's paper (who carries 660mbit/s of Tor traffic) represented 10% of the guard capacity in October 2012, that very same attacker would have been 20% of the guard capacity in October 2011. Today that attacker is about 5% of the guard capacity. Growing the size of the network translates directly into better defense against these attacks.
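
Working backwards from those percentages shows how a fixed-size adversary shrinks as the network grows:

    ATTACKER_MBIT = 660   # the fixed relay adversary from Aaron's paper

    # The guard-capacity fractions quoted above imply these rough totals:
    for when, fraction in [("Oct 2011", 0.20), ("Oct 2012", 0.10), ("today", 0.05)]:
        total = ATTACKER_MBIT / fraction
        print(f"{when}: ~{total:,.0f} Mbit/s of guard capacity, "
              f"so the attacker holds {fraction:.0%}")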

However, the analysis is more complex when it comes to a network adversary. Just adding more relays (and more relay capacity) doesn't always help. For example, adding more relay capacity in a part of the network that the adversary is already observing can actually *decrease* anonymity, because it increases the fraction the adversary can watch. We discussed many of these issues in the thread about turning funding into more exit relays. For more details about the relay distribution in the current Tor network, check out Compass, our tool to explore what fraction of relay capacity is run in each country or AS. Also check out Lunar's relay bubble graphs.

Yet another open research question in the field of anonymous communications is how the success rate of a network adversary changes as the Tor network changes. If we were to plot the success rate of the *relay* adversary using historical Tor network data over time, it's pretty clear that the success rate would be going down over time as the network grows. But what's the trend for the success rate of the network adversary over the past few years? Nobody knows. It could be going up or down. And even if it is going down, it could be going down quickly or slowly.

(Read more in Research problem: measuring the safety of the Tor network where I describe some of these issues in more detail.)

Recent papers have gone through enormous effort to get one, very approximate, snapshot of the Internet's topology. Doing that effort retroactively and over long and dynamic time periods seems even more difficult and more likely to introduce errors.

It may be that the realities of Internet topology centralization make it so that there are fundamental limits on how much safety Tor users can have in a given network location. On the other hand, researchers like Aaron Johnson are optimistic that "network topology aware" path selection can improve Tor's protection against this style of attack. Much work remains.

Fix 5: We should assign the guard flag more intelligently.

In point 1 above I talked about why we need to raise the bar for becoming a guard, so all guards can provide adequate bandwidth. On the other hand, having fewer guards is directly at odds with point 4 above.

My original guard rotation parameters blog post ends with this question: what algorithm should we use to assign Guard flags such that a) we assign the flag to as many relays as possible, yet b) we minimize the chance that Alice will use the adversary's node as a guard?

We should use historical Tor network data to pick good values for the parameters that decide which relays become guards. This remains a great thesis topic if somebody wants to pick it up.

Part four: Other thoughts

What does all of this discussion mean for the rest of Tor? I'll close by trying to tie this blog post to the broader Tor world.

First, all three of these papers come from the Tor research community, and it's great that Tor gets such attention. We get this attention because we put so much effort into making it easy for researchers to analyze Tor: we've worked closely with these authors to help them understand Tor and focus on the most pressing research problems.

In addition, don't be fooled into thinking that these attacks only apply to Tor: using Tor is still better than using any other tool, at least in quite a few of these scenarios. That said, some other attacks in the research literature might be even easier than the attacks discussed here. These are fast-moving times for anonymity research. "Maybe you shouldn't use the Internet then" is still the best advice for some people.

Second, the Tails live CD doesn't use persistent guards. That's really bad I think, assuming the Tails users have persistent behavior (which basically all users do). See their ticket 5462.

Third, the network-level adversaries rely on being able to recognize Tor flows. Does that argue that using pluggable transports, with bridges, might change the equation if it stops the attacker from recognizing Tor users?

Fourth, I should clarify that I don't think any of these large relay-level adversaries actually exist, except as a succession of researchers showing that it can be done. (GCHQ apparently ran a small number of relays a while ago, but not in a volume or duration that would have enabled this attack.) Whereas I *do* think that the network-level attackers exist, since they already invested in being able to surveil the Internet for other reasons. So I think it's great that Aaron's paper presents the dual risks of relay adversaries and link adversaries, since most of the time when people are worrying about one of them they're forgetting the other one.

Fifth, there are still some ways to game the bandwidth authority measurements (here's the spec) into giving you more than your fair share of traffic. Ideally we'd adapt a design like EigenSpeed so it can measure fast relays both robustly and accurately. This question also remains a great thesis topic.

And finally, as everybody wants to know: was this attack how "they" busted recent hidden services (Freedom Hosting, Silk Road, the attacks described in the latest Guardian article)? The answer is apparently no in each case, which means the techniques they *did* use were even *lower* hanging fruit. The lesson? Security is hard, and you have to get it right at many different levels.

The lifecycle of a new relay

Many people set up new fast relays and then wonder why their bandwidth is not fully loaded instantly. In this post I'll walk you through the lifecycle of a new fast non-exit relay, since Tor's bandwidth estimation and load balancing has gotten much more complicated in recent years. I should emphasize that the descriptions here are in part anecdotal — at the end of the post I ask some research questions that will help us make better sense of what's going on.

I hope this summary will be useful for relay operators. It also provides background for understanding some of the anonymity analysis research papers that people have been asking me about lately. In an upcoming blog post, I'll explain why we need to raise the guard rotation period (and what that means) to improve Tor's anonymity. [Edit: here is that blog post]

A new relay, assuming it is reliable and has plenty of bandwidth, goes through four phases: the unmeasured phase (days 0-3) where it gets roughly no use, the remote-measurement phase (days 3-8) where load starts to increase, the ramp-up guard phase (days 8-68) where load counterintuitively drops and then rises higher, and the steady-state guard phase (days 68+).


The phases of a new relay

Phase one: unmeasured (days 0-3).

When your relay first starts, it does a bandwidth self-test: it builds four circuits into the Tor network and back to itself, and then sends 125KB over each circuit. This step bootstraps Tor's passive bandwidth measurement system, which estimates your bandwidth as the largest burst you've done over a 10 second period. So if all goes well, your first self-measurement is 4*125K/10 = 50KB/s. Your relay publishes this number in your relay descriptor.
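
In code, that self-test arithmetic looks like this (using 1 KB = 1024 bytes; the exact convention doesn't change the point):

    CIRCUITS = 4
    BYTES_PER_CIRCUIT = 125 * 1024      # 125 KB pushed over each test circuit
    BURST_WINDOW_SECONDS = 10           # passive measurement looks at 10-second bursts

    self_measurement = CIRCUITS * BYTES_PER_CIRCUIT / BURST_WINDOW_SECONDS
    print(f"{self_measurement / 1024:.0f} KB/s")   # 50 KB/s, as described above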

The directory authorities list your relay in the network consensus, and clients get good performance (and balance load across the network) by choosing relays proportional to the bandwidth number listed in the consensus.

Originally, the directory authorities would just use whatever bandwidth estimate you claimed in your relay descriptor. As you can imagine, that approach made it cheap for even a small adversary to attract a lot of traffic by simply lying. In 2009, Mike Perry deployed the "bandwidth authority" scripts, where a group of fast computers around the Internet (called bwauths) do active measurements of each relay, and the directory authorities adjust the consensus bandwidth up or down depending on how the relay compares to other relays that advertise similar speeds. (Technically, we call the consensus number a "weight" rather than a bandwidth, since it's all about how your relay's number compares to the other numbers, and once we start adjusting them they aren't really bandwidths anymore.)

The bwauth approach isn't ungameable, but it's a lot better than the old design. Earlier this year we plugged another vulnerability by capping your consensus weight to 20KB until a threshold of bwauths have an opinion about your relay — otherwise there was a several-day window where we would use your claimed value because we didn't have anything better to use.
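
Here is a much-simplified model of that feedback, just to show its shape (the real bwauth scripts are considerably more involved; the peer-comparison step below is a stand-in, and the "three measurements" threshold is an assumption):

    UNMEASURED_CAP_KB = 20   # consensus weight cap until enough bwauths have measured you

    def consensus_weight(advertised, measurements, peer_average):
        """Scale the relay's self-reported number by how its measured speed
        compares to peers that advertise similar speeds (toy version)."""
        if len(measurements) < 3:                     # bwauth threshold not reached yet
            return min(advertised, UNMEASURED_CAP_KB)
        measured = sum(measurements) / len(measurements)
        return advertised * (measured / peer_average)

    print(consensus_weight(5000, [], peer_average=400))               # capped at 20
    print(consensus_weight(5000, [300, 350, 400], peer_average=400))  # 4375.0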

So that's phase one: your new relay gets basically no use for the first few days of its life because of the low 20KB cap, while it waits for a threshold of bwauths to measure it.

Phase two: remote measurement (days 3-8).

Remember how I said the bwauths adjust your consensus weight based on how you compare to similarly-sized relays? At the beginning of this phase your relay hasn't seen much traffic, so your peers are the other relays who haven't seen (or can't handle) much traffic. Over time though, a few clients will build circuits through your relay and push some traffic, and the passive bandwidth measurement will provide a new larger estimate. Now the bwauths will compare you to your new (faster) peers, giving you a larger consensus weight, thus driving more clients to use you, in turn raising your bandwidth estimate, and so on.

Tor clients generally make three-hop circuits (that is, paths that go through three relays). The first position in the path, called the guard relay, is special because it helps protect against a certain anonymity-breaking attack. Here's the attack: if you keep picking new paths at random, and the adversary runs a few relays, then over time the chance drops to zero that *every single path you've made* is safe from the adversary. The defense is to choose a small number of relays (called guards) and always use one of them for your first hop — either you chose wrong, and one of your guards is run by the adversary and you lose on many of your paths; or you chose right and all of your paths are safe. Read the Guard FAQ for more details.
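
The arithmetic behind that defense is simple (the 5% adversary share here is purely illustrative):

    f = 0.05   # adversary's share of first-hop bandwidth

    # Without entry guards, every circuit is a fresh first-hop gamble, so the
    # chance that *all* of your circuits dodge the adversary decays to zero:
    for circuits in (10, 100, 1000):
        print(circuits, "circuits:", round((1 - f) ** circuits, 4))
    # 10 -> 0.5987, 100 -> 0.0059, 1000 -> essentially 0

    # With entry guards you take that gamble once per guard choice instead of
    # once per circuit: roughly f of users lose on many circuits, and the rest
    # are safe on all of them.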

Only stable and reliable relays can be used as guards, so no clients are willing to use your brand new relay as their first hop. And since in this story you chose to set up a non-exit relay (so you won't be the one actually making connections to external services like websites), no clients will use it as their third hop either. That means all of your relay's traffic is coming from being the second hop in circuits.

So that's phase two: once the bwauths have measured you and the directory authorities lift the 20KB cap, you'll attract more and more traffic, but it will still be limited because you'll only ever be a middle hop.

Phase three: Ramping up as a guard relay (days 8-68).

This is the point where I should introduce consensus flags. Directory authorities assign the Guard flag to relays based on three characteristics: "bandwidth" (they need to have a large enough consensus weight), "weighted fractional uptime" (they need to be working most of the time), and "time known" (to make attacks more expensive, we don't want to give the Guard flag to relays that haven't been around a while first). This last characteristic is most relevant here: on today's Tor network, you're first eligible for the Guard flag on day eight.

Clients will only be willing to pick you for their first hop if you have the "Guard" flag. But here's the catch: once you get the Guard flag, all the rest of the clients back off from using you for their middle hops, because when they see the Guard flag, they assume that you have plenty of load already from clients using you as their first hop. Now, that assumption will become true in the steady-state (that is, once enough clients have chosen you as their guard node), but counterintuitively, as soon as you get the Guard flag you'll see a dip in traffic.

Why do clients avoid using relays with the Guard flag for their middle hop? Clients look at the scarcity of guard capacity, and the scarcity of exit capacity, and proportionally avoid using relays for positions in the path that aren't scarce. That way we allocate available resources best: relays with the Exit flag are used mostly for exiting when they're scarce, and relays with the Guard flag are used mostly for entering when they're scarce.
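
Here is a simplified illustration of that balancing principle (it is not Tor's actual bandwidth-weight computation, which the directory authorities publish as Wgg, Wmg and friends in the consensus; it only shows the "avoid using scarce capacity in non-scarce positions" idea):

    def middle_hop_weight(relay_bw, is_guard, is_exit, guard_frac, exit_frac):
        """Down-weight a relay for the middle position in proportion to how
        scarce its flagged capacity is (toy version of the real weighting)."""
        w = relay_bw
        if is_guard and guard_frac < 1 / 3:
            w *= guard_frac / (1 / 3)    # the scarcer guard capacity is, the less we borrow it
        if is_exit and exit_frac < 1 / 3:
            w *= exit_frac / (1 / 3)
        return w

    # A relay with the Guard flag, when guards make up 25% of network capacity:
    print(middle_hop_weight(1000, True, False, guard_frac=0.25, exit_frac=0.40))   # 750.0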

It isn't optimal to allow this temporary dip in traffic (since we're not taking advantage of resources that you're trying to contribute), but it's a short period of time overall: clients rotate their guard nodes every 4-8 weeks, so pretty soon some of them will rotate onto your relay.

To be clear, there are two reasons why we have clients rotate their guard relays, and the reasons are two sides of the same coin: first is the above issue where new guards wouldn't see much use (since only new clients, picking their guards for the first time, would use a new guard), and second is that old established guards would accrue an ever-growing load since they'd have the traffic from all the clients that ever picked them as their guard.

One of the reasons for this blog post is to give you background so when I later explain why we need to extend the guard rotation period to many months, you'll understand why we can't just crank up the number without also changing some other parts of the system to keep up. Stay tuned for more details there, or if you don't want to wait you can read the original research questions and the followup research papers by Elahi et al and Johnson et al.

Phase four: Steady-state guard relay (days 68+).

Once your relay has been a Guard for the full guard rotation period (up to 8 weeks in Tor 0.1.1.11-alpha through 0.2.4.11-alpha, and up to 12 weeks in Tor 0.2.4.12-alpha and later), it should reach steady-state where the number of clients dropping it from their guard list balances the number of clients adding it to their guard list.

Research question: what do these phases look like with real-world data?

All of the above explanations, including the graphs, are just based on anecdotes from relay operators and from ad hoc examination of consensus weights.

Here's a great class project or thesis topic: using our publicly available Tor network metrics data, track the growth pattern of consensus weights and bandwidth load for new relays. Do they match the phases I've described here? Are the changes inside a given phase linear like in the graphs above, or do they follow some other curve?

Are there trends over time, e.g. it used to take less time to ramp up? How long are the phases in reality — for example, does it really take three days before the bwauths have a measurement for new relays?

How does the scarcity of Guard or Exit capacity influence the curves or trends? For example, when guard capacity is less scarce, we expect the traffic dip at the beginning of phase three to be less pronounced.

How different were things before we added the 20KB cap in the first phase, or before we did remote measurements at all?

Are the phases, trends, or curves different for exit relays than for non-exit relays?

The "steady-state" phase assumes a constant number of Tor users: in situations where many new users appear (like the botnet invasion in August 2013), current guards will get unbalanced. How quickly and smoothly does the network rebalance as clients shift guards after that?

Hidden Services need some love

Hidden Services are in a peculiar situation. While they have a loyal fan base, there are no dedicated Tor developers to take care of them. This results in a big pile of features that need to be researched, implemented and deployed to make Hidden Services more secure and effective.

The purpose of this blog post is threefold:

  1. Introduce Hidden Service operators to various shortcomings of the Hidden Service architecture.
  2. Introduce researchers to various research questions regarding Hidden Services.
  3. Introduce developers to the plethora of coding tasks left to be done in the Hidden Service ecosystem.


Note that not every idea listed in the blog post is going to turn out to be a great idea. This post is more of a brain-dump than a solid fully-analyzed agenda.

In any case, let's get down to the issues:




Hidden Service Scaling


The current Hidden Services architecture does not scale well. Ideally, big websites should have the option to completely migrate to Tor Hidden Services, but this is not possible with their current architecture.

One of the main problems with a busy Hidden Service is that its Introduction Points will get hammered by clients. Since Introduction Points are regular Tor relays, they are not intended to handle such load.

Therefore, one of the first steps for improving Hidden Services scalability is increasing the durability of its Introduction Points. Currently, a Hidden Service selects the number of its Introduction Points (between one and ten) based on a self-estimation of its own popularity. Whether the formula currently used is the best such formula is an open research question.

Another problem with Hidden Services is the lack of load balancing options. While you can load-balance a Hidden Service using TCP/HTTP load balancers (like HAProxy), there is no load-balancing option similar to DNS round-robin, where load balancing happens by sending clients to different server IP addresses. Such load-balancing could be achieved by allowing a Hidden Service to have multiple "subservices". Such an architecture, although appealing, introduces multiple problems, like the intercommunication between subservices, where the long-term keypair is stored, how introduction points are assigned, etc.


Defense against Denial of Service of Introduction Points


The adversarial version of the previous section involves attackers intentionally hammering the Introduction Points of a Hidden Service to make it unreachable by honest clients. This means that an attacker can temporarily bring down a Hidden Service by DoSing a small number of Tor relays.

To defend against such attacks, Syverson and Øverlier introduced Valet nodes in their PETS 2006 paper: "Valet Services: Improving Hidden Servers with a Personal Touch". Valet nodes stand in front of Introduction Points and act as a protection layer. This allows Hidden Services to maintain a limited number of Introduction Points, but many more contact points, without clients learning the actual addresses of the Introduction Points.

Valet nodes are not implemented yet, mainly because of the big implementation and deployment effort they require.


Key Length


The long-term keypair of a Hidden Service is an RSA-1024 keypair which nowadays is considered weak. This means that in the future, Hidden Services will need to migrate to a different keysize and/or asymmetric cryptographic algorithm.

A side effect of such migration is that Hidden Services will get a different onion address, which might be troublesome for Hidden Services that have a well-established onion address. To make the transition smoother, Hidden Services should be able to use both old and new keypairs for a while to be able to point their clients to the new address.

Unfortunately, while design work has started on strengthening some parts of Tor's cryptography, there are no proposals on improving the cryptography of Hidden Services yet.


Attacks by Hidden Service Directory Servers


Hidden Services upload their descriptor to Tor nodes called Hidden Service Directory Servers (HSDirs). Clients then fetch that descriptor and use it to connect to the Hidden Service.

In the current system, HSDirs are in an interesting position which allows them to perform the following actions:

  • Learn the .onion address of a Hidden Service and connect to it
  • Evaluate the popularity of a Hidden Service by tracking the number of clients who do a lookup for that Hidden Service
  • Refuse to answer a client, and if enough HSDirs do this then the Hidden Service is temporarily unreachable

These scenarios are explored in the upcoming IEEE S&P paper titled "Trawling for Tor Hidden Services: Detection, Measurement, Deanonymization" from Alex Biryukov, Ivan Pustogarov and Ralf-Philipp Weinmann. Be sure to check it out (once they publish it)!

Let's look at some suggested fixes for the attacks that Hidden Service Directory Servers can perform:


Defences against enumeration of onion addresses

Hidden Services use a hash ring to choose which HSDirs will host their descriptor; this means that HSDirs can just wait to get picked by Hidden Services and then collect their descriptors and onion addresses. Also, since the hash ring is rotating, HSDirs get new Hidden Service descriptors in every rotation period.

One possible solution to this issue would be to append a symmetric key to the onion address and use it to encrypt the descriptor before sending it to HSDirs (similar to how descriptor-cookie authentication works currently). A client that knows the onion address can decrypt the descriptor, but an HSDir who doesn't know the onion address can't derive the Hidden Service name. The drawback of this scheme is that the size of onion addresses will increase without increasing the security of their self-authentication property. Furthermore, HSDirs will still be able to extract the Hidden Service public key from the descriptor, which allows HSDirs to track the descriptors of specific Hidden Services.
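
Here is a toy sketch of that general idea (not the actual proposal: the key derivation is ad hoc, the "extra key" appended to the address is made up, and it assumes the third-party Python cryptography package for the AEAD):

    import os, hashlib
    from cryptography.hazmat.primitives.ciphers.aead import AESGCM

    def descriptor_key(onion_address: str, extra_key: bytes) -> bytes:
        # Derived from material only clients holding the full address know.
        return hashlib.sha256(onion_address.encode() + extra_key).digest()

    def encrypt_descriptor(descriptor: bytes, key: bytes) -> bytes:
        nonce = os.urandom(12)
        return nonce + AESGCM(key).encrypt(nonce, descriptor, None)

    def decrypt_descriptor(blob: bytes, key: bytes) -> bytes:
        nonce, ciphertext = blob[:12], blob[12:]
        return AESGCM(key).decrypt(nonce, ciphertext, None)

    extra = os.urandom(16)                             # would ride along with the address
    key = descriptor_key("exampleonionaddress.onion", extra)
    blob = encrypt_descriptor(b"introduction points: ...", key)   # what the HSDir stores
    assert decrypt_descriptor(blob, key) == b"introduction points: ..."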

A different solution was proposed by Robert Ransom:

Robert's scheme uses the long-term keypair of a Hidden Service to derive (in a one-way fashion) a second keypair, which is used to encrypt and sign the descriptor that is uploaded to the HSDirs. This construction allows the HSDir, without knowing the long-term keypair of the Hidden Service or the contents of its descriptor, to validate that the entity who uploaded the descriptor had possession of the long-term private key of the Hidden Service. A client who knows the long-term public key of the Hidden Service can fetch the descriptor from the HSDir and verify that it was created by the Hidden Service itself. See the relevant trac ticket for a more robust analysis of the idea.

Robert's idea increases the size of onion addresses, but also makes them more resistant to impersonation attacks (the current 80-bit security of onion addresses does not inspire confidence against impersonation attacks). Furthermore, his idea does not allow HSDirs to track Hidden Service descriptors across time.

While Robert's scheme is fairly straightforward, a proper security evaluation is in order and a Tor proposal needs to be written. For extra fun, his idea requires the long-term keypair of the Hidden Service to use a discrete-log cryptosystem, which means that a keypair migration will be needed if we want to proceed with this plan.


Block tracking of popularity of Hidden Services

HSDirs can track the number of users who do a lookup for a Hidden Service, thereby learning how popular it is. We can make it harder for HSDirs to track the popularity of a Hidden Service by utilizing a Private Information Retrieval (PIR) protocol for Hidden Service descriptor fetches. Of course, this won't stop the Introduction Points of a Hidden Service from doing the tracking, but since the Introduction Points were picked by the Hidden Service itself, the threat is smaller.

If we wanted to block Introduction Points from tracking the popularity of Hidden Services, we could attempt hiding the identity of the Hidden Service from its Introduction Points by using a cookie scheme, similar to how the Rendezvous is currently done, or by using Robert's keypair derivation trick and signing the introduction establishment with the new keypair. A careful security evaluation of these ideas is required.


Make it harder to become an adversarial HSDir

Because of the security implications that HSDirs have for Hidden Services, we started working on making it harder for a Tor relay to become an HSDir node.

Also, currently, an adversary can predict the identity keys it will need in the future to target a specific Hidden Service. We started thinking of ways to avoid this attack.


Performance improvements


Hidden services are slooooowwww and we don't even understand why. They might be slow because of the expensive setup process of creating a Hidden Service circuit, or because Hidden Service circuits have 6 hops, or because of something else. Many suggestions have been proposed to reduce the latency of Hidden Services, ranging from Hidden Service protocol hacks to Javascript hacks, and to radically changing how the Hidden Service circuit is formed.

Let's investigate some of these proposals:


Reducing Hidden Service Circuit Setup complexity

During PETS 2007, Syverson and Øverlier presented "Improving Efficiency and Simplicity of Tor circuit establishment and hidden services", which simplifies Hidden Service circuit establishment by eliminating the need for a separate rendezvous connection.

They noticed that by using Valet nodes, the concept of Rendezvous Points is redundant and that a Hidden Service circuit can be formed by just using Valet nodes and Introduction Points. Karsten Loesing wrote a Tor proposal for a variant of this idea.

The reason this scheme is not implemented is that the security trade-offs introduced are not well understood, and there are also some technical obstacles (like the fact that sharing of circuits between multiple clients is not currently supported).


Analyze Hidden Service Circuit Establishment Timing With Torperf

Establishing a connection to a hidden service currently involves two Tor relays, the introduction and rendezvous point, and 10 more relays distributed over four circuits to connect to them. No one has really researched how much time Tor spends in each step of that complicated process. It wouldn't be surprising if a large amount of time is spent in an unexpected part of the process.

To investigate this properly, one should use Torperf to analyze the timing delta between the steps of the process. Unfortunately, Torperf uses controller events to distinguish between Tor protocol phases but not all steps of the Hidden Service circuit setup have controller events assigned to them. Implementing this involves adding the control port triggers to the Tor codebase, running Torperf and then collecting and analyzing the results.


Hidden Services should reuse old Introduction Points

Currently, Hidden Services stop establishing circuits to old Introduction Points after they break. While this behavior makes sense, it means that clients who have old hidden service descriptors will keep introducing themselves to the wrong introduction points. This is especially painful in roaming situations where users frequently change networks (and lose existing circuits).

A solution to this would be for Hidden Services to reestablish failed circuits to old Introduction Points (if the circuits were destroyed because of network failures). We should explore the security consequences of such a move, and also what's the exact time period that Introduction Points are considered "old" but still "worth reestablishing circuits to".


Encrypted Services


Encrypted Services is the correct way of implementing the now-defunct Exit Enclaves.

Encrypted Services allow you to run a non-anonymous Hidden Service where the server-side rendezvous circuit is only one hop. This makes sense in scenarios where the Hidden Service doesn't care about its anonymity, but still wants to allow its clients to access it anonymously (and with all the other features that self-authenticating names provide). See Roger's original proposal for more use cases and information.

On this topic, Robert Ransom proposed to implement Encrypted Services as a program separate from Tor, since it serves a quite different threat model. Furthermore, if done this way, its users won't overload the Tor network and it will also allow greater versatility and easier deployment.


Human Memorable onion addresses


Zooko's triangle characterizes onion addresses as secure and global, but not human memorable. By now a couple of schemes have been proposed to make hidden service addresses memorable, but for various reasons none of them has been particularly successful.




These were just some of the things that must be done in the Hidden Services realm. If you are interested in helping out, please read the links and trac tickets, and hit us back with proposals, patches and suggestions. Use the [tor-dev] mailing list or our IRC channels for development-related communication.

Finally, note that this blog post only touched issues that involve Tor's codebase or the Hidden Service protocol and its cryptography. However, if we want Hidden Services to be truly successful and influential, it's also important to build a vibrant ecosystem around them. For example, we need privacy-preserving archiving systems and search engines (and technologies and rules on how they should work), we need easy-to-use publishing platforms, Internet service daemons and protocols optimized for high-latency connections, anonymous file sharing, chat systems and social networks.

Thanks go to Roger, Robert and other people for the helpful comments and suggestions on this blog post.

PS: Don't forget to use anonbib to find and download any research papers mentioned in this blog post.

Trip report, ACM CCS/WPES

In October I attended WPES and the first day of CCS. There are a bunch of new Tor-related research papers:

  • "Changing of the Guards: A Framework for Understanding and Improving Entry Guard Selection in Tor". Basically Tariq has written a suite of tools for analyzing how likely an adversary is to be able to see a Tor user's circuits given various values for guard selection and guard rotation. Early results include "three guards is probably the wrong number" and "we probably shouldn't rotate guards so often". Later results I hope will include "here's an algorithm for assigning the Guard flag to relays that makes users safer".
  • "Torchestra: Reducing interactive traffic delays over Tor". This paper suggests having two parallel TLS connections between each pair of relays — one for the loud circuits and one for the quiet ones. I've had a series of debates with Rob Jansen over whether this should really help, or if it's just shifting the round-robining to a deeper level (e.g. kernel-level buffers). My current opinion is that it should really help, not because it writes important bytes out faster, but because the other side can *read* important bytes faster — in the current state a relay can't predict which incoming bytes are going to be high-priority, but when high-priority bytes come on a separate TCP connection, it's easy to tell.
  • "Enhancing Tor's Performance using Real-time Traffic Classification". I haven't read it in detail yet, but it looks at using machine learning to classify circuits depending on what sort of traffic they're carrying. This direction is worthwhile, but it skips over the question that Rob and I are currently wrestling with, which is "ok, so you've decided to give lower priority to a circuit. What should you actually do to make that circuit put less load on the network?" See also Rob's Usenix Security paper: http://freehaven.net/anonbib/#throttling-sec12
  • "SkypeMorph: Protocol Obfuscation for Tor Bridges". The idea is to use the actual Skype program to make a (TCP) video call between user and bridge, and then drop the call and start up your own UDP traffic flows that mimic Skype's UDP flows in terms of size and timing. I'm not clear that trying to look like Skype is a game we can win (especially with DPI-level adversaries already playing the arms race to detect and block 'real' Skype, and with the wasted bandwidth that comes with pretending to be a Skype video call), but I think we can get a lot of mileage out of a Tor pluggable transport that carries Tor traffic over a UDP flow — especially with recent rumors of a Syrian ISP throttling all TCP flows.
  • "StegoTorus: A Camouflage Proxy for the Tor Anonymity System". Stegotorus is an obfsproxy fork with two goals: first, chop up Tor traffic flows so you can't recognize Tor traffic just by looking for more 586-byte TCP packets than usual; and second, transport those chopped-up flows over a variety of steg modules, to hide the traffic in protocols that the adversary is unwilling to block (as opposed to obfsproxy's goal, which is to make a flow that's hard enough to DPI for that the adversary has to choose between blocking all unrecognized protocols or letting the flows through). Unfortunately, there aren't any great steg modules yet; rather, it aims to be a framework where if you *did* have a great steg module, you could just pop it in.
  • "CensorSpoofer: Asymmetric Communication using IP Spoofing for Censorship-Resistant Web Browsing". It's designed for censored users who mostly consume bytes rather than produce them. This design also uses a UDP stream to deliver the bytes, but they spoof the stream as coming from a plausible-looking voip client rather than from the real source. Then they need some low-latency low-bandwidth way (e.g. instant messaging) to get the upstream packets to the server.
  • There's also "Touching from a Distance: Website Fingerprinting Attacks and Defenses". But I confess I haven't looked at it yet. Website fingerprinting remains a huge and open issue that needs more attention.

This recent variety of pluggable-transport designs and research papers is fantastic. It also makes me realize that somebody should put some effort into identifying the various components that a Tor transport system needs, and which papers provide which components. For example, it sounds like SkypeMorph and CensorSpoofer could share the same details for the link encryption and flow control for their UDP stream, rather than inventing two separately. Similarly, the "small upstream channel" requirement from CensorSpoofer reminds me of the similar goal that David's Flashproxy design faces. I see that Tariq and Ian have a new tech report out that gives that a start.

Ultrasurf: the definitive review

In the summer of 2011, I spent a few months learning how to effectively reverse engineer Windows software. I'm still learning, and while I have a lifetime of learning to do on the topic, I chose to audit Ultrasurf as a challenge. This research was performed as a labor of love and it was not funded work. My interest in reverse engineering Ultrasurf comes entirely from having seen people promote it without also offering evidence that it is safe. Additionally, a few people had asked me what I thought of the software, and in order to form an opinion, I decided to dig deeper.

Ultrasurf is software produced by the UltraReach company for censorship circumvention, privacy, security and anonymity. Unfortunately for them, I found their claims to be overstated and I found a number of serious problems with Ultrasurf.

My report is available for download from the following link: https://media.torproject.org/misc/2012-04-16-ultrasurf-analysis.pdf

Most of my research was done while traveling in Brazil, Canada, and Germany, and a very small amount of it was performed in the US. Additionally, a number of interesting data points in my research paper came from interception devices in Syria. As of early April 2012, an independent tester confirmed many of my findings from China; the versions of Ultrasurf tested did directly connect to blocked addresses and did not in fact work at all. Newer versions appear to have different, not yet blocked, addresses baked into the program.

I believe that coordinated disclosure is reasonable in most cases and I ensured that Ultrasurf was notified long before the publication of this blog post. I had a face to face meeting in early December of 2011 to discuss my findings with the lead developer of Ultrasurf and to give them time to fix the problems that I discovered. Ultrasurf updated their website to change a number of their security, privacy and anonymity claims; they did not actually remove all of the bogus claims, merely the most egregious statements. Our meeting was overall quite positive and in fact led me to write notes that may become a second paper.

However, for various reasons, I've had to sit silently on this report for nearly four full months after our December meeting. I believe it is important to ensure that the issues discovered and discussed in my paper are resolved and that users are not kept in harm's way. I have serious concerns about ongoing security issues for the users of Ultrasurf and that is my primary reason for wishing to perform and release this research for all to see.

Here's the abstract of the paper:
Ultrasurf is a proxy-based program promoted for Internet censorship circumvention. This report gives a technical analysis of the Ultrasurf software and network. We present the results of reverse engineering the Ultrasurf client program, give an in-depth study of the known Ultrasurf network, especially those portions that interface in some way with the client or the Internet, and discuss network signatures that would allow an adversary to detect its use on a network. We cover client bootstrapping methods, censorship and censorship resistance, anonymity, user tagging by Ultrasurf and other parties, cryptographic internals and other previously unknown or undiscovered details about the Ultrasurf client and the Ultrasurf network. We find that it is possible to monitor and block the use of Ultrasurf using commercial off-the-shelf software. In particular, BlueCoat sells software and hardware solutions with such capabilities that have been deployed in Syria and other countries.

The vulnerabilities presented in this paper are not merely theoretical in nature; they may present life-threatening danger in hostile situations. We recommend against the use of Ultrasurf for anonymity, security, privacy and Internet censorship circumvention.
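
The detectability finding in the abstract ("it is possible to monitor and block the use of Ultrasurf using commercial off-the-shelf software") comes down to static network signatures. As a rough sketch of what that kind of matching looks like, the Python snippet below scans a packet capture for TCP payloads beginning with a fixed byte pattern; the pattern, the capture filename, and the function name are placeholders I made up for illustration, not the actual Ultrasurf fingerprint documented in the paper.

    # Hypothetical illustration only: the byte pattern and capture file are
    # placeholders chosen for this sketch, NOT the actual Ultrasurf signature
    # described in the report.
    from scapy.all import rdpcap, IP, TCP, Raw  # pip install scapy

    PLACEHOLDER_SIGNATURE = bytes.fromhex("16030100")  # stand-in for a fixed handshake prefix

    def flag_matching_flows(pcap_path):
        """Return (src, dst) pairs whose TCP payload starts with the placeholder signature."""
        hits = set()
        for pkt in rdpcap(pcap_path):
            if pkt.haslayer(IP) and pkt.haslayer(TCP) and pkt.haslayer(Raw):
                if bytes(pkt[Raw].load).startswith(PLACEHOLDER_SIGNATURE):
                    hits.add((pkt[IP].src, pkt[IP].dst))
        return hits

    if __name__ == "__main__":
        for src, dst in flag_matching_flows("capture.pcap"):
            print("possible match:", src, "->", dst)

A commercial DPI appliance performs this sort of matching in hardware at line rate, which is why a handshake that never changes is straightforward to flag and block.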

The main substance of the paper refutes nearly all of the claims that UltraReach makes on their website about their software, Ultrasurf:
This paper addresses the following claims by UltraReach and other Ultrasurf advocates about the Ultrasurf client and Ultrasurf network:

  1. “Ultrasurf enables users to browse any website freely” — refuted in Section 3.1
  2. “employs a decoying mechanism to thwart any tracing effort of its communication with its infrastructure.” — refuted in Section 5.13
  3. “Protect your privacy online with anonymous surfing and browsing. Ultrasurf hides your IP address, clears browsing history, cookies, and more.” — refuted in Section 6.2 and Section 6.3
  4. “change IP addresses a million times an hour” — refuted in Section 6.1
  5. “Untraceable” — refuted in Section 6.10
  6. “Unblockable: Client uses wide array of discovery mechanisms to find an available proxy server and, when necessary, to switch/hop to avoid tracking/blocking” — refuted in Section 6.8
  7. “Invisible: Leaves no traces on the user’s computer, and its traffic is indistinguishable from normal access to HTTPS sites” — refuted in Section 5.12
  8. “Anonymous: No registration is requires [sic], and no personally identifying information collected” — refuted in Section 6.10
  9. “Tamperproof: Using privately-signed SSL certificates which dont depend on external, potentially compromised CAs (thus preempting MITM attacks), Ultrasurf proactively detects attempts by censors to reverse-engineer, sabotage, or otherwise interfere in the secure operation of the tool” — refuted in Section 5.8.

We conclude that each of these claims is false, incorrect, or misleading.

The issues involved in the writing, discussion, and publication of this report are the stuff of movies. It has taken ages to publish this report, and attempts at coordinated disclosure have been time-consuming, largely fruitless, and extremely frustrating. While some of the issues I have identified have been fixed, to the best of my knowledge the most important ones, such as the lack of forward secrecy, remain serious outstanding security issues. Ultrasurf often boasts of their decade-long fight against censorship, and while I respect the spirit of their efforts, I have a hard time respecting the technical implementation. I'm afraid they've gone without forward secrecy in their cryptographic protocol for that entire decade. Ultrasurf also often boasts of being untraceable, when in fact they have admitted to logging and disclosing user-identifying logs to law enforcement when the data was requested. These kinds of security failures, both social and technical, are simply negligent, and they mean that users have been, and likely still are, in harm's way.
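
Because the lack of forward secrecy comes up repeatedly above, here is a generic sketch of what the property means, using Python and the pyca/cryptography library; it illustrates the general concept only and says nothing about how Ultrasurf's own protocol is actually built.

    # Generic illustration of forward secrecy; this is NOT Ultrasurf's protocol.
    # pip install cryptography
    import os
    from cryptography.hazmat.primitives import hashes
    from cryptography.hazmat.primitives.asymmetric import rsa, padding
    from cryptography.hazmat.primitives.asymmetric.x25519 import X25519PrivateKey
    from cryptography.hazmat.primitives.kdf.hkdf import HKDF

    # Without forward secrecy: the session key is encrypted to a long-term RSA key.
    # Anyone who records this ciphertext and later obtains server_rsa can recover
    # the session key for every past session.
    server_rsa = rsa.generate_private_key(public_exponent=65537, key_size=2048)
    session_key = os.urandom(32)
    recorded_ciphertext = server_rsa.public_key().encrypt(
        session_key,
        padding.OAEP(mgf=padding.MGF1(hashes.SHA256()),
                     algorithm=hashes.SHA256(), label=None),
    )

    # With forward secrecy: both sides generate fresh ephemeral X25519 keys per
    # session and discard them afterwards, so a later key compromise reveals
    # nothing about previously recorded sessions.
    client_eph = X25519PrivateKey.generate()
    server_eph = X25519PrivateKey.generate()
    shared_secret = client_eph.exchange(server_eph.public_key())
    session_key_fs = HKDF(algorithm=hashes.SHA256(), length=32,
                          salt=None, info=b"handshake").derive(shared_secret)

The practical consequence is simple: a censor who records traffic today and obtains the long-term key tomorrow can decrypt the recorded sessions under the first scheme, but not under the second.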

I firmly believe that Ultrasurf must publish their full technical specifications, have their obfuscation and cryptographic designs peer reviewed, open their source code for the world to review, and absolutely discontinue all data retention without exception.

I hope you'll enjoy the research presented in the paper and that it will help everyone to move towards building a more secure set of options for users.

Update:
UltraReach/Ultrasurf have released a response document and a response page that confirm a number of my claims, sidestep a large swath of them, and then attack me, Tor, and others over the report. They specifically claim that what is true in my paper applies only to older versions of Ultrasurf, but they do not disclose which versions or when the fixes were released. This is a typical vendor tactic, especially considering that they pressured me not to release the report until they felt they had been given enough time to fix the issues involved. They also believe that I claimed Ultrasurf was broken; at no time did I claim it was broken. Rather, I said it has problems. The claims they made, and continue to make, do not live up to either their implemented policies or their technical capabilities, and I think that is a fair conclusion to draw, because their claims were, frankly, entirely unreasonable.

I put a great deal of time and effort into disclosing the report's findings to Ultrasurf, in a manner that would be considered both responsible and coordinated. It's too bad that they've decided to ignore most of the findings and to attack me over the indefensible issues.

Another Update: Collin Anderson has written up his view of the disclosure process. He is an independent third party who attempted to mediate our disclosure, the solutions, and a reasonable time frame for all parties involved.
