Top changes in Tor since the 2004 design paper (Part 1)

by nickm | October 4, 2012

The main academic reference for Tor is "Tor: The Second-Generation Onion Router" by Dingledine, Mathewson, and Syverson. But that paper was published back in 2004, and Tor has evolved since then. So Steven Murdoch and Nick Mathewson are currently preparing an updated version of the Tor design paper, to include new design changes and research results concerning Tor over the last 8 years.

In this series of posts, we (Steven and Nick) will try to summarize the most interesting or significant changes to Tor's design since the publication of the original paper. We're only going to cover the changes we think are most interesting, and we'll aim to present them in an engaging way.

We think this will be a three part series. In this first part, we'll cover the evolution of Tor's directory system, and some performance improvements in Tor's circuit creation and cell scheduling logic.

1. Node discovery and the directory protocol

Since the earliest versions of Tor, we knew that node discovery would require a better implementation than we had. There are a few key issues that any anonymity network's node discovery system needs to solve.

Every client needs to select nodes from the same probability distribution, or an adversary may be able to exploit differences in client knowledge. In the worst case, if an adversary can cause one client to know about an exit node that no other client uses, then the adversary knows that all traffic leaving that exit comes from the targeted client. But even in milder cases, where (say) every client knows every node with probability 90%, an adversary can use this information to attack users.

While there has been some work on quantifying these so-called "epistemic attacks," we're proceeding with the conservative assumption that clients using separate sets of nodes are likely to be partitionable from one another, and so the set of nodes used by all clients needs to be uniform.

The earliest versions of Tor solved this problem with a "Directory" object – each server generated a signed "router descriptor", and uploaded it to one of a small set (three) of "directory authorities". Each of these authorities generated a signed concatenated list of router descriptors, and served that list to clients over HTTP.

This system had several notable problems:

  • Clients needed to download the same descriptors over and over, whether they needed them or not.
  • Each client would believe whichever directory authority it had spoken to most recently: rather than providing distributed trust, each authority was fully trusted in its own right, and any one misbehaving authority could completely compromise all the clients that talked to it.
  • To the extent that directory authorities disagreed, they created partitions in client knowledge, which could in the worst case allow an adversary to partition clients based on which authority's directory each client had downloaded most recently.
  • The load on the authorities promised to grow excessive, as every client contacted every authority.
  • The contents of the directory were sent unencrypted, which made them trivially easy to fingerprint on the wire.

Early changes in the Version 1 Directory System

Our earliest changes focused on improving scalability rather than improving the trust model. We began by having each authority publish two documents, rather than one: a directory that contained all the router descriptors, and a "network status" document that was a list of which descriptors were up and which were down (Tor 0.0.8pre1, in Jul 2004). Clients could download the latter more frequently to avoid using down servers, and refresh the former only periodically.

We also added a caching feature in Tor 0.0.8pre1, where nodes could act as directory caches. With this feature, once a client had a directory, it no longer needed to contact the directory authorities every time it wanted to update it, but rather could contact a cache instead.

The Version 2 Directory System (deprecated)

In Tor 0.1.1.8-alpha (Oct 2005), we took our first shot at solving the trust model. Now, we had each authority sign a more complete network status statement, including a list of all the nodes that it believed should be in the network, a digest of each node's public key, and a digest of each node's router descriptor. Clients would download one of these documents signed by each authority, and then compute, based on all the documents they had, which collection of nodes represented the consensus view of all the authorities.

To get router descriptors, clients would then contact one or more caches, and ask for the descriptors they believed represented the consensus view of all the authorities.
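To make this concrete, here is a minimal sketch in Python of the client-side computation described above. The data structures are invented for illustration (real statuses are signed documents with many more fields, and the real logic lives in Tor's C code); the simple-majority rule is one natural reading of "consensus view":

    # Sketch: combining v2 network statuses into a client-side consensus view.
    def consensus_view(statuses, total_authorities):
        """Return the node identities listed as running by a majority.

        `statuses` maps an authority name to the set of node identity
        digests that authority currently believes are running.
        """
        counts = {}
        for listed in statuses.values():
            for node in listed:
                counts[node] = counts.get(node, 0) + 1
        majority = total_authorities // 2 + 1
        return {node for node, n in counts.items() if n >= majority}

    statuses = {
        "moria1": {"A", "B", "C"},
        "tor26":  {"A", "B"},
        "dizum":  {"A", "C", "D"},
    }
    print(sorted(consensus_view(statuses, 3)))  # ['A', 'B', 'C']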

This approach meant that a single rogue directory authority could no longer completely control any client's view of the network. Clients became more fragmented, however: instead of falling into one of N possible groups based on which authority they had contacted most recently, they fell into one of M^N groups, where N was the number of authorities and M was the number of recently valid opinions from each authority.

Around this time we also had authorities begin assigning flags to nodes, so that in addition to recording "up" or "down" for each node, authorities could also declare whether nodes were fast, stable, likely to be good guard nodes, and so forth.

All of the improvements so far were oriented toward saving bandwidth at the server side: we figured that clients had plenty of bandwidth, and we wanted to avoid overloading the authorities and caches. But if we wanted to add more directory authorities (and a majority of 5 is still an uncomfortably small number), bootstrapping clients would have to fetch one more network status for every new authority. By early 2008, each status document listed 2500 relay summaries and came in around 175KB compressed, so a bootstrapping client needed 5 × 175KB = 875KB of status documents when starting up, and then another megabyte of descriptors after that. And we couldn't add more authorities without making the problem even worse.

Version 3: Consensus and Directory Voting

To solve the problems with the v2 directory protocol, Tor 0.2.0.3-alpha (Jul 2007) introduced a directory voting system, where the authorities themselves would exchange vote documents periodically (currently once per hour), compute a consensus document based on everyone's votes, and all sign the consensus.

Now clients only need to download a single signed consensus document periodically, and check that it is signed by a sufficiently large fraction of the authorities that the client knows about. This gives clients a uniform view of the network, makes it harder still for a small group of corrupt authorities to attack a client, and limits the number of documents they need to download.
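The client-side acceptance check can be sketched as follows. The `verify_signature` argument stands in for real RSA signature verification, and the simple-majority threshold is an illustrative reading of "sufficiently large fraction":

    # Sketch: accept a v3 consensus only if enough known authorities signed it.
    def consensus_acceptable(consensus_digest, signatures, trusted_keys,
                             verify_signature):
        """`signatures` is a list of (authority_id, sig) pairs over the
        consensus digest; `trusted_keys` maps authority ids to their keys."""
        valid = set()
        for authority_id, sig in signatures:
            key = trusted_keys.get(authority_id)
            if key is not None and verify_signature(consensus_digest, sig, key):
                valid.add(authority_id)
        return len(valid) > len(trusted_keys) // 2

    # Tiny demo with a fake "signature" scheme: a sig is valid if it equals
    # the digest concatenated with the authority's key.
    trusted = {"moria1": b"k1", "tor26": b"k2", "gabelmoo": b"k3"}
    fake_verify = lambda digest, sig, key: sig == digest + key
    sigs = [("moria1", b"D" + b"k1"), ("tor26", b"D" + b"k2")]
    print(consensus_acceptable(b"D", sigs, trusted, fake_verify))  # True (2 of 3)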

The voting algorithm is ad hoc, and is by no means the state of the art in Byzantine fault tolerance. Our approach to failures to reach a consensus (which have been mercifully infrequent) is to find what's wrong and fix it manually.

Saving bytes with microdescriptors

Now that the consensus algorithm finally matched what we had in mind when we wrote the first Tor paper, it was time to address the protocol's verbosity.

Out of all the data in a typical 1500-byte server descriptor, a client really only needs to know which ports the node supports exiting to and the node's current RSA1024 onion key; everything else it needs is fully redundant with the node's listing in the consensus network status document.

One crucial observation is that signatures on the router descriptors themselves don't actually help clients: if there were enough hostile authorities to convince clients to use a descriptor that a router didn't actually sign, those authorities could just as easily convince clients to use a descriptor signed with a phony identity key.

This observation let us move (in Tor 0.2.3.1-alpha, May 2011) to a design where the authorities, as part of their voting process, create an abbreviated version of each descriptor they recommend. Currently, these contain only a short summary of the router's exit policy and the router's current onion key. Clients now download these abbreviated "microdescriptors", which cuts the information downloaded per node by about 75%. Further, because this data changes relatively infrequently, it also cuts down how often clients need to fetch new information about each router at all.
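As a toy illustration of the saving, the sketch below carves a "microdescriptor" out of a full descriptor and identifies it by digest, so the consensus can reference it without any per-descriptor signature. The field names and encoding are invented; the real formats are specified in Tor's dir-spec:

    import hashlib

    # Toy full descriptor; most fields are either redundant with the node's
    # consensus listing or simply not needed by clients.
    full_descriptor = {
        "nickname": "examplerelay",              # redundant with the consensus
        "address": "192.0.2.1",                  # redundant with the consensus
        "onion-key": "<RSA1024 public key>",     # needed by clients
        "exit-policy-summary": "accept 80,443",  # needed by clients
        "contact": "operator@example.com",       # not needed by clients
        "platform": "Tor 0.2.3.x on Linux",      # not needed by clients
    }

    CLIENT_FIELDS = ("onion-key", "exit-policy-summary")

    micro = {k: full_descriptor[k] for k in CLIENT_FIELDS}
    encoded = "\n".join(f"{k} {v}" for k, v in sorted(micro.items())).encode()

    # The consensus lists this digest; a client can verify whatever it
    # fetches from a cache against it, with no signature on the document.
    print(hashlib.sha256(encoded).hexdigest())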

Tunneling directory connections over Tor

In 0.1.2.5-alpha (Jan 2007), we made clients download all directory documents over HTTP over Tor by default, rather than contacting directories and caches over unencrypted HTTP. This change helps clients resist fingerprinting.

Because clients aren't using Tor for anonymity on directory connections, they build single-hop circuits. We use HTTP over a one-hop Tor circuit, rather than plain old HTTPS, so that clients can use the same TLS connection both for a directory fetch and for other Tor traffic.

2. Security improvements for hidden services

Decentralized hidden-service directory system

A partly-centralized directory infrastructure makes sense for Tor nodes, since every client is supposed to be able to know about every node, but it doesn't make a great deal of sense for hidden services.

To become more censorship-resistant, we started (in Tor 0.2.0.10-alpha, Nov 2007) to instead use the Tor network itself to cache and serve hidden service descriptors. Now, instead of publishing their hidden service descriptors anonymously to a small fixed set of hidden service authorities, hidden services publish to a set of nodes whose identity keys are closest to a hash of the service's identity, the current date, and a replica number.
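The "closest nodes" lookup can be sketched as a walk around a circular keyspace. This is a simplified model: the real descriptor-ID construction in Tor's rend-spec has additional inputs, and the constants here (SHA-1, three responsible relays per replica) are illustrative:

    import hashlib

    def descriptor_id(service_id: bytes, date: bytes, replica: int) -> bytes:
        # Simplified: hash the service identity, current date, and replica.
        return hashlib.sha1(service_id + date + bytes([replica])).digest()

    def responsible_relays(desc_id: bytes, relay_digests, n_responsible=3):
        ring = sorted(relay_digests)
        # First relay whose identity digest is >= the descriptor ID,
        # wrapping around the ring if necessary.
        start = next((i for i, d in enumerate(ring) if d >= desc_id), 0)
        return [ring[(start + i) % len(ring)] for i in range(n_responsible)]

    relays = [hashlib.sha1(f"relay-{i}".encode()).digest() for i in range(40)]
    for replica in (0, 1):
        did = descriptor_id(b"exampleonion", b"2012-10-04", replica)
        print(replica, [d.hex()[:8] for d in responsible_relays(did, relays)])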

Improved authorization model for hidden services

We also added improved support for authentication to hidden services. Optionally, to use a hidden service, a client must know a shared key, and use this key to decrypt the part of the hidden service descriptor containing the introduction points. It must later use information in that encrypted part to authenticate to any introduction point it uses, and then to the hidden service itself. One of the main uses of authentication here is to hide presence: only authenticated users can learn whether the hidden service is online.
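As a toy model of the presence-hiding property, the sketch below uses an HMAC over a fresh challenge as the authenticator, so only clients holding the shared key get a useful answer. The construction is invented for illustration; Tor's actual scheme (in rend-spec) differs in its details:

    import hashlib, hmac, os

    shared_key = os.urandom(32)  # distributed to authorized clients out of band

    def make_authenticator(key: bytes, challenge: bytes) -> bytes:
        return hmac.new(key, challenge, hashlib.sha256).digest()

    # The introduction point issues a fresh challenge; the client must
    # answer it using the shared key before learning anything.
    challenge = os.urandom(16)
    answer = make_authenticator(shared_key, challenge)

    def intro_point_accepts(key, challenge, answer):
        expected = make_authenticator(key, challenge)
        return hmac.compare_digest(expected, answer)

    print(intro_point_accepts(shared_key, challenge, answer))      # True
    print(intro_point_accepts(os.urandom(32), challenge, answer))  # False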

3. Faster first-hop circuit establishment with CREATE_FAST

At each stage of extending a circuit to the next hop, the client carries out a Diffie-Hellman (DH) key agreement protocol with that next hop. This step provides confidentiality (and forward secrecy) for the relay cell payload as it is passed on by intermediate hops. Originally Tor also carried out DH with the first hop, even though there was already a DH exchange as part of the TLS handshake. DH is quite computationally expensive for both ends, so from Tor 0.1.1.10-alpha (Dec 2005) onwards, clients skip the DH exchange on the first hop by sending a CREATE_FAST cell (as opposed to a standard CREATE cell), which generates key material simply by hashing random numbers sent by the client and server.
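The derivation can be sketched as follows. It follows the general shape of the KDF-TOR construction in tor-spec (SHA-1 over the shared material plus a counter byte), but the output lengths and the split into specific keys are simplified here:

    import hashlib, os

    def kdf_tor(secret: bytes, length: int) -> bytes:
        # Stretch the shared secret by hashing it with a counter byte.
        out, counter = b"", 0
        while len(out) < length:
            out += hashlib.sha1(secret + bytes([counter])).digest()
            counter += 1
        return out[:length]

    x = os.urandom(20)  # client's random value, sent in the CREATE_FAST cell
    y = os.urandom(20)  # server's random value, sent back in CREATED_FAST
    material = kdf_tor(x + y, 100)

    key_check = material[:20]     # echoed by the server to prove it saw X
    circuit_keys = material[20:]  # split into digest and cipher keys
    print(key_check.hex())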

4. Cell queueing and scheduling

The original Tor design left the fine-grained handling of incoming cells unspecified: every circuit's cells were to be decrypted and delivered in order, but nodes were free to choose which circuits to handle in any order they pleased.

Early versions of Tor also punted on the question: they handled cells in the order they were received on incoming OR connections, encrypting/decrypting them and handing them off immediately to the next node on the circuit, or to the appropriate exit or entry connection. This approach, however, frequently created huge output buffers where quiet circuits couldn't get a cell in edgewise.

Instead, Tor now places each incoming cell on a queue associated with its circuit. Rather than filling all output buffers to capacity, Tor fills them with cells on a near just-in-time basis.

When we first implemented these cell queues in 0.2.0.1-alpha (Jun 2007), we chose which cells to deliver by rotating among the circuits in round-robin fashion. In Tor 0.2.2.7-alpha (Jan 2010), we began instead to favor the circuits on each connection that had been quiet recently, so that a circuit sending small, infrequent batches of cells gets better latency than a circuit being used for a bulk transfer. (Specifically, when we are about to put a cell on an outgoing connection, we choose the circuit that has sent the lowest total exponentially-decaying number of cells so far. Currently, each cell has a 30-second half-life.)
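Here is a compact sketch of that scheduler: each circuit keeps an exponentially-decaying count of the cells it has sent (30-second half-life, as above), and when the connection can write, the circuit with the lowest count goes first. The class layout is invented for illustration; Tor's implementation is in C:

    import math

    HALF_LIFE = 30.0  # seconds

    class Circuit:
        def __init__(self, name):
            self.name = name
            self.count = 0.0
            self.last_update = 0.0

        def _decay(self, now):
            elapsed = now - self.last_update
            self.count *= math.exp(-math.log(2) * elapsed / HALF_LIFE)
            self.last_update = now

        def note_cell_sent(self, now):
            self._decay(now)
            self.count += 1.0

        def decayed_count(self, now):
            self._decay(now)
            return self.count

    def pick_circuit(circuits, now):
        # The quietest circuit (lowest decayed cell count) sends next.
        return min(circuits, key=lambda c: c.decayed_count(now))

    bulk, quiet = Circuit("bulk"), Circuit("quiet")
    for t in range(100):              # the bulk circuit sends constantly
        bulk.note_cell_sent(float(t))
    quiet.note_cell_sent(99.0)        # the quiet circuit sends one cell
    print(pick_circuit([bulk, quiet], 100.0).name)  # "quiet"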

In Part 2 we will look at changes to how Tor selects paths and the new anti-censorship measures.

Comments

Please note that the comment area below has been archived.

October 11, 2012

Tor has obviously gone through some major improvements in this past decade. Yet here I am, trying to compile Tor on an old computer I have resuscitated from the past. Though weak by today's standards, I figure it should be sufficient considering that I plan to run it at home, where the uplink is severely limited. There are obvious benefits for my own privacy, but would such a non-exit relay be good for the community? I'm asking because the focus these days seems to be on improving the speed and performance of the network rather than multiplying redundant paths.
Is there a forum you can recommend (other than mailing lists) where one can get help with compilation issues on an old linux distro?

I think your best bet is the tor-talk mailing list; that is where most people are.

I'm not sure it is a good idea to use an old Linux distro; it almost certainly contains serious security vulnerabilities. Why not upgrade? Even modern Linux distros should run on old hardware.

October 20, 2012

A thousand thanks for providing Tor!

I look forward to the next two parts. Although I am not a programmer and probably won't understand it very well, this information can help me make better-informed decisions about how to configure and use Tor.

One of the tough decisions involves sociopolitical and technical aspects related to what you discussed in part one above: what is a user to do about Tor nodes he/she doesn't trust? For example, earlier this year blutmagie listed dozens of Syrian nodes as "bad" (presumably because of indications that these were operated by the government there in order to subvert Tor), and not so long ago I noticed I was connected to a node in North Korea (I checked pretty carefully and it really did seem to be registered there, but that node quickly dropped off the network), which would very likely be operated with the approval of the government of North Korea. Naturally a wary user might be tempted to exclude nodes listed as "bad nodes", at least as exit nodes, despite the statistical fingerprinting problem.

More controversially (and no doubt increasing the risk of statistical fingerprinting), a user might be concerned about certain among the larger sub-networks of Tor nodes run by a single entity, such as a supposedly pro-Tor entity which seems to have ties with "Team Themis", a supposedly defunct organization which attacked political activists in the USA such as lawyer/author Glenn Greenwald, particularly if one seems to encounter these same networks repeatedly. I'd like to hear comments from the developers about this issue.

I also want to better understand a topic I guess you will discuss in future parts: the Tor Browser Bundle controller. Also the pros and cons of using alternatives such as the Tor packages provided by Debian, Tor Browser Bundle (TBB), and Tails, all of which as I write seem to operate in significantly different ways.

I'd also like to know more about what MITM attacks are most likely to be used by adversaries to try to subvert Tor. A difficult question, since almost certainly a good answer is geography-dependent, and also dependent upon what pro-democracy or civil-liberties movements a particular user may be involved in. (E.g. if you live in neither Russia nor Syria, and have posted a blog supporting Pussy Riot but have not expressed support for the Free Syrian Army.) At least in countries which buy surveillance gear from "the West", based upon openly available documents, it seems that some of the more sophisticated vendors do claim the ability to mount effective MITM attacks at will on Tor traffic. I am not sure how much I believe claims made in marketing documents, but I'd like to know more about this issue.

Also, looking ahead to future developments: based on several incidents in the past year, it seems likely that adversaries will increasingly try to attack Tor (possibly indirectly) by using genuine but improperly issued/obtained certificates to disguise something like a DigiTask trojan as a "security upgrade" in some legitimate software item. This falls outside your brief if the dropper attacks a non-Tor software item, but I'd like to hear any comments you have on mitigating the risk that adversaries might attempt to use some kind of MITM attack to insert malware into a TBB tarball as a user is downloading the latest version. (I always verify the signature, but it can be hard to be absolutely certain that I have the valid Tor key.)

Also, I'd be interested in seeing the alleged LSEC flier mentioned in a Tor microblog, but don't want to visit a shortened URL. What about making that flier available at cryptome? Yes, I know John Young has said some harsh things about Tor, and I've seen at least some of the replies from the Tor Project to the effect that he is mis-informed. Might I suggest that a "why should you trust Tor?" page be updated to include some temperate comments explaining what you think he misunderstood?

One last request: I realize this would require the time of some lesser-skilled person at the Tor Project, but what about putting up a bug-reporting forum similar to the forum at Tails, where users can interact anonymously with the developers? Currently I get an error message with one of the official ways of using Tor which I don't understand, and I wish I knew an anonymous method to ask about that message (unless this is it).

Long live Tor!

October 29, 2012

I appreciate all the work you are doing ... really.

A couple days ago when restarting Tor after my wife rebooted (she still lives with a Windows mindset), I noticed that I had a user from Iran on my bridge within a few seconds (>10).

This smells like an automated log on from somebody with a fat pipe, not the pattern of an individual user. Comments?