People are starting to ask us about a recent tech report from Sambuddho's group about how an attacker with access to many routers around the Internet could gather the netflow logs from these routers and match up Tor flows. It's great to see more research on traffic correlation attacks, especially on attacks that don't need to see the whole flow on each side. But it's also important to realize that traffic correlation attacks are not a new area.
This blog post aims to give you some background to get you up to speed on the topic.
First, you should read the first few paragraphs of the One cell is enough to break Tor's anonymity analysis:
First, remember the basics of how Tor provides anonymity. Tor clients route their traffic over several (usually three) relays, with the goal that no single relay gets to learn both where the user is (call her Alice) and what site she's reaching (call it Bob).
The Tor design doesn't try to protect against an attacker who can see or measure both traffic going into the Tor network and also traffic coming out of the Tor network. That's because if you can see both flows, some simple statistics let you decide whether they match up.
Because we aim to let people browse the web, we can't afford the extra overhead and hours of additional delay that are used in high-latency mix networks like Mixmaster or Mixminion to slow this attack. That's why Tor's security is all about trying to decrease the chances that an adversary will end up in the right positions to see the traffic flows.
The way we generally explain it is that Tor tries to protect against traffic analysis, where an attacker tries to learn whom to investigate, but Tor can't protect against traffic confirmation (also known as end-to-end correlation), where an attacker tries to confirm a hypothesis by monitoring the right locations in the network and then doing the math.
And the math is really effective. There are simple packet counting attacks (Passive Attack Analysis for Connection-Based Anonymity Systems) and moving window averages (Timing Attacks in Low-Latency Mix-Based Systems), but the more recent stuff is downright scary, like Steven Murdoch's PET 2007 paper about achieving high confidence in a correlation attack despite seeing only 1 in 2000 packets on each side (Sampled Traffic Analysis by Internet-Exchange-Level Adversaries).
Second, there's some further discussion about the efficacy of traffic correlation attacks at scale in the Improving Tor's anonymity by changing guard parameters analysis:
Tariq's paper makes two simplifying assumptions when calling an attack successful [...] 2) He assumes that the end-to-end correlation attack (matching up the incoming flow to the outgoing flow) is instantaneous and perfect. [...] The second one ("how successful is the correlation attack at scale?" or maybe better, "how do the false positives in the correlation attack compare to the false negatives?") remains an open research question.
Researchers generally agree that given a handful of traffic flows, it's easy to match them up. But what about the millions of traffic flows we have now? What levels of false positives (algorithm says "match!" when it's wrong) are acceptable to this attacker? Are there some simple, not too burdensome, tricks we can do to drive up the false positives rates, even if we all agree that those tricks wouldn't work in the "just looking at a handful of flows" case?
More precisely, it's possible that correlation attacks don't scale well because as the number of Tor clients grows, the chance that the exit stream actually came from a different Tor client (not the one you're watching) grows. So the confidence in your match needs to grow along with that or your false positive rate will explode. The people who say that correlation attacks don't scale use phrases like "say your correlation attack is 99.9% accurate" when arguing it. The folks who think it does scale use phrases like "I can easily make my correlation attack arbitrarily accurate." My hope is that the reality is somewhere in between — correlation attacks in the current Tor network can probably be made plenty accurate, but perhaps with some simple design changes we can improve the situation.
The discussion of false positives is key to this new paper too: Sambuddho's paper mentions a false positive rate of 6%. That sounds like it means if you see a traffic flow at one side of the Tor network, and you have a set of 100000 flows on the other side and you're trying to find the match, then 6000 of those flows will look like a match. It's easy to see how at scale, this "base rate fallacy" problem could make the attack effectively useless.
And that high false positive rate is not at all surprising, since he is trying to capture only a summary of the flows at each side and then do the correlation using only those summaries. It would be neat (in a theoretical sense) to learn that it works, but it seems to me that there's a lot of work left here in showing that it would work in practice. It also seems likely that his definition of false positive rate and my use of it above don't line up completely: it would be great if somebody here could work on reconciling them.
For a possibly related case where a series of academic research papers misunderstood the base rate fallacy and came to bad conclusions, see Mike's critique of website fingerprinting attacks plus the follow-up paper from CCS this year confirming that he's right.
I should also emphasize that whether this attack can be performed at all has to do with how much of the Internet the adversary is able to measure or control. This diversity question is a large and important one, with lots of attention already. See more discussion here.
In summary, it's great to see more research on traffic confirmation attacks, but a) traffic confirmation attacks are not a new area so don't freak out without actually reading the papers, and b) this particular one, while kind of neat, doesn't supercede all the previous papers.
(I should put in an addendum here for the people who are wondering if everything they read on the Internet in a given week is surely all tied together: we don't have any reason to think that this attack, or one like it, is related to the recent arrests of a few dozen people around the world. So far, all indications are that those arrests are best explained by bad opsec for a few of them, and then those few pointed to the others when they were questioned.)
[Edit: be sure to read Sambuddho's comment below, too. -RD]
Today Facebook unveiled its hidden service that lets users access their website more safely. Users and journalists have been asking for our response; here are some points to help you understand our thinking.
Part one: yes, visiting Facebook over Tor is not a contradiction
I didn't even realize I should include this section, until I heard from a journalist today who hoped to get a quote from me about why Tor users wouldn't ever use Facebook. Putting aside the (still very important) questions of Facebook's privacy habits, their harmful real-name policies, and whether you should or shouldn't tell them anything about you, the key point here is that anonymity isn't just about hiding from your destination.
There's no reason to let your ISP know when or whether you're visiting Facebook. There's no reason for Facebook's upstream ISP, or some agency that surveils the Internet, to learn when and whether you use Facebook. And if you do choose to tell Facebook something about you, there's still no reason to let them automatically discover what city you're in today while you do it.
Also, we should remember that there are some places in the world that can't reach Facebook. Long ago I talked to a Facebook security person who told me a fun story. When he first learned about Tor, he hated and feared it because it "clearly" intended to undermine their business model of learning everything about all their users. Then suddenly Iran blocked Facebook, a good chunk of the Persian Facebook population switched over to reaching Facebook via Tor, and he became a huge Tor fan because otherwise those users would have been cut off. Other countries like China followed a similar pattern after that. This switch in his mind between "Tor as a privacy tool to let users control their own data" to "Tor as a communications tool to give users freedom to choose what sites they visit" is a great example of the diversity of uses for Tor: whatever it is you think Tor is for, I guarantee there's a person out there who uses it for something you haven't considered.
Part two: we're happy to see broader adoption of hidden services
I think it is great for Tor that Facebook has added a .onion address. There are some compelling use cases for hidden services: see for example the ones described at using Tor hidden services for good, as well as upcoming decentralized chat tools like Ricochet where every user is a hidden service, so there's no central point to tap or lean on to retain data. But we haven't really publicized these examples much, especially compared to the publicity that the "I have a website that the man wants to shut down" examples have gotten in recent years.
Hidden services provide a variety of useful security properties. First — and the one that most people think of — because the design uses Tor circuits, it's hard to discover where the service is located in the world. But second, because the address of the service is the hash of its key, they are self-authenticating: if you type in a given .onion address, your Tor client guarantees that it really is talking to the service that knows the private key that corresponds to the address. A third nice feature is that the rendezvous process provides end-to-end encryption, even when the application-level traffic is unencrypted.
So I am excited that this move by Facebook will help to continue opening people's minds about why they might want to offer a hidden service, and help other people think of further novel uses for hidden services.
Another really nice implication here is that Facebook is committing to taking its Tor users seriously. Hundreds of thousands of people have been successfully using Facebook over Tor for years, but in today's era of services like Wikipedia choosing not to accept contributions from users who care about privacy, it is refreshing and heartening to see a large website decide that it's ok for their users to want more safety.
As an addendum to that optimism, I would be really sad if Facebook added a hidden service, had a few problems with trolls, and decided that they should prevent Tor users from using their old https://www.facebook.com/ address. So we should be vigilant in helping Facebook continue to allow Tor users to reach them through either address.
Part three: their vanity address doesn't mean the world has ended
Their hidden service name is "facebookcorewwwi.onion". For a hash of a public key, that sure doesn't look random. Many people have been wondering how they brute forced the entire name.
The short answer is that for the first half of it ("facebook"), which is only 40 bits, they generated keys over and over until they got some keys whose first 40 bits of the hash matched the string they wanted.
Then they had some keys whose name started with "facebook", and they looked at the second half of each of them to pick out the ones with pronouncable and thus memorable syllables. The "corewwwi" one looked best to them — meaning they could come up with a story about why that's a reasonable name for Facebook to use — so they went with it.
So to be clear, they would not be able to produce exactly this name again if they wanted to. They could produce other hashes that start with "facebook" and end with pronouncable syllables, but that's not brute forcing all of the hidden service name (all 80 bits).
For those who want to explore the math more, read about the "birthday attack". And for those who want to learn more (please help!) about the improvements we'd like to make for hidden services, including stronger keys and stronger names, see hidden services need some love and Tor proposal 224.
Part four: what do we think about an https cert for a .onion address?
Facebook didn't just set up a hidden service. They also got an https certificate for their hidden service, and it's signed by Digicert so your browser will accept it. This choice has produced some feisty discussions in the CA/Browser community, which decides what kinds of names can get official certificates. That discussion is still ongoing, but here are my early thoughts on it.
In favor: we, the Internet security community, have taught people that https is necessary and http is scary. So it makes sense that users want to see the string "https" in front of them.
Against: Tor's .onion handshake basically gives you all of that for free, so by encouraging people to pay Digicert we're reinforcing the CA business model when maybe we should be continuing to demonstrate an alternative.
In favor: Actually https does give you a little bit more, in the case where the service (Facebook's webserver farm) isn't in the same location as the Tor program. Remember that there's no requirement for the webserver and the Tor process to be on the same machine, and in a complicated set-up like Facebook's they probably shouldn't be. One could argue that this last mile is inside their corporate network, so who cares if it's unencrypted, but I think the simple phrase "ssl added and removed here" will kill that argument.
Against: if one site gets a cert, it will further reinforce to users that it's "needed", and then the users will start asking other sites why they don't have one. I worry about starting a trend where you need to pay Digicert money to have a hidden service or your users think it's sketchy — especially since hidden services that value their anonymity could have a hard time getting a certificate.
One alternative would be to teach Tor Browser that https .onion addresses don't deserve a scary pop-up warning. A more thorough approach in that direction is to have a way for a hidden service to generate its own signed https cert using its onion private key, and teach Tor Browser how to verify them — basically a decentralized CA for .onion addresses, since they are self-authenticating anyway. Then you don't have to go through the nonsense of pretending to see if they could read email at the domain, and generally furthering the current CA model.
We could also imagine a pet name model where the user can tell her Tor Browser that this .onion address "is" Facebook. Or the more direct approach would be to ship a bookmark list of "known" hidden services in Tor Browser — like being our own CA, using the old-fashioned /etc/hosts model. That approach would raise the political question though of which sites we should endorse in this way.
So I haven't made up my mind yet about which direction I think this discussion should go. I'm sympathetic to "we've taught the users to check for https, so let's not confuse them", but I also worry about the slippery slope where getting a cert becomes a required step to having a reputable service. Let us know if you have other compelling arguments for or against.
Part five: what remains to be done?
In terms of both design and security, hidden services still need some love. We have plans for improved designs (see Tor proposal 224) but we don't have enough funding and developers to make it happen. We've been talking to some Facebook engineers this week about hidden service reliability and scalability, and we're excited that Facebook is thinking of putting development effort into helping improve hidden services.
And finally, speaking of teaching people about the security features of .onion sites, I wonder if "hidden services" is no longer the best phrase here. Originally we called them "location-hidden services", which was quickly shortened in practice to just "hidden services". But protecting the location of the service is just one of the security features you get. Maybe we should hold a contest to come up with a new name for these protected services? Even something like "onion services" might be better if it forces people to learn what it is.
Looking for a way to help the Internet stay open and free? This topic needs some dedicated people to give it more attention — it could easily grow to as large a project as Tor itself. In the short term, OTF's Information Controls Fellowship Program has expressed interest in funding somebody to get this project going, and EFF's Eva Galperin has said she'd be happy to manage the person as an OTF fellow at EFF, with mentorship from Tor people. The first round of those proposals has a deadline in a few days, but if that timeframe doesn't work for you, this problem isn't going away: let us know and we can work with you to help you coordinate other funding.
We used to think there are two main ways that the Tor network can fail. First, legal or policy pressure can make it so nobody is willing to run a relay. Second, pressure on or from Internet Service Providers can reduce the number of places willing to host exit relays, which in turn squeezes down the anonymity that the network can provide. Both of these threats are hard to solve, but they are challenges that we've known about for a decade, and due in large part to strong ongoing collaborations we have a pretty good handle on them.
We missed a third threat to Tor's success: a growing number of websites treat users from anonymity services differently. Slashdot doesn't let you post comments over Tor, Wikipedia won't let you edit over Tor, and Google sometimes gives you a captcha when you try to search (depending on what other activity they've seen from that exit relay lately). Some sites like Yelp go further and refuse to even serve pages to Tor users.
The result is that the Internet as we know it is siloing. Each website operator works by itself to figure out how to handle anonymous users, and generally neither side is happy with the solution. The problem isn't limited to just Tor users, since these websites face basically the same issue with users from open proxies, users from AOL, users from Africa, etc.
Two recent trends make the problem more urgent. First, sites like Cloudflare, Akamai, and Disqus create bottlenecks where their components are used by many websites. This centralization impacts many websites at once when e.g. Cloudflare changes its strategy for how to handle Tor users. Second, services increasingly outsource their blacklisting, such that e.g. Skype refuses connections from IP addresses that run Tor exit relays, not because they worry about abuse via Tor (it's hard to use Skype over Tor), but because their blacklist provider has an incentive to be overbroad in its blocking. (Blacklist providers compete in part by having "the most complete" list, and in many cases it's hard for services to notice that they're losing contributions from now-missing users.)
Technical mechanisms do exist to let anonymous users interact with websites in ways that control abuse better. Simple technical approaches include "you can read but you can't post" or "you have to log in to post". More complex approaches track reputation of users and give them access to site features based on past behavior of the user rather than on past behavior of their network address. Several research designs suggest using anonymous credentials, where users anonymously receive a cryptographic credential and then prove to the website that they possess a credential that hasn't been blacklisted — without revealing their credential, so the website can't link them to their past behavior.
Social mechanisms have also proven effective in some cases, ranging from community moderation (I hear Wikipedia Germany manually approves edits from users who don't have sufficiently reputable accounts), to flagging behavior from Tor users (even though you don't know *which* Tor user it is) so other participants can choose how to interact with them.
But applying these approaches to real-world websites has not gone well overall. Part of the challenge is that the success stories are not well-publicized, so each website feels like it's dealing with the question in isolation. Some sites do in fact face quite different problems, which require different solutions: Wikipedia doesn't want jerks to change the content of pages, whereas Yelp doesn't want competitors to scrape all its pages. We can also imagine that some companies, like ones that get their revenue from targeted advertising, are fundamentally uninterested in allowing anonymous users at all.
A way forward
The solution I envision is to get a person who is both technical and good at activism to focus on this topic. Step one is to enumerate the set of websites and other Internet services that handle Tor connections differently from normal connections, and look for patterns that help us identify the common (centralized) services that impact many sites. At the same time, we should make a list of solutions — technical and social — that are in use today. There are a few community-led starts on the Tor wiki already, like the DontBlockMe page and a List of Services Blocking Tor.
Step two is to sort the problem websites based on how amenable they would be to our help. Armed with the toolkit of options we found in step one, we should go to the first (most promising) site on the list and work with them to understand their problem. Ideally we can adapt one of the ideas from the toolkit; otherwise we'll need to invent and develop a new approach tailored to their situation and needs. Then we should go to the second site on the list with our (now bigger) toolkit, and so on down the list. Once we have some success stories, we can consider how to scale better, such as holding a conference where we invite the five best success cases plus the next five unsolved sites on our list.
A lot of the work will be building and maintaining social connections with engineers at the services, to help them understand what's possible and to track regressions (for example, every year or so Google gets a new engineer in charge of deciding when to give out Captchas, and they seem to have no institutional memory of how the previous one decided to handle Tor users). It might be that the centralization of Cloudflare et al can be turned around into an advantage, where making sure they have a good practices will scale to help many websites at once.
EFF is the perfect organization to lead this charge, given its community connections, its campaigns like Who has your back?, and its more (at least more than Tor ;) neutral perspective on the topic. And now, when everybody is sympathetic about the topic of surveillance, is a great time to try to take back some ground. We have a wide variety of people who want to help, from scientists and research groups who would help with technical solutions if only they understood the real problems these sites face, to users and activists who can help publicize both the successful cases and the not-yet-successful cases.
Looking ahead to the future, I'm also part of an upcoming research collaboration with Dan Boneh, Andrea Forte, Rachel Greenstadt, Ryan Henry, Benjamin Mako Hill, and Dan Wallach who will look both at the technical side of the problem (building more useful ideas for the toolkit) and also the social side of the problem: how can we quantify the loss to Wikipedia, and to society at large, from turning away anonymous contributors? Wikipedians say "we have to blacklist all these IP addresses because of trolls" and "Wikipedia is rotting because nobody wants to edit it anymore" in the same breath, and we believe these points are related. The group is at the "applying for an NSF grant" stage currently, so it will be a year or more before funding appears, but I mention it because we should get somebody to get the ball rolling now, and hopefully we can expect reinforcements to appear as momentum builds.
In summary, if this call to arms catches your eye, your next steps are to think about what you most want to work on to get started, and how you would go about doing it. You can apply for an OTF fellowship, or we can probably help you find other funding sources as needed too.
This advisory was posted on the tor-announce mailing list.
On July 4 2014 we found a group of relays that we assume were trying to deanonymize users. They appear to have been targeting people who operate or access Tor hidden services. The attack involved modifying Tor protocol headers to do traffic confirmation attacks.
The attacking relays joined the network on January 30 2014, and we removed them from the network on July 4. While we don't know when they started doing the attack, users who operated or accessed hidden services from early February through July 4 should assume they were affected.
Unfortunately, it's still unclear what "affected" includes. We know the attack looked for users who fetched hidden service descriptors, but the attackers likely were not able to see any application-level traffic (e.g. what pages were loaded or even whether users visited the hidden service they looked up). The attack probably also tried to learn who published hidden service descriptors, which would allow the attackers to learn the location of that hidden service. In theory the attack could also be used to link users to their destinations on normal Tor circuits too, but we found no evidence that the attackers operated any exit relays, making this attack less likely. And finally, we don't know how much data the attackers kept, and due to the way the attack was deployed (more details below), their protocol header modifications might have aided other attackers in deanonymizing users too.
Relays should upgrade to a recent Tor release (0.2.4.23 or 0.2.5.6-alpha), to close the particular protocol vulnerability the attackers used — but remember that preventing traffic confirmation in general remains an open research problem. Clients that upgrade (once new Tor Browser releases are ready) will take another step towards limiting the number of entry guards that are in a position to see their traffic, thus reducing the damage from future attacks like this one. Hidden service operators should consider changing the location of their hidden service.
THE TECHNICAL DETAILS:
We believe they used a combination of two classes of attacks: a traffic confirmation attack and a Sybil attack.
A traffic confirmation attack is possible when the attacker controls or observes the relays on both ends of a Tor circuit and then compares traffic timing, volume, or other characteristics to conclude that the two relays are indeed on the same circuit. If the first relay in the circuit (called the "entry guard") knows the IP address of the user, and the last relay in the circuit knows the resource or destination she is accessing, then together they can deanonymize her. You can read more about traffic confirmation attacks, including pointers to many research papers, at this blog post from 2009:
The particular confirmation attack they used was an active attack where the relay on one end injects a signal into the Tor protocol headers, and then the relay on the other end reads the signal. These attacking relays were stable enough to get the HSDir ("suitable for hidden service directory") and Guard ("suitable for being an entry guard") consensus flags. Then they injected the signal whenever they were used as a hidden service directory, and looked for an injected signal whenever they were used as an entry guard.
The way they injected the signal was by sending sequences of "relay" vs "relay early" commands down the circuit, to encode the message they want to send. For background, Tor has two types of cells: link cells, which are intended for the adjacent relay in the circuit, and relay cells, which are passed to the other end of the circuit. In 2008 we added a new kind of relay cell, called a "relay early" cell, which is used to prevent people from building very long paths in the Tor network. (Very long paths can be used to induce congestion and aid in breaking anonymity). But the fix for infinite-length paths introduced a problem with accessing hidden services, and one of the side effects of our fix for bug 1038 was that while we limit the number of outbound (away from the client) "relay early" cells on a circuit, we don't limit the number of inbound (towards the client) relay early cells.
So in summary, when Tor clients contacted an attacking relay in its role as a Hidden Service Directory to publish or retrieve a hidden service descriptor (steps 2 and 3 on the hidden service protocol diagrams), that relay would send the hidden service name (encoded as a pattern of relay and relay-early cells) back down the circuit. Other attacking relays, when they get chosen for the first hop of a circuit, would look for inbound relay-early cells (since nobody else sends them) and would thus learn which clients requested information about a hidden service.
There are three important points about this attack:
A) The attacker encoded the name of the hidden service in the injected signal (as opposed to, say, sending a random number and keeping a local list mapping random number to hidden service name). The encoded signal is encrypted as it is sent over the TLS channel between relays. However, this signal would be easy to read and interpret by anybody who runs a relay and receives the encoded traffic. And we might also worry about a global adversary (e.g. a large intelligence agency) that records Internet traffic at the entry guards and then tries to break Tor's link encryption. The way this attack was performed weakens Tor's anonymity against these other potential attackers too — either while it was happening or after the fact if they have traffic logs. So if the attack was a research project (i.e. not intentionally malicious), it was deployed in an irresponsible way because it puts users at risk indefinitely into the future.
(This concern is in addition to the general issue that it's probably unwise from a legal perspective for researchers to attack real users by modifying their traffic on one end and wiretapping it on the other. Tools like Shadow are great for testing Tor research ideas out in the lab.)
B) This protocol header signal injection attack is actually pretty neat from a research perspective, in that it's a bit different from previous tagging attacks which targeted the application-level payload. Previous tagging attacks modified the payload at the entry guard, and then looked for a modified payload at the exit relay (which can see the decrypted payload). Those attacks don't work in the other direction (from the exit relay back towards the client), because the payload is still encrypted at the entry guard. But because this new approach modifies ("tags") the cell headers rather than the payload, every relay in the path can see the tag.
C) We should remind readers that while this particular variant of the traffic confirmation attack allows high-confidence and efficient correlation, the general class of passive (statistical) traffic confirmation attacks remains unsolved and would likely have worked just fine here. So the good news is traffic confirmation attacks aren't new or surprising, but the bad news is that they still work. See https://blog.torproject.org/blog/one-cell-enough for more discussion.
Then the second class of attack they used, in conjunction with their traffic confirmation attack, was a standard Sybil attack — they signed up around 115 fast non-exit relays, all running on 18.104.22.168/16 or 22.214.171.124/16. Together these relays summed to about 6.4% of the Guard capacity in the network. Then, in part because of our current guard rotation parameters, these relays became entry guards for a significant chunk of users over their five months of operation.
We actually noticed these relays when they joined the network, since the DocTor scanner reported them. We considered the set of new relays at the time, and made a decision that it wasn't that large a fraction of the network. It's clear there's room for improvement in terms of how to let the Tor network grow while also ensuring we maintain social connections with the operators of all large groups of relays. (In general having a widely diverse set of relay locations and relay operators, yet not allowing any bad relays in, seems like a hard problem; on the other hand our detection scripts did notice them in this case, so there's hope for a better solution here.)
In response, we've taken the following short-term steps:
1) Removed the attacking relays from the network.
2) Put out a software update for relays to prevent "relay early" cells from being used this way.
3) Put out a software update that will (once enough clients have upgraded) let us tell clients to move to using one entry guard rather than three, to reduce exposure to relays over time.
4) Clients can tell whether they've received a relay or relay-cell. For expert users, the new Tor version warns you in your logs if a relay on your path injects any relay-early cells: look for the phrase "Received an inbound RELAY_EARLY cell".
The following longer-term research areas remain:
5) Further growing the Tor network and diversity of relay operators, which will reduce the impact from an adversary of a given size.
6) Exploring better mechanisms, e.g. social connections, to limit the impact from a malicious set of relays. We've also formed a group to pay more attention to suspicious relays in the network:
7) Further reducing exposure to guards over time, perhaps by extending the guard rotation lifetime:
8) Better understanding statistical traffic correlation attacks and whether padding or other approaches can mitigate them.
9) Improving the hidden service design, including making it harder for relays serving as hidden service directory points to learn what hidden service address they're handling:
Q1) Was this the Black Hat 2014 talk that got canceled recently?
Q2) Did we find all the malicious relays?
Q3) Did the malicious relays inject the signal at any points besides the HSDir position?
Q4) What data did the attackers keep, and are they going to destroy it? How have they protected the data (if any) while storing it?
Great questions. We spent several months trying to extract information from the researchers who were going to give the Black Hat talk, and eventually we did get some hints from them about how "relay early" cells could be used for traffic confirmation attacks, which is how we started looking for the attacks in the wild. They haven't answered our emails lately, so we don't know for sure, but it seems likely that the answer to Q1 is "yes". In fact, we hope they *were* the ones doing the attacks, since otherwise it means somebody else was. We don't yet know the answers to Q2, Q3, or Q4.
There has been a considerable amount of work in the area of Tor incentives since the last post on the topic in 2009 (over 5 years ago!). This post will give an overview of some of the major new approaches, including two new designs that I co-authored that will appear at the Workshop on Hot Topics in Privacy Enhancing Technologies later this week. Before getting to those, I'll give background on Tor and discuss some of the social issues to help form a foundation of understanding.
The Tor network consists of a set of relays that provide bandwidth and other resources to forward traffic for Tor users. Anyone in any part of the world can contribute to Tor by downloading the Tor software and configuring it to operate in relay mode. In fact, the Tor network is currently composed exclusively of such voluntarily-operated relays.
There are many reasons existing relay operators might contribute to Tor. They may contribute because they really want a privacy-enhancing communication tool like Tor to exist, they believe in Tor’s philosophy of open source software and operational transparency, and they want Tor to succeed at its goals. They may wish to be part of the Tor community and the social movement to provide privacy-enhanced communication that connects people in all parts of the world. Tor may be practically useful to them for communication with others or to retrieve content that may otherwise be unavailable to them due to censorship or other network interference. Or they may be technically interested in the Tor software or network and the associated research problems raised by large, distributed systems. Finally, some may be interested for adversarial reasons, e.g., collecting information on the uses of the network or the content being transferred. All of these reasons provide intrinsic motivation for operators to contribute - they don’t expect any direct compensation for their contributions.
The volunteer approach has succeeded so far: Tor now consists of over 5000 relays transferring between 4 and 5 GiB/s in aggregate. For the most part, this is "organic" growth obtained through community outreach, where volunteers first have personal contact with existing community members and become inspired to help.
Whatever their reason for volunteering, relay operators not only contribute resources from the physical machines upon which their relays are executed, but also contribute the time involved with configuring, updating, and adjusting their relays as both the software and network mature. In many cases, operators also contribute monetarily through the direct purchase of dedicated machines or special fast connections to Internet Service Providers (ISPs), and exit relay operators spend social energy on maintaining relationships with their ISP (to make sure the ISP understands that requests coming from their machines are from Tor).
However, because many people that are otherwise excited about or interested in Tor are incapable or unwilling to incur these expenses, there are far fewer Tor relays than Tor clients (users that use Tor to access content, e.g. via the Tor Browser Bundle, without directly contributing resources to the network). Expanding the set of relays could have many benefits: Tor could become faster, more reliable, and more secure while at the same time distributing trust among a larger and more diverse set of distributed peers. A larger network that is transferring a larger quantity and more diverse types of data will, in general, make it more difficult for an adversary to determine who is talking to whom. This means a safer Tor for everyone.
There are many social and technical challenges involved with expanding the set of Tor relays. We focus here on the social issues surrounding how to encourage more people to run a relay in the first place by providing incentives (i.e., rewards) for relay operators, and later focus on how to do this without leaking information or otherwise hurting Tor’s anonymity.
An important social issue to consider is that of maintaining the existing community of operators and how rewards could harm it. The presumption here is that existing relay operators are motivated to run relays for their own reasons - the act of contributing to Tor itself has intrinsic value to them. Then the question is: what would happen if we start rewarding operators through the use of an incentive system?
The answer isn’t clear, and there are at least a couple of forks in the cognitive road with many more questions along the way. First, what would happen to the existing operators who used to feel good about volunteering but are now compensated with a reward that may have extrinsic value? Although the new reward may provide some extrinsic value, will it lower their original intrinsic value enough to cause them to lose their motivation to contribute? Will they continue their contributions but begin to expect the reward, such that if the reward is removed later they would no longer be motivated even though originally (before the reward) they were happy to contribute? Second, is the new group of operators attracted to Tor by the reward somehow less desirable than the existing group, because they don’t care as much about Tor or are more likely to stop contributions and leave the system more quickly than the existing volunteers? Is the fact that they have higher extrinsic motivation than intrinsic make them better or worse operators? Will they be less likely to incur the costs of saying "no" when people ask them to log or turn over traffic? Will they in fact be willing to sell logs if it increases their rewards? Third, how will the new group of operators affect the existing group? If their intrinsic motivation was low enough before the reward that they didn’t contribute, but high enough with the reward that they do contribute, are these new contributors going to somehow shift the "community spirit" into something that no longer has the value that it once had? Will this "crowd out" the intrinsically motivated individuals and cause existing operators to leave (a widely acknowledged theory)?
The answers to these questions are not clear, though speculations abound. One response is that we will never know what will happen until we try it, but if we try it we may not like the result. Another is that if we try it and it succeeds, great; if it starts going badly, we can adapt and adjust.
Researchers and other Tor community members have been interested in the Tor incentive problem because a well-designed incentive scheme has the potential to greatly improve the network (we will discuss several technical designs below). However, because of the many open social questions, it has been challenging for the Tor community to come to a consensus on the path forward. On one hand and as discussed above, many people already run Tor relays and provide valuable service, and the network and bandwidth graphs imply some amount of success in network growth. It would be tragic if the introduction of an experimental reward scheme drove away the existing participants. On the other hand, the number of Tor clients still vastly exceeds the number of relays, and Tor’s bandwidth capacity remains the limiting factor for Tor’s growth.
If we consider that relay operators with a higher intrinsic motivation to contribute are somehow more desirable to the network and the community because they care more deeply about Tor and the social movement, then we can consider there to be a trade-off between reward value and the desirability of the operator. The lower the extrinsic reward value is, the higher the intrinsic value a potential new operator must possess in order for the total value to be high enough to motivate her to contribute. The higher the extrinsic reward value is, the lower the intrinsic value may be to still attract the new operator. Under this model, as the value of the reward increases, the number of individuals willing to contribute also increases while their "quality" decreases.
Please note that this is a very simplified model of reasoning and since it does not necessarily reflect reality, not everyone should agree with it. There are many other values in play here as well; for example, motivations change over time as the world changes, and incentive schemes that require significant protocol modifications are more difficult to implement, test, and deploy. Nonetheless, this simplified model may help sort out the issues, and it seems to be consistent with the fact that the Tor community has thus far preferred to grow the network through intrinsically motivated relay operators.
Tor’s volunteer approach has generally been on the conservative side of our simplified model, where individuals with higher intrinsic motivations to contribute to Tor are preferred. New operators have been recruited through social and community-oriented efforts such as explaining how great Tor is and why they should care. This works well for people who are already intrinsically motivated about Tor, such as this guy who tattooed a Tor onion on his arm.
Other relay recruitment approaches for intrinsically motivated individuals include Torservers.net and the Noisebridge Tor Exit Node Project: both operate as independent nonprofit organizations for those people who would like to contribute to the Tor network but for one reason or another are not in a position to operate relays themselves. As discussed in a 2011 post, it is preferable that people who can run their own fast exit relays do so. But the approach of turning donations into fast exit capacity allows those who cannot run their own relays to still contribute to improving the performance and exit capacity of the Tor network.
The more recent EFF Tor Challenge is somewhere on the same side of the spectrum, except it actually offers small rewards as incentives. EFF explains Tor’s importance and users are encouraged to contribute to the network by running a relay. The challenge offers prizes to relay operators as an incentive: name and twitter handle published in a list of contributors, a limited edition sticker for running a relay for 12 months, and a t-shirt if that relay is fast (bandwidth of 1 MB/s or larger after 12 months).
Other social rewards are possible here as well. A post on the tor-dev mailing list proposes that Tor host a simple profile page for each relay that will list and celebrate the relay’s total bandwidth contribution over time, and recognize the operator for running special Tor bridge or exit relays. Relatively simple ideas like this may have the potential to attract intrinsically-motivated individuals without requiring extrinsic rewards or a lot of changes to Tor itself.
The approaches in this category are relatively low-risk/low-reward schemes: even though the new operators are receiving recognition or a reward for running a relay, the reward is of such little value that there is little risk that existing contributors who may not have been rewarded will leave. At the same time, a new operator loses little if it decides to shut down the new relay.
On the other side of the spectrum are more "radical" approaches that do not require individuals with high intrinsic motivation (though they are welcome, too!) but change Tor’s design have been explored by researchers. Researchers explore these designs because they are generally more technically interesting and have the potential to produce a much larger set of relays. We now shift to discussing these latest research proposals.
In the Gold Star scheme, the Tor directory authorities measure the bandwidth of Tor relays and assign a special flag (a "gold star") to the fastest 7/8 of relays. Then whenever those relays send traffic through Tor, they choose the other fast gold star relays to form a fast gold star path. Relays then prioritize traffic on these gold star paths. The incentive here is that if you run a relay, you will get faster service when using Tor as a client. The main drawback to this approach is that only relays are able to get gold stars and priority service. This means that all relays that are part of a gold star path know for certain that the initiator of that traffic is someone from the small list of gold star relays. Because the set of gold star relays would be smaller than the set of all Tor users by several orders of magnitude, the anonymity for gold star relays would be significantly harmed.
In PAR, all users (not just relays) are able to be part of the incentive system. PAR has an honest-but-curious centralized entity called a "bank" that manages digital tokens called "coins". Users can purchase coins from the bank and then pay the relays while using Tor; the relays then deposit the coins back into the bank and the bank checks to make sure the coin is valid and that it has never been spent before (i.e., it has not been "double spent"). The main novel idea explored in this work is how to include digital payments into each Tor circuit in a way that prevents the relays from learning the client’s identity from the payment itself.
Adding real money into Tor opens a host of new legal and technical questions, which Roger already briefly discussed. For example, real money might shift Tor into a different legal category (see e.g. the EU discussions of what is a "service provider" and which ones are obliged to do data retention), or change the liability situation for relay operators.
The main design challenge we learned from the PAR design is that the timing of when the client withdraws the coins and the relays deposit them creates a trade-off between the ability to detect double spending and link a client to its coins. If the client withdraws some coins and a few seconds later a relay starts depositing coins, how much information does the bank gain? If the relay waits for some time interval to deposit the coins to hide this timing information, then it becomes possible for the client to double spend those coins during that interval. The longer the relay waits before depositing, the harder it is for the bank to link the coins but the easier it is for the client to cheat. A rational relay will deposit immediately to secure its payment, which leads to the worst situation for anonymity. If we assume the relays will deposit immediately, then it is up to the client to hold coins for some random amount of time after purchasing them from the bank, to disrupt potential attempts to link withdrawals to deposits at the cost of flexibility in usage.
In review, both of these proposals have anonymity problems: the Gold Star scheme because the assignment of rewards to relays is public and identifies traffic as originating from the relatively small set of clients of fast relay operators; and PAR because the timing of the withdrawal by the client and the deposit by the relay may leak information about the client’s traﬃc. In PAR, coins may be held by the client and the relay longer to disrupt this timing information, but this trades off flexibility in usage and the speed at which double spending can be detected. So what have these papers taught us? We have learned about some requirements that a Tor incentive scheme should probably fulfill: both clients and relays should be able to receive the incentive so that neither group stands out; and the timing of payments should not allow cheating and should not enable linkability.
Recruiting New Tor Relays with BRAIDS was the followup research proposal presented at CCS in 2010. One of our goals in BRAIDS was to eliminate the trade-off between double-spending and linkability. To achieve this, we designed a new type of digital token which we called a "relay-specific ticket". Tickets were still issued by a trusted, centralized entity - the ticketmaster (we called it a "bank" in the paper, but BRAIDS was not designed to handle real money). Clients choose which relays they want to use, and receive tickets from the ticketmaster that are valid only at the chosen relays while hiding the chosen relay information from the ticketmaster (using partially-blind signatures). Clients then form a Tor circuit with the chosen relays and send the tickets to receive traffic priority (using a differentiated services scheduler). Each ticket will then provide traffic priority through the chosen relay for a specific number of bytes. Because each ticket a relay receives is only valid at that relay and no other, the relay could prevent double spending locally without contacting the ticketmaster.
Another feature of BRAIDS is that tickets are distributed freely in small amount to any client that asks, but only to one client per IP address, and only once per distribution round. If the clients change their mind about which relays they want to use for a circuit, they may contact the ticketmaster to exchange their old tickets for new ones following a time schedule. The exchange process is also used by relays to turn the tickets they received from clients into new usable tickets.
One main drawback to BRAIDS is that even though all users are able to get a small amount of tickets for free, relays are able to accumulate a much larger stash because they receive the free tickets AND the tickets sent to them by other clients. This means that relays stand out when using tickets to download large flows because it is less likely that a normal client would have been able to afford it. Another major drawback is that the exchange process is somewhat inefficient. Relays will exchange received tickets for new ones which they can use themselves, and clients that didn't spend their tickets (e.g., because their Tor usage was low or their chosen relays became unavailable) must exchange them or lose them. This leads to the ticketmaster exchanging all system tickets over every exchange interval. (The ticket validity interval is split into [spend, relay exchange, client exchange], so clients that don't spend in the "spend" time-frame must wait until "client exchange" time-frame to get new tickets. Increasing the interval lengths make them slightly less flexible to rapid changes. The longer the intervals, the more tickets will "pile up" for later processing by the ticketmaster.)
BRAIDS showed us the power of relay-specific tickets but unveiled the scalability problems associated with a trusted, centralized entity.
LIRA: Lightweight Incentivized Routing for Anonymity was published at NDSS 2013. LIRA still uses a centralized entity to manage the incentives, but LIRA only requires incentive management for the relays (thousands) instead of for all system users (millions) like BRAIDS. The way we achieve this is through the use of a lottery system, where clients could simply guess a random number for every circuit to receive priority on that circuit (traffic is prioritized using a differentiated services scheduler as in BRAIDS) with tunable probability. The lottery is set up with special cryptography magic such that relays are rewarded with guaranteed winning guesses to the lottery; relays are allotted winners according to the amount of bandwidth they contributed.
LIRA is more efficient and scalable than any earlier scheme. However, LIRA’s main drawback is that probabilistic guessing reduces flexibility for clients wanting to receive continuous priority over time, and creates a potential for cheating the system because it enables clients to continuously create new circuits and new guesses until a correct guess is found. Another problem is that a secure bandwidth measurement scheme is required to ensure that relays don’t receive rewards without actually contributing to Tor (this wasn't necessary in BRAIDS because the clients sent rewards (tickets) directly to the relays); secure bandwidth measurement is still an open research problem. Finally, LIRA still relies on a trusted central entity to manage the lottery.
From Onions to Shallots: Rewarding Tor Relays with TEARS will be presented at HotPETs later this week. The main goal in TEARS is to remove the reliance on a central entity to manage the incentives. The central entities in the above schemes (let’s generalize them as "banks") are referred to as "semi-trusted", because there are several ways a malicious bank could misbehave - for example, the bank could refuse service, could extort users by demanding extra fees in order to process their requests, or could "inflate" the digital currency (coins, tickets, guesses, etc.) by printing its own tokens for a profit.
TEARS draws upon the decentralized Bitcoin design to address these concerns by using a *publicly auditable* e-cash protocol that prevents the bank from misbehaving and getting away with it. The bank in TEARS consists of a group of semi-trusted servers, such as the Tor directory servers (as opposed to Bitcoin’s distributed proof-of-work lottery), only a quorum of which need to function correctly for the overall system to keep working. The e-cash cryptography used here is publicly auditable and every interaction with the bank is conducted over a public communication channel (such as the Bitcoin blockchain itself). The security guarantee is that any form of misbehavior on the part of the bank servers leaves a trail of evidence that can be discovered by anyone watching. This approach is reminiscent of systems like Certificate Transparency and OpenTransactions.
In addition to a decentralized bank, TEARS offers a new two-level token architecture to facilitate rewarding relays for their bandwidth contributions without hurting anonymity. First, a decentralized process audits relays’ bandwidth and informs the bank of the results. The bank mints new "shallots" (anonymous, auditable e-cash) for each relay based on their contributions and deposits them into the relay accounts on their behalf. Separately, the bank’s monetary policy may allow it to mint new shallots and distribute them to users, e.g. using mechanisms suggested in BRAIDS but more commonly known to Bitcoin users as faucets. Second, shallots are transferable among users and may be redeemed for relay-specific "PriorityPasses", which are then used to request traffic priority from Tor relays (again, as in BRAIDS). PriorityPasses are relay-specific so that relays can immediately and locally prevent double spending without leaking information to any other entity. This is a similar feature present in BRAIDS’ tickets. However, a novel feature of PriorityPasses is that they are non-transferable and become useless after being spent at a relay -- this reduces overhead associated with exchanges, and ensures that the process of requesting traffic priority does not harm anonymity because the act of redeeming a shallot for a PriorityPass will be unlinkable to any later transaction. There is a question of how many PriorityPasses can be spent in one circuit before it is suspicious that a client has so many, so the size of the faucets and how they distribute Shallots will play a key role in anonymity. Anonymity is also tied to how relays decide to distribute their Shallots to clients, either via a faucet or a through a third party market.
TEARS was designed to operate inside the existing Tor network and thus does not significantly change the fundamentals of Tor’s design. The decentralized bank and bandwidth measurement components do not alter the way clients choose circuits. Clients and relays that want to use or support TEARS, however, would need to support new protocols for interacting with the bank and logic for handling shallots, PriorityPasses, and traffic priority.
TEARS still relies on a "bandwidth measuring" component that can accurately and robustly determine when a relay has indeed contributed useful bandwidth. While the e-cash system in TEARS is designed to be publicly auditable, the existing mechanisms for bandwidth still require trusted authorities to probe.
A TorPath to TorCoin - Proof-of-Bandwidth Altcoins for Compensating Relays is the other paper to be presented at HotPETs this week. TorCoin addresses the bandwidth measurement problem with a different approach -- an altcoin (a Bitcoin alternative) based on a novel "proof-of-bandwidth" (rather than proof-of-work) mechanism called TorPath, in which the relays (and endpoints) of a circuit effectively mine for new coins whenever they successfully transfer a batch of packets. In TorCoin, a group of "assignment" authorities are responsible for generating a list of circuits (using a shuffle protocol) and assigning them to clients. Bandwidth proofs are then constructed as the circuit is used such that the client mines the TorCoin and then transfers part of it to each of the circuit’s participants.
Like TEARS, TorCoin distributes the process of rewarding relays for their bandwidth contributions. Bandwidth measurement is done directly as part of the distributed mining process and provides strong guarantees. Also, by utilizing a group of assignment authorities that may have more information about the system or underlying network, there is a lot of potential for generating more secure paths for clients than clients are able to generate for themselves.
TorCoin still has some issues to work out; it may be possible to fix some of the smaller issues with protocol modifications, but some of the larger issues don’t have obvious solutions.
One drawback to TorCoin is that it requires the group of collectively-trusted assignment authorities (you have to trust only that some threshold/quorum number of them are correct) to generate and assign circuits to clients. This is a similar trust model to the current Tor directory authorities. In practice, the assignment authorities cause availability issues: if a majority of the assignment authorities are unreachable, e.g. due to DoS or censorship, then the system is unusable to the clients because they won’t be able to generate circuits. This is also somewhat true of the directory authorities, however, directory information can be signed and then mirrored and distributed by other relays in the system whereas assignment authorities are required to always be online and available to new clients. TorCoin clients contact the assignment authorities in order to build new circuits, whereas in Tor they can build as many circuits as they need once the directory information is retrieved (from the directory auths or from directory mirrors).
TorCoin as written has significant security issues. Because relay assignment is not based on bandwidth, it is easier for an adversary to get into the first and last position on a circuit and break anonymity. This can be done through a sybil attack by adding an arbitrary number of bad relays and joining them to the network without actually providing bandwidth. Because the protocol reveals which ephemeral keys are attached to the assigned circuits, an adversary can confirm when it has compromised a circuit (has malicious nodes in the correct positions) without needing to do any statistical correlation attack (it can match up the ephemeral keys assigned to its malicious relays to the ones posted in the circuit assignment list).
The formation of relay/client groups is not discussed and is similarly vulnerable to sybil attacks where the adversary can completely subvert the coin mining process. This can be done by registering an arbitrary number of relays and clients with the assignment servers, such that a large majority of circuits created by the assignment process will contain malicious relays in all positions and be assigned to a malicious client. This malicious collective of nodes can then “pretend” to send bytes through the circuit without actually doing so, and gain an advantage when mining coins. The paper suggests to use a persistent guard, which means the adversary only needs malicious relays in the middle and exit positions of its sybil client circuits, exponentially increasing the probability of a full compromise (or requiring far fewer nodes to achieve the same probability of compromise as without persistent guards). (The sybil clients only have to get 2 of its relays in the circuit instead of 3, reducing the probability from f^3 to f^2 for malicious fraction f.) Further, even if some relays on a circuit are honest, it is not rational for them to refuse to sign proofs of bandwidth that have been exaggerated (too high) by other relays. It will only benefit a relay to ignore proof-of-bandwidth checks, giving it an advantage over completely honest nodes in the TorCoin mining process.
There are a variety of unaddressed practical deployment issues as well. It is not clear how to do load balancing with TorCoin alone - no one should be able to determine how many bytes were sent by any of the circuit members or where the payments for mined coins are being sent (anonymous TorCoin transactions are necessary for anonymity). Exit policies are not discussed, and there is no clear way to support them and for a client that would like to request a specific exit port. Its not clear how to ensure that a relay is not chosen for the same circuit twice, or two relays from the same relay family are not on the same circuit. Finally, it is not clear how to link ephemeral circuit keys to TorCoin addresses so that payments may be sent from clients to relays without revealing client identity.
Don't misunderstand my ranting here - I think that the TorCoin idea is great (I am a co-author after all). It has the potential to motivate new researchers and developers to start thinking about solutions to problems that Tor is interested in, particularly bandwidth measurement. However, the limitations need to be clear so that we don't start seeing production systems using it and claiming to provide security without working through *at least* the issues mentioned above. The current paper was an initial concept in its early stages, and I expect the system to improve significantly as it is furthered developed.
(For completeness, we should also point out that TorCoin will need a new name if anybody decides to build it. The Tor trademark faq says it's fine for research paper designs to use the Tor mark, but if it becomes actual software then it sure will be confusing to users and the rest of the Tor community about whether it's written by or endorsed by Tor.)
Note that TorCoin and TEARS are at least somewhat complementary, since TEARS *needs* a proof-of-bandwidth scheme, and TorCoin *provides* one. However, they’re also not directly compatible. TorCoin requires a substantial change to both Bitcoin and to Tor (or to put it another way, it would be a system that only partially resembles Bitcoin and only partially resembles Tor). On the other hand, TEARS leaves Tor's circuit-finding mechanism intact, and the token protocol in TEARS is closer to traditional anonymous e-cash systems than to Bitcoin.
As outlined above, recent research has made great improvements in the area of Tor incentives. More work is needed to show the feasibility and efficiency of the decentralized approaches, and a secure bandwidth measurement scheme would help fill a critical piece missing from TEARS. Recent improvements to the way Tor schedules sockets and circuits would be necessary to correctly provide traffic priority (see #12541), and then a differentiated services scheduler that is able to prioritize traffic by classes (already described in and prototyped for the BRAIDS, LIRA, and TEARS papers) would also be needed.
Unfortunately, it is unclear how to make headway on the social issues. A small scale rollout of an "experimental build" to those relays who want to support new incentive features could be one way to test a new approach without committing to anything long term.
One question that is often raised is: if an incentive scheme rewards relays for providing bandwidth, then won’t everyone just pick the same cheapest hosting provider and Tor will lose location diversity? This question is largely addressed in the discussion of what constitutes a useful service in the TEARS paper (the TEARS paper and appendices also gives useful commentary on many of the common problems and design decisions to make when designing an incentive scheme). Basically, it can be addressed in the monetary policy, e.g., in addition to rewarding relays for bandwidth, the bank could also assign weights that could be used to adjust the rewards based on certain flags that relays possess, or the geographic location in which they operate. This could be adjusted over time so that there would be a higher incentive to run relays in parts of the world where none exist, or to prefer exit relays over other positions, etc. Although, note that it is unclear exactly what the "correct" utility function should be and when/how it should be adjusted. Note that Torservers.net similarly rewards relays for location diversity (see here).
Another point to make here is that most of these approaches have nothing to do with giving out or transferring real dollars. The tokens in most of these schemes are useful only to receive traffic priority in Tor. Will there be third party markets that form around the exchange of the tokens? Sure. And they may be speculated. But at the end of the day, the tokens would still only provide prioritized traffic. Depending on the configuration of the priority scheduler, the difference between priority traffic and normal traffic may not be that extreme. It is conceivable that the tokens would not be worth nearly enough to compensate an operator for the ISP connection, much less the overhead involved with updating the software, maintaining the machine, and talking with the ISP -- and in that case we are still on the more conservative side of the social incentive discussions above.
Tor has some choices to make in terms of how to grow the network and how to position the community during that growth process. I hope that this post, and the research presented herein, will at least help the community understand some of the options that are available.
All the best, ~Rob
[Thanks to Roger Dingledine, Bryan Ford, Aaron Johnson, Andrew Miller, and Paul Syverson for input and feedback on this post.]
The 4th USENIX Workshop on Free and Open Communications on the Internet is calling for papers. The workshop will be held on August 18, 2014 and seeks to bring together researchers and practitioners working on means to study, detect, or circumvent Internet censorship.
Past iterations of the workshop featured a number of great Tor research papers including a proposal for cloud-based onion routing, our OONI design paper, an analysis of how the GFW blocks Tor, a censorship analyzer for Tor, and a proposal for latency reduction in Tor circuits.
Given the past success of the FOCI series, we are looking forward to another round of great Tor papers.
Paper submissions are due on Tuesday, May 13, 2014, 11:59 p.m. PDT.
Recently somebody asked me why our usage numbers in China are so low. More precisely, his question was "How do I read this graph in any way other than 'Tor is effectively blocked in China'?" After writing up an answer for him, I realized I should share it with the rest of the Tor community too.
The correct interpretation of the graph is "obfs3 bridges have not been deployed enough to keep up with the demand in China". So it isn't that Tor is blocked — it's that we haven't done much of a deployment for obfs3 bridges or ScrambleSuit bridges, which are the latest steps in the arms race.
The short explanation is that the old vanilla SSL Tor transport doesn't work in China anymore due to their active probing infrastructure. The obfs2 transport doesn't work anymore either, for the same reason. The obfs3 transport works great for now, and thousands of people are happily using it — and some of those people aren't reflected in the graphs you see (I'll explain that more below).
The medium-length explanation is that we've been leading and coordinating the international research effort at understanding how to design and analyze transports that resist both DPI and active probing, and approximately none of these approaches have been deployed at a large scale yet. So it doesn't make sense to say that Tor is blocked in China, because it mischaracterizes Tor as a static protocol. "Tor" when it comes to censorship circumvention is a toolbox of options — some of them work in China, some don't. The ones that work (i.e. that should resist both DPI and active probing) haven't been rolled out very widely, in large part because we have funders who care about the research side but we have nobody who funds the operations, deployment, or scale-up side.
The long explanation is that it comes down to three issues:
First, there are some technical steps we haven't finished deploying in terms of collecting statistics about users of bridges + pluggable transports. The reason is that the server side of the pluggable transport needs to inform the Tor bridge what country the user was from, so the Tor bridge can include that in its (aggregated, anonymized) stats that it publishes to the metrics portal. We've now built most of the pieces, but most of the deployed bridges aren't running the new code yet. So the older bridges that are reporting their user statistics aren't seeing very many users from China, while the bridges that *aren't* reporting their user statistics, which are the ones that offer the newer pluggable transports, aren't well-represented in the graph. We have some nice volunteers looking into what fraction of deployed obfs3 bridges don't have this new 'extended ORPort' feature. But (and you might notice the trend here) we don't have any funders currently who care about counting bridge users in China.
Second, we need to get more addresses. One approach is to get them from volunteers who sign up their computer as a bridge. That provides great sustainability in terms of community involvement (we did a similar push for obfs2 bridges back when Iran was messing with SSL, and got enough to make a real difference at the time), but one address per volunteer doesn't scale very well. The intuition is that the primary resource that relays volunteer is bandwidth, whereas the primary resource that bridges volunteer is their address — and while bandwidth is an ongoing contribution, once your IP address gets blocked then your contribution has ended, at least for the country that blocked it, or until you get another address via DHCP, etc. The more scalable approaches to getting bridge addresses involve coordinating with ISPs and network operators, and/or designs like Flashproxy to make it really easy for users to sign up their address. I describe these ideas more in "approach four" and "approach five" of the Strategies for getting more bridge addresses blog post. But broad deployment of those approaches is again an operational thing, and we don't have any funded projects currently for doing it.
Third, we need ways of letting people learn about bridges and use them without getting them noticed. We used to think the arms race here was "how do you give out addresses such that the good guys can learn a few while the bad guys can't learn all of them", a la the bridges.torproject.org question. But it's increasingly clear that scanning resistance will be the name of the game in China: your transport has to not only blend in with many other flows (to drive up the number of scans they have to launch), but also when they connect to that endpoint and speak your protocol, your service needs to look unobjectionable there as well. Some combination of ScrambleSuit and FTE are great starts here, but it sure is a good thing that the research world has been working on so many prototype designs lately.
So where does that leave us? It would be neat to think about a broad deployment and operations plan here. I would want to do it in conjunction with some other groups, like Team Cymru on the technical platform side and some on-the-ground partner groups for distributing bridge addresses more effectively among social networks. We've made some good progress on the underlying technologies that would increase the success chances of such a deployment — though we've mostly been doing it using volunteers in our spare time on the side, so it's slower going than it could be. And several other groups (e.g. torservers.net) have recently gotten funding for deploying Tor bridges, so maybe we could combine well with them.
In any case it won't be a quick and simple job, since all these pieces have to come together. It's increasingly clear that just getting addresses should be the easy part of this. It's how you give them out, and what you run on the server side to defeat China's scanning, that still look like the two tough challenges for somebody trying to scale up their circumvention tool design.
New work on denial of service in Tor will be presented at NDSS '14 on Tuesday, February 25th, 2014:
The Sniper Attack: Anonymously Deanonymizing and Disabling the Tor Network
by Rob Jansen, Florian Tschorsch, Aaron Johnson, and Björn Scheuermann
To appear at the 21st Symposium on Network and Distributed System Security
We found a new vulnerability in the design of Tor's flow control algorithm that can be exploited to remotely crash Tor relays. The attack is an extremely low resource attack in which an adversary's bandwidth may be traded for a target relay's memory (RAM) at an amplification rate of one to two orders of magnitude. Ironically, the adversary can use Tor to protect it's identity while attacking Tor without significantly reducing the effectiveness of the attack.
We studied relay availability under the attack using Shadow, a discrete-event network simulator that runs the real Tor software in a safe, private testing environment, and found that we could disable each of the fastest guard and the fastest exit relay in a range of 1-18 minutes (depending on relay RAM capacity). We also found that the entire group of the top 20 exit relays, representing roughly 35% of Tor bandwidth capacity at the time of the analysis, could be disabled in a range of 29 minutes to 3 hours and 50 minutes. We also analyzed how the attack could potentially be used to deanonymize hidden services, and found that it would take between 4 and 278 hours before the attack would succeed (again depending on relay RAM capacity, as well as the bandwidth resources used to launch the attack).
Due to our devastating findings, we also designed three defenses that mitigate our attacks, one of which provably renders the attack ineffective. Defenses have been implemented and deployed into the Tor software to ensure that the Tor network is no longer vulnerable as of Tor version 0.2.4.18-rc and later. Some of that work can be found in Trac tickets #9063, #9072, #9093, and #10169.
In the remainder of this post I will detail the attacks and defenses we analyzed, noting again that this information is presented more completely (and more elegantly) in our paper.
The Tor Network Infrastructure
The Tor network is a distributed system made up of thousands of computers running the Tor software that contribute their bandwidth, memory, and computational resources for the greater good. These machines are called Tor relays, because their main task is to forward or relay network traffic to another entity after performing some cryptographic operations. When a Tor user wants to download some data using Tor, the user's Tor client software will choose three relays from those available (an entry, middle, and exit), form a path or circuit between these relays, and then instruct the third relay (the exit) to fetch the data and send it back through the circuit. The data will get transferred from its source to the exit, from the exit to the middle, and from the middle to the entry before finally making its way to the client.
The client may request the exit to fetch large amounts of data, and so Tor uses a window-based flow control scheme in order to limit the amount of data each relay needs to buffer in memory at once. When a circuit is created, the exit will initialize its circuit package counter to 1000 cells, indicating that it is willing to send 1000 cells into the circuit. The exit decrements the package counter by one for every data cell it sends into the circuit (to the middle relay), and stops sending data when the package counter reaches 0. The client at the other end of the circuit keeps a delivery counter, and initializes it to 0 upon circuit creation. The client increments the delivery counter by 1 for every data cell it receives on that circuit. When the client's delivery counter reaches 100, it sends a special Tor control cell, called a SENDME cell, to the exit to signal that it received 100 cells. Upon receiving the SENDME, the exit adds 100 to its package counter and continues sending data into the circuit.
This flow control scheme limits the amount of outstanding data that may be in flight at any time (between the exit and the client) to 1000 cells, or about 500 KiB, per circuit. The same mechanism is used when data is flowing in the opposite direction (up from the client, through the entry and middle, and to the exit).
The Sniper Attack
The new Denial of Service (DoS) attack, which we call "The Sniper Attack", exploits the flow control algorithm to remotely crash a victim Tor relay by depleting its memory resources. The paper presents three attacks that rely on the following two techniques:
- the attacker stops reading from the TCP connection containing the attack circuit, which causes the TCP window on the victim's outgoing connection to close and the victim to buffer up to 1000 cells; and
- the attacker causes cells to be continuously sent to the victim (exceeding the 1000 cell limit and consuming the victim's memory resources) either by ignoring the package window at packaging end of the circuit, or by continuously sending SENDMEs from the delivery end to the packaging end even though no cells have been read by the delivery end.
Basic Version 1 (attacking an entry relay)
In basic version 1, the adversary controls the client and the exit relay, and chooses a victim for the entry relay position. The adversary builds a circuit through the victim to her own exit, and then the exit continuously generates and sends arbitrary data through the circuit toward the client while ignoring the package window limit. The client stops reading from the TCP connection to the entry relay, and the entry relay buffers all data being sent by the exit relay until it is killed by its OS out-of-memory killer.
Basic Version 2 (attacking an exit relay)
In basic version 2, the adversary controls the client and an Internet destination server (e.g. website), and chooses a victim for the exit relay position. The adversary builds a circuit through the victim exit relay, and then the client continuously generates and sends arbitrary data through the circuit toward the exit relay while ignoring the package window limit. The destination server stops reading from the TCP connection to the exit relay, and the exit relay buffers all data being sent by the client until it is killed by its OS out-of-memory killer.
Both of the basic versions of the attack above require the adversary to generate and send data, consuming roughly the same amount of upstream bandwidth as the victim's available memory. The efficient version reduces this cost by one to two orders of magnitude.
In the efficient version, the adversary controls only a client. She creates a circuit, choosing the victim for the entry position, and then instructs the exit relay to download a large file from some external Internet server. The client stops reading on the TCP connection to the entry relay, causing it to buffer 1000 cells.
At this point, the adversary may "trick" the exit relay into sending more cells by sending it a SENDME cell, even though the client has not actually received any cells from the entry. As long as this SENDME does not increase the exit relay's package counter to greater than 1000 cells, the exit relay will continue to package data from the server and send it into the circuit toward the victim. If the SENDME does cause the exit relay's package window to exceed the 1000 cell limit, it will stop responding on that circuit. However, the entry and middle node will hold the circuit open until the client issues another command, meaning its resources will not be freed.
The bandwidth cost of the attack after circuit creation is simply the bandwidth cost of occasionally sending a SENDME to the exit. The memory consumption speed depends on the bandwidth and congestion of non-victim circuit relays. We describe how to parallelize the attack using multiple circuits and multiple paths with diverse relays in order to draw upon Tor's inherent resources. We found that with roughly 50 KiB/s of upstream bandwidth, an attacker could consume the victim's memory at roughly 1 MiB/s. This is highly dependent on the victim's bandwidth capabilities: relays that use token buckets to restrict bandwidth usage will of course bound the attack's consumption rate.
Rather than connecting directly to the victim, the adversary may instead launch the attack through a separate Tor circuit using a second client instance and the "Socks4Proxy" or "Socks5Proxy" option. In this case, she may benefit from the anonymity that Tor itself provides in order to evade detection. We found that there is not a significant increase in bandwidth usage when anonymizing the attack in this way.
A simple but naive defense against the Sniper Attack is to have the guard node watch its queue length, and if it ever fills to over 1000 cells, kill the circuit. This defense does not prevent the adversary from parallelizing the attack by using multiple circuits (and then consuming 1000 cells on each), which we have shown to be extremely effective.
Another defense, called "authenticated SENDMEs", tries to protect against receiving a SENDME from a node that didn't actually receive 100 cells. In this approach, a 1 byte nonce is placed in every 100th cell by the packaging end, and that nonce must be included by the delivery end in the SENDME (otherwise the packaging end rejects the SENDME as inauthentic). As above, this does not protect against the parallel attack. It also doesn't defend against either of the basic attacks where the adversary controls the packaging end and ignores the SENDMEs anyway.
The best defense, as we suggested to the Tor developers, is to implement a custom, adaptive out-of-memory circuit killer in application space (i.e. inside Tor). The circuit killer is only activated when memory becomes scarce, and then it chooses the circuit with the oldest front-most cell in its circuit queue. This will prevent the Sniper Attack by killing off all of the attack circuits.
With this new defense in place, the next game is for the adversary to try to cause Tor to kill an honest circuit. In order for an adversary to cause an honest circuit to get killed, it must ensure that the front-most cell on its malicious circuit queue is at least slightly "younger" than the oldest cell on any honest queue. We show that the Sniper Attack is impractical with this defense: due to fairness mechanisms in Tor, the adversary must spend an extraordinary amount of bandwidth keeping its cells young — bandwidth that would likely be better served in a more traditional brute-force DoS attack.
Tor has implemented a version of the out-of-memory killer for circuits, and is currently working on expanding this to channel and connection buffers as well.
Hidden Service Attack and Countermeasures
The paper also shows how the Sniper Attack can be used to deanonymize hidden services:
- run a malicious entry guard relay;
- run the attack from Oakland 2013 to learn the current guard relay of the target hidden service;
- run the Sniper Attack on the guard from step 2, knocking it offline and causing the hidden service to choose a new guard;
- repeat, until the hidden service chooses the relay from step 1 as its new entry guard.
The technique to verify that the hidden service is using a malicious guard in step 4 is the same technique used in step 2.
In the paper, we compute the expected time to succeed in this attack while running malicious relays of various capacities. It takes longer to succeed against relays that have more RAM, since it relies on the Sniper Attack to consume enough RAM to kill the relay (which itself depends on the bandwidth capacity of the victim relay). For the malicious relay bandwidth capacities and honest relay RAM amounts used in their estimate, we found that deanonymization would involve between 18 and 132 Sniper Attacks and take between ~4 and ~278 hours.
This attack becomes much more difficult if the relay is rebooted soon after it crashes, and the attack is ineffective when Tor relays are properly defending against the Sniper Attack (see the "Defenses" section above).
Strategies to defend hidden services in particular go beyond those suggested here to include entry guard rate-limiting, where you stop building circuits if you notice that your new guards keep going down (failing closed), and middle guards, guard nodes for your guard nodes. Both of these strategies attempt to make it harder to coerce the hidden service into building new circuits or exposing itself to new relays, since that is precisely what is needed for deanonymization.
The main defense implemented in Tor will start killing circuits when memory gets low. Currently, Tor uses a configuration option (MaxMemInCellQueues) that allows a relay operator to configure when the circuit-killer should be activated. There is likely not one single value that makes sense here: if it is too high, then relays with lower memory will not be protected; if it is too low, then there may be more false positives resulting in honest circuits being killed. Can Tor determine this setting in an OS-independent way that allows relays to automatically find the right value for MaxMemInCellQueues?
The defenses against the Sniper Attack prevent the adversary from crashing the victim relay, but the adversary may still consume a relay's bandwidth (and memory resources, to a critical level) at relatively low cost. This means that even though the Sniper Attack can no longer kill a relay, it can still consume a large amount of its bandwidth at a relatively low cost (similar to more traditional bandwidth amplification attacks). More analysis of general bandwidth consumption attacks and defenses remains a useful research problem.
Finally, hidden services also need some love. More work is needed to redesign them in a way that does not allow a client to cause the hidden service to choose new relays on demand.
Together with Stefan, I recently published the paper "Spoiled Onions: Exposing Malicious Tor Exit Relays". The paper only discusses our results and how we obtained them and we don't talk a lot about the implications for Tor users. This blog post should fill that gap.
First, it's important to understand that 25 relays in four months isn't a lot. It is ultimately a very small fraction of the Tor network. Also, it doesn't mean that 25 out of 1,000 relays are malicious or misconfigured (we weren't very clear on that in the paper). We have yet to calculate the churn rate of exit relays which is the rate at which relays join and leave the network. 1,000 is really just the approximate number of exit relays at any given point in time. So the actual number of exit relays we ended up testing in four months is certainly higher than that. As a user, that means that you will not see many malicious relays "in the wild".
Second, Tor clients select relays in their circuits based on the bandwidth they are contributing to the network. Faster relays see more traffic than slower relays which balances the load in the Tor network. Many of the malicious exit relays contributed relatively little bandwidth to the Tor network which makes them quite unlikely to be chosen as relay in a circuit.
Third, even if your traffic is going through a malicious exit relay, it doesn't mean that everything is lost. Many of the attacks we discovered still caused Firefox' infamous "about:certerror" warning page. As a vigilant user, you would notice that something isn't quite right and hopefully leave the site. In addition, TorBrowser ships with HTTPS-Everywhere which by default attempts to connect to some sites over HTTPS even though you just typed "http://". After all, as we said in the past, "Plaintext over Tor is still plaintext".
Finally, we want to point out that all of these attacks are of course not limited to the Tor network. You face the very same risks when you are connecting to any public WiFi network. One of the fundamental problems is the broken CA system. Do you actually know all the ~50 organisation who you implicitly trust when you start your Firefox, Chrome, or TorBrowser? Making the CA system more secure is a very challenging task for the entire Internet and not just the Tor network.
Website traffic fingerprinting is an attack where the adversary attempts to recognize the encrypted traffic patterns of specific web pages without using any other information. In the case of Tor, this attack would take place between the user and the Guard node, or at the Guard node itself.
There are two models under which these attacks are typically studied: The "closed world" scenario, and the "open world" scenario. In the "closed world" scenario, the only traffic patterns the classifier ever sees are for web pages that it has already been trained on, and it typically must successfully label all of them. This is meant to simulate situations where users only use Tor for viewing a small set of censored web pages and nothing else. The "open world" scenario is slightly more realistic in that it attempts to examine the ability of the adversary to recognize a few censored pages out of a much larger set of uncensored pages, some of which it has never seen before.
It is important to note that in both models, these papers are reporting results on their ability to classify individual pages and not overall websites, despite often using the terms website and webpage interchangeably.
The most comprehensive study of the statistical properties of this attack against Tor was done by Panchenko et al. In their "closed world" study, they evaluated their success rates of classifying 775 web pages. In their "open world" study, they classified a handful of censored pages in 5000 page subsets of 1,000,000 total web pages.
Since then, a series of smaller scale follow-on attack papers have claimed improved success rates since Panchenko's study, and in at least two instances these papers even claimed to completely invalidate any attempt at defense based only on partial results.
While there may have been some improvements in classifier accuracy in these papers over Panchenko, defenses were "broken" and dismissed primarily by taking a number of shortcuts that ignore basic properties of machine learning theory and statistics in order to enable their claims of success.
Despite these subsequent improvements in these attacks, we are still skeptical of the efficacy of this attack in a real world scenario, and believe that only minimal defenses will be needed to ensure the attack continues to be unusable in practice.
This post will be divided into the following areas: First, we will attempt to distill the adversary model used by this body of work, and describe the adversary's goals. Then, we will review the basic properties of machine learning theory and statistics that govern classifier accuracy and real-world performance. Next, we will make these properties concrete by briefly discussing additional real-world sources of hidden complexity in the website traffic fingerprinting problem domain. We will then specifically enumerate both the useful contributions and the shortcuts taken by various work to date. Finally, we will conclude with suggestions and areas for improvement in future work.
Pinning Down the Adversary Model
Website traffic fingerprinting attack papers assume that an adversary is for some reason either unmotivated or unable to block Tor outright, but is instead very interested in detecting patterns of specific activity inside Tor-like traffic flows.
The exact motivation for this effort on behalf of the adversary is typically not specified, but there seem to be three possibilities, in order of increasing difficulty for the adversary:
- The adversary is interested in blocking specific censored webpage traffic patterns, while still leaving the rest of the Tor-like traffic unmolested (perhaps because Tor's packet obfuscation layer looks like something legitimate that the adversary wants to avoid blocking).
- The adversary is interested in identifying all of the users that visit a small, specific set of targeted pages.
- The adversary is interested in recognizing every single web page a user visits.
Unfortunately, it seems that machine learning complexity theory and basic statistics are heavily stacked against all three of these goals.
Theoretical Issues: Factors Affecting Classification Accuracy
Machine Learning theory tells us that the accuracy of a classifier is affected by four main factors: the size of the hypothesis space (ie the information-theoretic complexity of the classification categories), the accuracy of feature extraction and representation (the bias in the hypothesis space), the size of the instance space (the world size), and the number of training examples provided to the classifier.
Bounds have actually been established on the effect of each of these areas to the likelihood of achieving a given accuracy rate, and any good undergraduate course in Machine Learning will cover these topics in detail. For a concise review of these accuracy bounds, we recommend Chapters 5 and 7 of Machine Learning by Tom Mitchell.
The brief summary is this: as the number and/or complexity of classification categories increases while reliable feature information does not, the classifier eventually runs out of descriptive feature information, and either true positive accuracy goes down or the false positive rate goes up. This error is called the bias in the hypothesis space. In fact, even for unbiased hypothesis spaces, the number of training examples required to achieve a reasonable error bound is a function of the number and complexity of the classification categories.
It turns out that the effects of all of these factors are actually observable in the papers that managed to study a sufficient world size.
First, for the same world size and classifier technique, every work that examined both the open and closed worlds reports much higher accuracy rates for the open world than for the closed world. This is due to the higher hypothesis space complexity involved in labeling every page in the closed world, as opposed to labeling only a very small subset of censored targets in the open world. This confirms the above machine learning theory, and tells us that that an adversary that is attempting to recognize every single web page in existence is going to have a very difficult time getting any accuracy, as compared to one who is classifying only a select interesting subset.
It is also clearly visible in Panchenko's large-scale open world study that increasing the world size contributes to a slower, but still steady decline in open world accuracy in Figure 4 below. The effect of the hypothesis space complexity (increased number of pages to classify) on the open world can also be seen in the rising false positive rates of Figure 5 below (this rise may seem small, but in the next section we will show how even tiny increases in false positives are in fact devastating to the attack).
Every other attack paper since then has neglected to use world sizes large enough to observe these effects in sufficient detail, but again, machine learning theory tells us they are still there, just beyond the published data points.
Practical Issues: False Positives Matter. A Lot.
Beyond classifier accuracy and the factors that affect it, the practical applicability of the website traffic fingerprinting attack is also affected by some basic statistical results. These statistics (which have been examined in other computer security-related application domains of machine learning) show that false positives end up destroying the effectiveness of classification unless they are vanishingly small (much less than 10^-6).
It actually turns out that false positives are especially damaging to this attack, perhaps even more so than most other application domains. In the website traffic fingerprinting attack, false positive results are a built-in property of the world: if one or more pages' traffic patterns are similar enough to a target page's pattern to trigger a false positive, these sets of pages will always be misclassified as the target page every time that traffic pattern is present. Unlike end-to-end timing correlation, the adversary does not get to benefit from information derived from repeated visits (except in narrow, contrived scenarios that we will address in our literature review below).
What's more is that this also means that small world sizes directly impact the false positive rate. As you increase the world size, the likelihood of including a page that matches a traffic stream of your target page also increases.
To demonstrate the damaging effects of false positives on the attack, we will now consider the effects of false positives on the adversary's two remaining feasible goals: flagging users who visit a specific controversial web page over Tor, and censoring only a subset of Tor-like traffic.
In the event where the adversary is attempting to recognize visits to highly sensitive material (such as a specific web page about a particular protest action) and is interested in gathering a list of suspects who visit the web page, it is easy to see that even with very high accuracy rates, the suspect list quickly grows without bound.
To see this, note that the probability of at least one false positive over N independent page loads is given by 1 - (1-fp)^N, where fp is the probability of a false positive.
Even with a false positive rate as low as 0.2% (which is typical for this literature, and again is also lower than reality due to small world sizes), after performing just N=100 different page loads, 18% of the userbase will be falsely accused of visiting targeted material at least once. After each user has performed N=1000 different page loads, 86.5% of that user base will have been falsely accused of visiting the target page at least once. In fact, many of these users will be falsely accused much more frequently than that (as per the Binomial Distribution).
If instead of trying to enumerate specific visitors, the adversary is trying to interfere with the traffic patterns of certain web pages, the accuracy value that matters is the Bayesian Detection Rate, which is the probability that a traffic pattern was actually due to a censored/target page visit given that the detector said it was recognized. This is written as P(Censored|Classified).
Using Bayes Theorem, it is possible to convert from the true and false positive rates of P(Classified|Censored) and P(Classified|~Censored) to the Bayesian Detection Rate of P(Censored|Classified) like so:
P(Censored|Classified) = P(Classified|Censored)*P(Censored) / (P(Censored)*P(Classified|Censored) + P(~Censored)*P(Classified|~Censored))
Under conditions of low censorship (0.1% -- such as when the Tor traffic successfully blends in with a large volume of innocuous Internet traffic, or when Tor is used for both censorship circumvention and general privacy), with a true positive rate of 0.6 and a false positive rate of 0.005, we have:
P(Censored|Classified) = 0.6*.001/(.001*0.6+.999*.005) P(Censored|Classified) = 0.10 P(~Censored|Classified) = 1 - P(Censored|Classified) P(~Censored|Classified) = 0.90
This means that when a traffic pattern is classified as a censored/target page, there is a 90% chance that the classifier is actually telling the adversary to interfere with an unrelated traffic stream, and only a 10% chance that the classifier was actually correct.
This phenomenon was explored in detail in the Intrusion Detection System literature, and is the reason why anomaly and classification-based antivirus and IDS systems have failed to materialize in the marketplace (despite early success in academic literature).
Practical Issues: Multipliers of World Size are Common
Beyond the above issues, there are a number of additional sources of world size and complexity in the website traffic fingerprinting problem domain that are completely unaddressed by the literature.
Again, it is important to remember that despite the use of the word "website" in their titles, these attacks operate on classifying instances of traffic patterns created by single pages, and not entire sites. For some sites, there may be little difference between one page and another. For many sites, the difference between component pages is significant.
It is also important to note that the total number of pages on the web is actually quite larger than the number of items indexed by Google (which is quite a bit larger than even the 1,000,000 page crawl used by Panchenko). In particular, the effects of dynamically generated pages, unindexed pages, rapidly updated pages, interactive pages, and authenticated webapps have been neglected in these studies, and their various possible traffic patterns contribute a very large number of additional web traffic patterns to the world, especially when the different manners in which users interact with them are taken into account.
In addition, similarities between non-web and web traffic also complicate the problem domain. For just one example, it is likely that an open tab with a Twitter query in it generates similar traffic patterns to a Tor-enabled XMPP or IRC client.
All of these factors increase the complexity of the hypothesis space and the instance space, which as we demonstrated above, will necessarily reduce the accuracy of the attack in both theory and practice.
Literature Review: Andriy Panchenko et al
As mentioned previously, Panchenko's work (actually the first work to successfully apply website traffic fingerprinting to Tor) is still the most comprehensive study to date.
Here is a brief list of its strengths:
- The world sizes are huge.
5000 page subsets of 1,000,000 pages may still have representational issues compared to the real world, but this is still the largest study to date.
- Careful feature extraction (hypothesis space construction).
The reason why this work succeeded where earlier works failed is because rather than simply toss raw data into a machine learning algorithm, they were extremely careful with the representation of data for their classifiers. This likely enabled both the efficiency that allowed such a large world size, and reduced the bias in the hypothesis space that would otherwise lead to low accuracy and high false positives at such large world sizes.
- In the "open world", they varied the type of censored target sites to evaluate accuracy.
Instead of merely picking the target pages that were easiest to classify (such as video sites), they varied the type of target pages and explored the effects on both true and false positive accuracy. None of the other literature to date has examined this effect in detail.
- They are careful to tune their classifiers to minimize false positives.
Modern classifiers typically allow you to trade off between false positives and false negatives. Given how deeply false positives impact the adversary's goals, this is very important.
- They demonstrate knowledge of all of the factors involved in PAC theory and practical classifier accuracy.
All of the other papers to date simply omit one or more of the following: data on feature extraction, the contribution of individual features to accuracy, the number of training examples, the effect of target size on both the open and closed world, the effect of target page types on accuracy, and the effects of world size on accuracy.
While impressive, it does miss a few details:
- It fails to acknowledge that false positives are a property of specific sites.
In reality, nearly every false positive they experienced in a given subset of 5000 pages is a false positive that will likely appear in a classifier that is run against the entire 1,000,000 page dataset. They probably should have tallied the total false positives over multiple runs for this reason, and analyzed their properties.
- It still fails to provide a realistic adversary model in order to justify claims that the attack actually accomplishes what the adversary wants.
Literature Review: Kevin P. Dyer et al
This work attempts to evaluate several defenses against the website traffic fingerprinting attack, and also attempts to evaluate an "idealized best case" defense called BuFLO.
On the positive side, the work makes the following contributions:
- It creates an ideal high-overhead tunable defense (BuFLO) that can be compared against other practical defenses.
- It is quite clear about its definitions of accuracy and experimental setup.
- Source code is provided!
However, the work had the following rather serious shortcomings:
- Close world only, and a very small closed world at that.
This is the single largest issue with this attack paper. You cannot claim that all defenses are broken forever if a classifier is only able to somewhat correctly classify 500 pages or less (and only 128 pages in their defense studies!), even in a closed world.
- Exaggerated claims give no credit to (or detailed analysis of) relative strengths of defenses.
Despite the fact that their exaggerated claims are made possible due to their small world size of only 128 pages, the authors do not acknowledge that some light weight defenses did in fact do far better than others in their experiments. In light of the fact that their world size was woefully small, the authors work would have been substantially more humble in terms of acknowledging the effectiveness of some defenses, and should have spent more time focusing on why some defenses did better against some classifiers, what the overhead costs were, and what classifier features were most impacted by each defense.
- It does not give statistics on which sites suffer high rates of misclassification.
Even in their small world size, is it likely that they stumbled upon a few pages that ended up consistently misclassified. Which were these? How often does that happen out of arbitrary 128 page subsets?
- Features are evaluated, but not in terms of their contribution to accuracy.
There are several standard ways of evaluating the contribution of features to accuracy under various conditions. The authors appeared to employ none of them.
Literature Review: Xiang Cai et al
This work also attempts to evaluate several additional defenses against the website traffic fingerprinting attack, proposes a Hidden Markov Model extension of the attack to classify web sites in addition to pages, and also attempts to evaluate the BuFLO defense from the "Peek-a-boo" paper, as well as a lower-overhead variant.
On the positive side, the following useful contributions were made:
- A low-resource "congestion sensitive" version of Dyer et al's BuFLO defense is presented.
This defense is also tunable, and deserves a more fair evaluation than was given to it by its own creators. Hopefully they will follow up on it.
- An automatic feature extraction mechanism is built in to the classifier.
The work uses an edit distance algorithm to automatically extract features (pairwise transposition, insertion, deletions and substitutions) rather than manual specification and extraction.
However, this paper also shares many of the Peek-a-boo paper's shortcomings above, and has some of its own.
- Minimal edit distance classifiers are ideally suited for a closed world.
The DLSVM classifier finds the shortest edit distance between a single testing instance and the model instances from training. This is a useful heuristic for a closed world, where you know that your training instance will match *something* from your model, but how well does it fare when you have to tune a cutoff to decide when a given traffic stream's edit distance is too large to be among your censored target pages? It seems likely that it is still better than Panchenko's classifier in the open world, but this has not been proven conclusively, especially since the effect of site uniqueness on the classifier was not examined.
- Glosses over effects of defenses on edit distance components.
Even though edit distance classifiers do not have explicit features, they still have implicit ones. It would be useful to study the effect of low-resource defenses on the most information-heavy components of the edit distances. Or better: to analyze how to tailor their BuFLO variant to target these components in an optimal way.
- Defenses were not given a fair analysis.
In the specific case of both Tor Browser's pipeline defense and HTTPOS, an analysis of the actual prevalence of pipelining and server side support for the HTTPOS feature set was not performed. Nor were statistics on actual request combination and reordering given, nor were the effects on the feature components analyzed.
- Navigation-based Hidden Markov Models are user-specific, and will also suffer from Garbage-In Garbage-Out false positive properties.
While a Hidden Markov Model would appear to provide increased accuracy by being able to utilize multiple observations, the reality of web usage is that users' navigation patterns are not uniform and will vary by individual and social group, especially for news/portal sites and for social networks. For example, people often navigate off of Twitter to view outside links, and some people share more links than others. It is quite likely that the HMM will consider any link portal (or any repetitive background network activity) that has an occasional false positive match for Twitter to be mistaken for such intermittent Twitter usage.
Literature Review: Tao Wang and Ian Goldberg
This paper improves upon the work by Cai et al by statistically removing Tor flow control cells, improving the training performance of the classifier, and by performing a small-scale open world study of edit distance based classifiers. It also provides some limited analysis of difficult to classify sites.
The paper has the following issues:
- A broken implementation of Tor Browser's defense was studied, despite warnings to the contrary.
Last year, we discovered serious issues with the HTTP Pipeline randomization defense that were introduced during the transition from Firefox 4 to Firefox 10, and that other issues may have been present in the Firefox 4 version as well. These issues were corrected during the transition to Firefox 17, and the pipeline randomization defense was vastly improved, but the authors still chose to evaluate the broken version, despite our offers for assistance. Moreover, like previous work, an analysis of the actual prevalence of server-side pipelining support and request combination and reordering was not performed.
- Small world size.
This paper studies a smaller closed world size than the previous edit distance based work (100 pages instead of 1000), and uses only 1000 pages for its open world study.
- Target site types were not varied in the open world.
Unlike Panchenko's work, the types of sites chosen as censored targets were fixed, not varied. This makes it hard to evaluate the effects of types of sites with either very distinct or more typical traffic patterns on the accuracy of their classifier.
Source code is provided, however, so it may be possible to revisit and correct some of these issues, at least.
Concluding Remarks: Suggestions for Future Work
This post is not meant to dismiss the website traffic fingerprinting attack entirely. It is merely meant to point out that defense work has not been as conclusively studied as these papers have claimed, and that defenses are actually easier than is presently assumed by the current body of literature.
We believe that the theoretical and practical issues enumerated in this post demonstrate that defenses do not need to be terribly heavy-weight to be effective. It is likely that in practice, relatively simple defenses will still substantially increase the false positive rate of this attack when it is performed against large numbers of web pages (and non-web background traffic), against large volumes of Tor-like traffic, against large numbers of users, or any combination thereof.
We therefore encourage a re-evaluation of existing defenses such as HTTPOS, SPDY and pipeline randomization, and Guard node adaptive padding, Traffic Morphing, as well as the development of additional defenses.
It is possible that some types of pages (especially those on video sites, file locker sites, and sites with very large content elements) may make natural false posties rare, especially when classified among smaller, more typical pages. For instance, it is unlikely to be possible to generate false positives when the classifier needs only to distinguish between pages from Wikipedia versus Youtube, but it is likely that generic large content and video downloads can be obfuscated such that it is hard to recognize the specific video or download site with minimal relative overhead.
As in the work done by Panchenko, we also suggest evaluating each of your classifier's explicit or implicit features for their relative contribution to overall accuracy (especially among similar classes of sites) as this will help guide the padding and obfuscation behaviors of any defenses you or others might devise. The high-information component features or edit distances can be analyzed and used as input to a statistically adaptive padding defense, for example.
In any event, please do contact us if you're interested in studying these or other defenses in Tor Browser or Tor itself.
If you are not interested in helping improve the defenses of Tor, do not have time to perform such evaluations in a thorough manner, or have already previously published a website traffic fingerprinting attack paper yourself, we request that you publish or at least provide us with the source code and the data sets involved in your attack, so that we or others can reproduce your results and evaluate potential defenses as well.
Concluding Remarks: Dual Purpose Defenses
It turns out that some defenses against the website traffic fingerprinting attack are also useful against the end-to-end correlation attack. In particular, defenses that increase the rate of false positives of website traffic fingerprinting using padding and other one-ended schemes is very likely to increase the rate of false positives for end-to-end correlation, especially under situations where either limited record-keeping capacity, sample-based analysis, or link-level encryption reduce connection and inter-packet timing information.
If a defense introduces false positive between many different web pages in website traffic fingerprinting by manipulating the traffic patterns at the entrance of the Tor network but not the exit, it will be very likely to introduce false positives during correlation between the entrance and exit traffic of simultaneous downloads of these same pages. As in website traffic fingerprinting, it turns out that even a small amount of false positives will frustrate a dragnet adversary attempting end-to-end correlation.
End-to-end correlation is a much more difficult problem, and we may not ever solve the problem of repeated observations. However, we likely can increase the number and duration of successful observations that correlation attacks will require to build high confidence, especially if the user base grows very large, and if the web moves to higher adoption rates of TLS (so that HTTP identifiers are not available to provide long-term linkability at exit nodes).