Tor security advisory: "relay early" traffic confirmation attack

This advisory was posted on the tor-announce mailing list.

SUMMARY:

On July 4 2014 we found a group of relays that we assume were trying to deanonymize users. They appear to have been targeting people who operate or access Tor hidden services. The attack involved modifying Tor protocol headers to do traffic confirmation attacks.

The attacking relays joined the network on January 30 2014, and we removed them from the network on July 4. While we don't know when they started doing the attack, users who operated or accessed hidden services from early February through July 4 should assume they were affected.

Unfortunately, it's still unclear what "affected" includes. We know the attack looked for users who fetched hidden service descriptors, but the attackers likely were not able to see any application-level traffic (e.g. what pages were loaded or even whether users visited the hidden service they looked up). The attack probably also tried to learn who published hidden service descriptors, which would allow the attackers to learn the location of that hidden service. In theory the attack could also be used to link users to their destinations on normal Tor circuits, but we found no evidence that the attackers operated any exit relays, making this attack less likely. And finally, we don't know how much data the attackers kept, and due to the way the attack was deployed (more details below), their protocol header modifications might have aided other attackers in deanonymizing users too.

Relays should upgrade to a recent Tor release (0.2.4.23 or 0.2.5.6-alpha), to close the particular protocol vulnerability the attackers used — but remember that preventing traffic confirmation in general remains an open research problem. Clients that upgrade (once new Tor Browser releases are ready) will take another step towards limiting the number of entry guards that are in a position to see their traffic, thus reducing the damage from future attacks like this one. Hidden service operators should consider changing the location of their hidden service.

THE TECHNICAL DETAILS:

We believe they used a combination of two classes of attacks: a traffic confirmation attack and a Sybil attack.

A traffic confirmation attack is possible when the attacker controls or observes the relays on both ends of a Tor circuit and then compares traffic timing, volume, or other characteristics to conclude that the two relays are indeed on the same circuit. If the first relay in the circuit (called the "entry guard") knows the IP address of the user, and the last relay in the circuit knows the resource or destination she is accessing, then together they can deanonymize her. You can read more about traffic confirmation attacks, including pointers to many research papers, at this blog post from 2009:
https://blog.torproject.org/blog/one-cell-enough

The particular confirmation attack they used was an active attack where the relay on one end injects a signal into the Tor protocol headers, and then the relay on the other end reads the signal. These attacking relays were stable enough to get the HSDir ("suitable for hidden service directory") and Guard ("suitable for being an entry guard") consensus flags. Then they injected the signal whenever they were used as a hidden service directory, and looked for an injected signal whenever they were used as an entry guard.

The way they injected the signal was by sending sequences of "relay" vs "relay early" commands down the circuit, to encode the message they wanted to send. For background, Tor has two types of cells: link cells, which are intended for the adjacent relay in the circuit, and relay cells, which are passed to the other end of the circuit. In 2008 we added a new kind of relay cell, called a "relay early" cell, which is used to prevent people from building very long paths in the Tor network. (Very long paths can be used to induce congestion and aid in breaking anonymity.) But the fix for infinite-length paths introduced a problem with accessing hidden services, and one of the side effects of our fix for bug 1038 was that while we limit the number of outbound (away from the client) "relay early" cells on a circuit, we don't limit the number of inbound (towards the client) relay early cells.

So in summary, when Tor clients contacted an attacking relay in its role as a Hidden Service Directory to publish or retrieve a hidden service descriptor (steps 2 and 3 on the hidden service protocol diagrams), that relay would send the hidden service name (encoded as a pattern of relay and relay-early cells) back down the circuit. Other attacking relays, when they get chosen for the first hop of a circuit, would look for inbound relay-early cells (since nobody else sends them) and would thus learn which clients requested information about a hidden service.
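To make the signalling idea concrete, here is a toy sketch of how a name could be carried purely in the choice of cell command. The advisory does not describe the attackers' actual encoding, so the one-bit-per-cell scheme, the function names, and the placeholder address below are all assumptions for illustration; nothing here touches the real Tor protocol.

```python
# Toy model only: the real encoding used by the attackers is not public in
# this advisory. We ASSUME one bit per cell: relay_early = 1, relay = 0.
RELAY, RELAY_EARLY = "relay", "relay_early"


def encode(name):
    """HSDir side: turn each bit of the name into one cell command."""
    bits = "".join(f"{byte:08b}" for byte in name.encode("ascii"))
    return [RELAY_EARLY if bit == "1" else RELAY for bit in bits]


def decode(commands):
    """Guard side: read the name back from the sequence of cell commands."""
    bits = "".join("1" if cmd == RELAY_EARLY else "0" for cmd in commands)
    data = bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))
    return data.decode("ascii")


def has_inbound_relay_early(commands):
    """Honest relays never send relay_early towards the client, so a single
    inbound relay_early cell is already a sign of tampering."""
    return RELAY_EARLY in commands


signal = encode("exampleonionaddr")  # hypothetical 16-character name
print(len(signal), decode(signal), has_inbound_relay_early(signal))
```

The point of the sketch is that the cell command sits in the header, so every relay on the path can read it, whereas the payload stays encrypted until it reaches the client.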

There are three important points about this attack:

A) The attacker encoded the name of the hidden service in the injected signal (as opposed to, say, sending a random number and keeping a local list mapping random number to hidden service name). The encoded signal is encrypted as it is sent over the TLS channel between relays. However, this signal would be easy to read and interpret by anybody who runs a relay and receives the encoded traffic. And we might also worry about a global adversary (e.g. a large intelligence agency) that records Internet traffic at the entry guards and then tries to break Tor's link encryption. The way this attack was performed weakens Tor's anonymity against these other potential attackers too — either while it was happening or after the fact if they have traffic logs. So if the attack was a research project (i.e. not intentionally malicious), it was deployed in an irresponsible way because it puts users at risk indefinitely into the future.

(This concern is in addition to the general issue that it's probably unwise from a legal perspective for researchers to attack real users by modifying their traffic on one end and wiretapping it on the other. Tools like Shadow are great for testing Tor research ideas out in the lab.)

B) This protocol header signal injection attack is actually pretty neat from a research perspective, in that it's a bit different from previous tagging attacks which targeted the application-level payload. Previous tagging attacks modified the payload at the entry guard, and then looked for a modified payload at the exit relay (which can see the decrypted payload). Those attacks don't work in the other direction (from the exit relay back towards the client), because the payload is still encrypted at the entry guard. But because this new approach modifies ("tags") the cell headers rather than the payload, every relay in the path can see the tag.

C) We should remind readers that while this particular variant of the traffic confirmation attack allows high-confidence and efficient correlation, the general class of passive (statistical) traffic confirmation attacks remains unsolved and would likely have worked just fine here. So the good news is traffic confirmation attacks aren't new or surprising, but the bad news is that they still work. See https://blog.torproject.org/blog/one-cell-enough for more discussion.

Then the second class of attack they used, in conjunction with their traffic confirmation attack, was a standard Sybil attack — they signed up around 115 fast non-exit relays, all running on 50.7.0.0/16 or 204.45.0.0/16. Together these relays summed to about 6.4% of the Guard capacity in the network. Then, in part because of our current guard rotation parameters, these relays became entry guards for a significant chunk of users over their five months of operation.

We actually noticed these relays when they joined the network, since the DocTor scanner reported them. We considered the set of new relays at the time, and made a decision that it wasn't that large a fraction of the network. It's clear there's room for improvement in terms of how to let the Tor network grow while also ensuring we maintain social connections with the operators of all large groups of relays. (In general having a widely diverse set of relay locations and relay operators, yet not allowing any bad relays in, seems like a hard problem; on the other hand our detection scripts did notice them in this case, so there's hope for a better solution here.)

In response, we've taken the following short-term steps:

1) Removed the attacking relays from the network.

2) Put out a software update for relays to prevent "relay early" cells from being used this way.

3) Put out a software update that will (once enough clients have upgraded) let us tell clients to move to using one entry guard rather than three, to reduce exposure to relays over time.

4) Clients can tell whether they've received a relay or relay-early cell. For expert users, the new Tor version warns you in your logs if a relay on your path injects any relay-early cells: look for the phrase "Received an inbound RELAY_EARLY cell".
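As a rough illustration of why step 3 matters, the following back-of-the-envelope sketch estimates the chance that a client ends up with at least one attacker-controlled guard. It assumes each guard is an independent pick weighted by capacity, which simplifies the real guard selection, and it uses the 6.4% Guard capacity figure from this advisory; the function name is ours.

```python
def p_bad_guard(fraction, guards):
    """Chance that at least one of `guards` independent picks lands on an
    attacker relay, when attackers hold `fraction` of Guard capacity.
    ASSUMPTION: picks are independent and weighted purely by capacity."""
    return 1.0 - (1.0 - fraction) ** guards


ATTACKER_FRACTION = 0.064  # Guard capacity share reported in this advisory

for n in (3, 1):
    print(f"{n} guard(s): {p_bad_guard(ATTACKER_FRACTION, n):.1%}")
# prints:
# 3 guard(s): 18.0%
# 1 guard(s): 6.4%
```

Under this simple model, moving from three guards to one cuts the exposure per selection from roughly 18% to 6.4%. Every guard rotation is another draw, which is why longer guard lifetimes also help.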

The following longer-term research areas remain:

5) Further growing the Tor network and diversity of relay operators, which will reduce the impact from an adversary of a given size.

6) Exploring better mechanisms, e.g. social connections, to limit the impact from a malicious set of relays. We've also formed a group to pay more attention to suspicious relays in the network:
https://blog.torproject.org/blog/how-report-bad-relays

7) Further reducing exposure to guards over time, perhaps by extending the guard rotation lifetime:
https://blog.torproject.org/blog/lifecycle-of-a-new-relay
https://blog.torproject.org/blog/improving-tors-anonymity-changing-guar…

8) Better understanding statistical traffic correlation attacks and whether padding or other approaches can mitigate them.

9) Improving the hidden service design, including making it harder for relays serving as hidden service directory points to learn what hidden service address they're handling:
https://blog.torproject.org/blog/hidden-services-need-some-love

OPEN QUESTIONS:

Q1) Was this the Black Hat 2014 talk that got canceled recently?
Q2) Did we find all the malicious relays?
Q3) Did the malicious relays inject the signal at any points besides the HSDir position?
Q4) What data did the attackers keep, and are they going to destroy it? How have they protected the data (if any) while storing it?

Great questions. We spent several months trying to extract information from the researchers who were going to give the Black Hat talk, and eventually we did get some hints from them about how "relay early" cells could be used for traffic confirmation attacks, which is how we started looking for the attacks in the wild. They haven't answered our emails lately, so we don't know for sure, but it seems likely that the answer to Q1 is "yes". In fact, we hope they *were* the ones doing the attacks, since otherwise it means somebody else was. We don't yet know the answers to Q2, Q3, or Q4.

Yeah? And who is "them"?

Research? That's just a guess. Tor guys don't know who and they don't know why.

"Together these relays summed to about 6.4% of the Guard capacity in the network. "

Does that sound like something Joe Blow could afford? Because the presentation that never was talked about doing this for 3k. But they didn't even have that. I don't think you run all these boxes for 6 months for that little anyway. I don't think 'researchers', i.e. guys in their basement, throw thousands of dollars at something so they can write a PDF.

Those network ranges coincide with fdcservers. Looking at the prices, yes they could get 116 servers on fast connections for 3k. Whois says they're out of Chicago, which sounds like something a researcher might use (centralized, not hiding their tracks, etc).

$30/mo * 115 VPSs * 5 months = $17k. Totally within the budget of some research group who decided that was a good use of their money.

The larger the Tor network gets (in terms of capacity), the more expensive it is to sign up a given fraction of it. Alas, bandwidth prices are very different depending on where the relay is, so getting good diversity is more expensive. I'm glad Hart Voor Internetvrijheid and other groups are working hard at the location diversity goal even though it's more expensive:
https://www.torservers.net/partners.html

See "fix 4" on
https://blog.torproject.org/blog/improving-tors-anonymity-changing-guar…
for more discussions on the topic of growing the total network capacity as a defense against these sorts of attacks.

One offer from this ISP is
VPS Special 3
1. 50Mbps unmetered
2. 1GB RAM
3. 150GB HDD
4. 5 IP Addresses
5. 2CPU Core
$31.90

Note that this is 5 IPs for about $30.
115 / 5 = 23
23 * $30 = $690 per month
5 months ~ $3450
That is close to the $3k figure from the canceled BH talk.
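The two cost estimates in this thread differ only in how many relays share one VPS. A small sketch of the arithmetic, using the roughly $30/month price, 115 relays, and 5 months quoted above (the function and constant names are ours, and real pricing may have differed):

```python
import math

RELAYS = 115          # approximate relay count from the advisory
MONTHS = 5            # roughly late January through early July
PRICE_PER_VPS = 30    # dollars per month, rounded from the quoted offer


def total_cost(relays_per_vps):
    """Total rental cost if each VPS hosts `relays_per_vps` relays."""
    vps_needed = math.ceil(RELAYS / relays_per_vps)
    return vps_needed * PRICE_PER_VPS * MONTHS


print(total_cost(1))  # one relay per VPS: 115 * 30 * 5 = 17250
print(total_cost(5))  # five relays (one per IP) per VPS: 23 * 30 * 5 = 3450
```

Either way the total is small for a funded research group. Whether five relays can really share one VPS's bandwidth and memory is a separate question.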

50 megabits divided by 5 isn't enough per relay to handle much capacity.

Also, 1 gigabyte of ram divided by 5 relays means you'll run out of memory right quick if you're trying to push a lot of bytes.

I assume that the $3k number was for a month, and they were planning to say something like "in the first month we ran 6% of the network and became the entry guard for 6% of the users, look it works".

In any case, we can speculate about how to make the numbers add up, or if they ever even did, but it's pretty much moot now -- and whether it's $3k or $27k doesn't really matter.

We know exactly who "them" is. Their names are Alexander Volynkin and Michael McCord (https://img.4plebs.org/boards/pol/image/1404/73/1404736805983.png) and they are researchers/students affiliated with Carnegie Mellon University (http://www.reuters.com/article/2014/07/21/cybercrime-conference-talk-id… and http://www.theregister.co.uk/2014/07/22/legal_wrecking_balls_break_budg…).

Now, could one or more of our US-based colleagues kindly FOIA the hell out of CMU on behalf of the community, please? Thanks!

Not viable under US law. CMU, being non-governmental, is not subject to FOIA. And any request directed to a government agency would be stiff-armed on a secrecy basis.

This is not "Joe Blow". CERT is one of the most well-funded computer security research organizations in the country. 30K, let alone 3K, is easily within the budget of the powers that be, if they feel it's worth spending that much. It's also easily within the budget of external funding providers (I'm sure your conspiracy theory-oriented brain could come up with some plausible ones).

These are not "guys in their basement". They are researchers in their well-funded computer lab.

And for those who do not follow general infosec issues closely, CMU/(US)CERT has a very close collaboration and funding association with US Homeland Security. Which means that it is nearly certain that all of the results of this research attack have been passed on to NSA.

lalala

July 30, 2014

Thank you for the comprehensive write-up.

Is there anything that users can do to check whether they have been affected (e.g. whether they have been using a bad guard)? For example, by examining the Data/Tor/state file in their TBB directory?

Good thinking. Yes, this should work. It won't tell you if you used them in the past (and then discarded them), but it will tell you if they're in your recent set.

Grab a copy of e.g. https://collector.torproject.org/archive/relay-descriptors/server-descr… and then pull out the relays with nickname Unnamed running Tor 0.2.4.18-rc in the two /16 netblocks I described.

Once you've done that, maybe put the set of fingerprints on a paste bin or something so other people can use them too?
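The filtering step described above could be sketched roughly as follows. This assumes the archived server descriptors are plain text with `router <nickname> <address> ...`, `platform ...`, and `fingerprint ...` lines, and that you have already unpacked the archive and read the descriptors into a string; the function names are made up for this example, and the output should be checked against the raw data.

```python
import ipaddress

# The two netblocks named in the advisory.
NETBLOCKS = [ipaddress.ip_network("50.7.0.0/16"),
             ipaddress.ip_network("204.45.0.0/16")]


def parse_descriptors(text):
    """Yield a dict per descriptor with nickname, address, platform and
    fingerprint. ASSUMES the plain-text server descriptor layout."""
    current = None
    for line in text.splitlines():
        if line.startswith("opt "):      # older descriptors prefix some lines
            line = line[4:]
        keyword, _, rest = line.partition(" ")
        if keyword == "router":
            if current:
                yield current
            parts = rest.split()
            current = ({"nickname": parts[0], "address": parts[1]}
                       if len(parts) >= 2 else None)
        elif current is not None and keyword == "platform":
            current["platform"] = rest
        elif current is not None and keyword == "fingerprint":
            current["fingerprint"] = rest.replace(" ", "").upper()
    if current:
        yield current


def is_suspect(desc):
    """Match the criteria from the comment above: nickname Unnamed,
    Tor 0.2.4.18-rc, address inside one of the two /16 netblocks."""
    try:
        addr = ipaddress.ip_address(desc["address"])
    except ValueError:
        return False
    return (desc["nickname"] == "Unnamed"
            and desc.get("platform", "").startswith("Tor 0.2.4.18-rc")
            and any(addr in net for net in NETBLOCKS))


def suspect_fingerprints(text):
    """Return the set of fingerprints of all matching relays."""
    return {d["fingerprint"] for d in parse_descriptors(text)
            if "fingerprint" in d and is_suspect(d)}
```

Feeding it the concatenated descriptor files for the relevant months should give a set of fingerprints that can be compared with lists other people have published.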

> maybe put the set of fingerprints on a paste bin or something so other people can use them too?

fwiw, these seem to be the fingerprints in question: http://ravinesmp.com/volatile/tor_relay_early_nodes.csv (CSV file generated from http://paste.debian.net/112652/ )

Note that there are 116 (not 115) nodes in this list. (Obviously don't trust this information, etc.; ideally one would reproduce it from collector.torproject.org or elsewhere.)
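Given such a list of fingerprints, checking a state file might look like the sketch below. It assumes guards are recorded on lines of the form `EntryGuard <nickname> <fingerprint> ...`; the function names are ours. As noted above, a clean result only says these relays are not in your recent guard set, not that you were unaffected.

```python
def guards_in_state(state_text):
    """Return (nickname, fingerprint) pairs from a Tor state file.
    ASSUMES lines of the form: EntryGuard <nickname> <fingerprint> ..."""
    guards = []
    for line in state_text.splitlines():
        parts = line.split()
        if len(parts) >= 3 and parts[0] == "EntryGuard":
            guards.append((parts[1], parts[2].upper()))
    return guards


def bad_guards(state_text, bad_fingerprints):
    """Return the guards whose fingerprint appears in `bad_fingerprints`."""
    bad = {fp.replace(" ", "").upper() for fp in bad_fingerprints}
    return [(nick, fp) for nick, fp in guards_in_state(state_text)
            if fp in bad]
```

To use it, read your state file (or a backup of it) into a string and pass in the fingerprint list you reproduced or downloaded.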

Here's the full message to check for in the Tor log:

"Received an inbound RELAY_EARLY cell on circuit %u."
" Closing circuit. Please report this event,"
" along with the following message.",
followed by a list of the relays in your circuit.
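A minimal sketch of scanning a log for that warning follows. The phrase is the one quoted above; the helper name and the sample log lines are made up, and where your Tor log lives depends on your configuration.

```python
MARKER = "Received an inbound RELAY_EARLY cell"


def find_relay_early_warnings(log_lines):
    """Return every log line that contains the warning phrase."""
    return [line.rstrip("\n") for line in log_lines if MARKER in line]


# Hypothetical log excerpt, for illustration only.
sample_log = [
    "Jul 30 12:00:00.000 [notice] Bootstrapped 100%: Done.\n",
    "Jul 30 12:05:00.000 [warn] Received an inbound RELAY_EARLY cell on "
    "circuit 42. Closing circuit.\n",
]
for hit in find_relay_early_warnings(sample_log):
    print(hit)
```

With a real log you would pass an open file object instead of the sample list, since iterating over a file yields its lines.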

lalala

July 30, 2014

Where can one find the identity key fingerprints for the removed 50.7.0.0/16 and 204.45.0.0/16 relays?

I'd like to scan backups of my /var/lib/tor/data/state file and see if I had one of those relays for a guard. (Although the real question of course is, was one of them my last hop...)

lalala

July 30, 2014

Can Tor users check if they've been using one of the guards in the ranges that were removed from the network, or would those guard entries have been immediately removed from the client's state file upon learning that they'd been declared invalid?

(Of course, knowing that one *wasn't* using one of these guards would *not* mean you weren't affected, but it would still be interesting to know.)

lalala

July 30, 2014

I wonder how many people have obtained the attacker's data! Locations of a lot of hidden services and their users would be quite interesting to many people - operators of hidden services are quite diverse and even include hardened criminals like the GCHQ's JTRIG hacking/trolling department: https://firstlook.org/theintercept/2014/07/14/manipulating-online-polls… (their catalog of capabilities includes several that use, rather than attack, Tor hidden services).

If the attacker is the CMU researchers and law enforcement seizes their data to selectively prosecute certain hidden services, perhaps that data could also be used to investigate and litigate against JTRIG? Sadly though, we probably would not hear about it if such seizure happens since everything would be parallel constructed for the public case... unless the researchers decide to tell us (which would probably be violating an NSL or something).

"If the attacker is the CMU researchers and law enforcement seizes their data to selectively prosecute certain hidden services" - seems like that would be fruit of the poisonous tree, but I am not a lawyer

Don't be silly! They don't need the researchers for that.

The 'NSA' was using MIT (and others) to seed Tor for the SOD. Notice how the FOIA leaves the SOD out; they often masquerade behind the DEA title. Despite early news reports (circa 2009), it wasn't the DEA that busted Viktor Bout, it was the SOD (I think it was a Time article in 2011).

Mudrock also got the Hemisphere FOIA from LAPD or TacPD combine that with the telecom immunity act of 9/2007.

lalala

July 30, 2014

Should serve as an example for other researchers of how not to go about things. Thanks for the good work patching the vulnerability and writing it up - if I knew you IRL I would buy you a beer.

lalala

July 30, 2014

The network IP blocks 50.7.0.0/16 and 204.45.0.0/16 are assigned to a U.S. provider.

In what follows I refer to the hidden service explanation at
https://www.torproject.org/docs/hidden-services.html.en

If U.S. IP blocks are excluded in the torrc via 'ExcludeNodes {US}',
can any point in the connection from client to hidden service (rendezvous point, introduction point) still be on U.S. IP blocks?

Does the rendezvous point know it connects the anonymous client to a specific hidden service? Does it know the server's .onion address?

Does 'DB' in the graphics on the explanation page stand for a Hidden Service Directory server? Why is 'DB' not drawn within the Tor cloud?

ExcludeNodes cannot be applied to HSDir selection because clients need to be able to construct the same list of HSDirs as the publisher (service) so that they can find the place where the descriptor is published.

If you had ExcludeNodes {US} and the two blocks listed are indeed identified as US by tor's geoip data source (I haven't checked about that), then at least you won't have one of them as a guard. But any other guard could also potentially be passively decoding the signals sent by the malicious HSDirs.
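The reason both sides must compute the same HSDir list can be shown with a toy ring lookup. This is a simplified stand-in for the real selection rule, not the actual specification: relay fingerprints are sorted on a ring and the first few at or after the descriptor ID are responsible. The names, the count of three, and the sample byte values are assumptions for illustration.

```python
from bisect import bisect_left


def responsible_hsdirs(descriptor_id, hsdir_fingerprints, count=3):
    """Toy ring lookup: the `count` fingerprints at or after descriptor_id,
    wrapping around. ASSUMPTION: simplified stand-in for the real rule."""
    ring = sorted(hsdir_fingerprints)
    if not ring:
        return []
    start = bisect_left(ring, descriptor_id)
    return [ring[(start + i) % len(ring)]
            for i in range(min(count, len(ring)))]


# Six made-up 20-byte fingerprints and a made-up descriptor ID.
hsdirs = [bytes([b]) * 20 for b in (10, 50, 90, 130, 170, 210)]
desc_id = bytes([100]) * 20

service_view = responsible_hsdirs(desc_id, hsdirs)
client_view = responsible_hsdirs(desc_id, hsdirs)
# A client that silently dropped one relay would look in the wrong place.
excluding_view = responsible_hsdirs(desc_id, [h for h in hsdirs if h[0] != 130])

print(service_view == client_view)     # True
print(service_view == excluding_view)  # False
```

If the client filtered its HSDir list with ExcludeNodes, it could end up asking relays that never received the descriptor, which is why the exclusion is not applied there.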

lalala

July 30, 2014

It is not really clear to me what data the attacker actually got from users.

Suppose someone used a clean install of tails and visited a hidden service site. What do "they" know about the user?

You wrote: "but the attackers likely were not able to see any application-level traffic (e.g. what pages were loaded or even whether users visited the hidden service they looked up)"

Does this mean they have the IP address of the user but not the page he actually visited?

I'm pretty curious right now.

Thanks for your answer!

The attacks observed were coming from HSDirs, which know the address of the hidden service they're serving a descriptor for. The message transmitted was the hidden service address. This message can be decoded by the guard, which knows the IP of the client (which is accessing or publishing the descriptor).

So, when the attacker is a hidden service's HSDir (which will probably happen eventually, as the position in the DHT rotates at some interval - it would be good to know how long it takes to cycle through 50% of the HSDirs) the guards for the hidden service can deanonymize it - meaning, they can link its IP address with its onion address. Clients using a malicious guard can also be deanonymized (their IP can be identified as one which accessed the service).

It is entirely possible that other guards which are not in the set of nodes mentioned above (and/or not controlled by the attacker running the nodes caught doing the active part of the attack) are or were also decoding these messages.

The same attack could also deanonymize non-hidden-service traffic if these messages were sent from exit nodes. There have not (yet) been exit nodes observed sending relay_early cells backwards.

Thank you for the explanation,

you wrote: "their (the clients) IP can be identified as one which accessed the service".
Do you mean they know what specific hidden site the client has visited (worst case one can imagine!!), or do they only know that the client accessed the hidden service generally?

Just asking because they said in the article: "but the attackers likely were not able to see any application-level traffic (e.g. what pages were loaded or even whether users visited the hidden service they looked up)"

Thanks again!

The attacker, if his relays are in the right place, could learn that you (that is, your IP address) did a lookup for the hidden service address (e.g. duskgytldkxiuqc6.onion). But he won't learn whether you actually loaded the page in your browser. He also won't learn whether you visited http://duskgytldkxiuqc6.onion/comsense.html or http://duskgytldkxiuqc6.onion/fedpapers/federa00.htm or what.

Hope that helps.

That is, he won't know directly. Since he is also your entry guard, he can watch the traffic over the circuit and get a good idea of how much traffic is passing, which, combined with knowing the site, could very well let him figure out some of those details.

Even if he didn't, it would likely be enough information over time to separate casual observers who find a site and check it out from serious users who may be more interesting targets.

Unless this is pure research, then I would assume this is not the end game but simply helping troll for targets.

Good point. The result of the attack in this advisory is that he knows which hidden service you looked up or published about. It's totally possible that he would then go on to do some other attack, like the website fingerprinting one you hint about, or just general "gosh she's using Tor a lot" observations.

Speaking of website fingerprinting, first read
https://blog.torproject.org/blog/critique-website-traffic-fingerprintin…
and then read "I Know Why You Went to the Clinic" from
https://www.petsymposium.org/2014/program.php
and finally, I hear there will be another website fingerprinting research paper at CCS this year (showing that false positive rates on realistic data are indeed higher than originally suspected).

"The attacker, if his relays are in the right place, could learn that you (that is, your IP address) did a lookup for the hidden service address (e.g. duskgytldkxiuqc6.onion)."

It's funny to think that some FBI IP could appear among the ones they de-anonymize.
Duh, the FBI regularly infiltrates CP and drug hidden services. (Think Silk Road: the FBI had several accounts on it from the beginning.)
I know, it doesn't change anything but it makes me smile.

lalala

July 30, 2014

Anonymity cannot be free.

Set up an exit relay on your dedicated server and use StrictNodes. This will make you invulnerable to this attack.

lalala

July 30, 2014

Thank you for an informative post and for releasing a timely fix!

I have a quick release coordination question. Why wasn't the version of tor in TBB also bumped up, especially given how recently TBB 3.6.3 was released? Doesn't the current release cycle gap between TBB and tor potentially increase the likelihood that .22 (mostly TBB client) users will be distinguished from .23 (mostly relay) users?

I know it's not necessarily preferred/ideal practice, but some people run relays from TBB instances. I certainly have in the past...so if you agree with the sentiment, it might even be a good idea to append a notice to the most recent TBB blog post discouraging people from configuring TBB's tor as a relay until it gets bumped up to .23.

Perhaps I'm making too big a deal of this, and TBB 3.6.4 is already on its way...

Yeah, the coordination didn't go as smoothly as it could have.

The new Firefox in TBB 3.6.3 was urgent to get out, since it includes the usual raft of fixes for Firefox vulnerabilities:
https://www.mozilla.org/security/known-vulnerabilities/firefoxESR.html#…

Whereas the new Tor release isn't urgent for clients, since it only 1) adds a log entry (and the interface letting TBB users read their log lines sure isn't easy to use) and 2) prepares them to move from 3 guards to 1, but only once we set the consensus parameter to instruct them to switch (which I plan to do after the next TBB has been out for a while).

Hopefully it won't be too long until there's a TBB with the next Tor stable in it. But the TBB team have been working hard on a TBB 4.0 alpha, which will include the Tor 0.2.5.x tree, and I sure want to see that too. So much to do, and not enough people working on it all!

I would love to help on this sort of front as a volunteer, but this type of issue--as minor as I hope it will turn out to be--seems pretty staff-driven in terms of progress. So in terms of inviting volunteers to help, it unfortunately seems like one of the few areas where volunteers working mostly in *other* areas would be the precondition to open up staff bandwidth to address this type of issue.

lalala

July 30, 2014

they signed up around 115 fast non-exit relays, all running on 50.7.0.0/16 or 204.45.0.0/16. Together these relays summed to about 6.4% of the Guard capacity in the network.

Did they sign up these relays all at once, or did the number grow gradually? If all at once, wouldn't that be a warning sign?

You mentioned in your write-up that while their signing up all those relays triggered warnings in DocTor, they were left in the consensus since it was felt that they weren't too significant a portion of the network.

Hypothetically, what would be enough to make the authority operators say "Hey, these guys are bad news, let's get them out of the consensus ASAP"? Maybe this is a dumb question, and it depends on specific circumstances I don't know enough about.

Thanks for your tireless efforts as always!

Yeah. That's still not entirely resolved. I hope the answer is "it will take a lot less the next time we see it!" :)

But really, I worry about more subtle attacks, where the adversary signs up a few relays at a time over the course of months. Also, there are events (like the blocking in Egypt) where many people decide to run relays at once. If the adversary signs up a pile of relays on that day, will we say "oh good, people care about security and safety"?

The attack was possible to notice this time because the relays all came from the same netblocks at around the same time and running the same (kind of old by now) Tor version. Detecting this sort of attack in the general case seems like a really hard research problem.
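The "same netblock, same time, same version" signal described above can be expressed as a simple grouping check. This is only a sketch with made-up record fields (`address`, `version`, `first_seen`, `nickname`) and an arbitrary threshold; it is not how DocTor works, and it would miss the slow, spread-out attacks mentioned above.

```python
import ipaddress
from collections import defaultdict


def cluster_new_relays(relays, min_size=10):
    """Group relays by (/16 netblock, Tor version, first-seen date) and
    return the groups with at least `min_size` members.
    ASSUMPTION: the record fields and the threshold are hypothetical."""
    groups = defaultdict(list)
    for relay in relays:
        # strict=False lets us derive the enclosing /16 from a host address.
        net16 = ipaddress.ip_network(f"{relay['address']}/16", strict=False)
        key = (str(net16), relay["version"], relay["first_seen"])
        groups[key].append(relay["nickname"])
    return {key: names for key, names in groups.items()
            if len(names) >= min_size}
```

A large group appearing on a single day would stand out; a handful of relays added per week from scattered netblocks would not, which is the hard part of the problem.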