Quick Summary of recent traffic correlation using netflows

by phobos | November 24, 2014

Here’s what you need to know about the recent research study on traffic correlation attacks:

While it’s great to see more research on traffic correlation attacks, this is not a new area of research. This is one study on the subject, conducted in a controlled environment, using one readily available traffic monitoring technology to analyze Tor traffic. The researcher has clarified in the media that the figure was 81.4 percent of their experiments, not “81 percent of all Tor traffic” as has been reported elsewhere.

The Tor network provides anonymity by routing the user’s information through multiple servers (usually three) so that it is hard to detect the person’s physical location.

Tor protects users by:
1) encryption to ensure privacy of data within the Tor network,
2) authentication so clients know they're talking to the relays they meant to talk to, and
3) signatures to make sure all clients know the same set of relays.
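A minimal sketch of the layered ("onion") encryption behind point 1, assuming a three-relay circuit. This is a toy, not Tor's code: the per-hop keys are generated locally as a stand-in for Tor's circuit handshakes, and the SHA-256 keystream stands in for a real cipher (Tor's relay cells use AES in counter mode). The names `keystream_xor`, `wrap`, and `relay_path` are hypothetical. The client adds one layer per relay, and each relay can strip only its own layer, so the guard never sees the plaintext.

```python
import hashlib
import secrets

def keystream_xor(key: bytes, data: bytes) -> bytes:
    """Toy stream cipher: XOR data with SHA-256(key || block offset). NOT secure."""
    out = bytearray()
    for offset in range(0, len(data), 32):
        block = hashlib.sha256(key + offset.to_bytes(8, "big")).digest()
        out.extend(b ^ k for b, k in zip(data[offset:offset + 32], block))
    return bytes(out)

# One key per relay; in Tor these would come from per-hop circuit handshakes.
hop_keys = [secrets.token_bytes(16) for _ in ("guard", "middle", "exit")]

def wrap(message: bytes) -> bytes:
    """Client side: add the exit's layer first and the guard's layer last."""
    for key in reversed(hop_keys):
        message = keystream_xor(key, message)
    return message

def relay_path(onion: bytes) -> bytes:
    """Guard, then middle, then exit: each relay strips exactly one layer."""
    for key in hop_keys:
        onion = keystream_xor(key, onion)
    return onion

request = b"GET / HTTP/1.1\r\n"
onion = wrap(request)
after_guard = keystream_xor(hop_keys[0], onion)  # still wrapped for middle and exit
print(relay_path(onion) == request)
```

Only the exit recovers the request; an observer on any single link sees a differently encrypted byte string.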

In theory, it may be possible to track Tor users by linking up their entry and exit points on the network, but it is generally very difficult to do so. The Tor network design, however, does not protect against a targeted attack by a global passive adversary (such as the NSA) intent on figuring out whom to investigate by watching and measuring Tor traffic going into and out of the network and correlating the information on both sides. We encourage you to learn more about what Tor does provide.

Tor is used by 2.5 million people a day including the general public, journalists, companies, activists, military, and law enforcement and is a very safe, reliable way to protect your privacy on the Internet.

Comments

November 24, 2014

Thanks for that clarification!

People really need to understand the threat model Tor has in mind. Even if Tor doesn't provide bullet-proof security against attackers like the NSA, there are so many other threats out in the wild: ordinary criminals, tracking and advertising companies, corrupt local government agencies. Tor lets you sleep well in the face of those daily risks of the Internet.

Don't forget ISPs, especially those (nearly all?) that sell your data to the highest bidder (often after charging grandiose prices for lousy service).

Unfortunately, the threat of tampering by exit nodes, the paucity of sites that employ properly implemented, full HTTPS encryption and authentication, and the reality that many sites block access from Tor IPs all contribute to a situation in which the usefulness of Tor is extremely limited.

I would very much like to know what needs to change for Tor to provide bullet-proof security against attackers like the NSA. I do not believe the NSA is more powerful or resourceful than, for example, the Chinese government. I am sure great minds are working together to help Tor provide even more privacy and security. The world needs it.

November 24, 2014

Do you know how you can have absolute privacy / anonymity, while reading Wikipedia articles for example? Just download it, and read it offline.

November 24, 2014

In reply to arma

(The OP may have their own answer).

As a start, for using Wikipedia locally on a computer, look up WikiTaxi or Kiwix.

WikiTaxi data is updated monthly in English. It is text-only (10-17 GB) and has some UX limitations. Development seems to have stopped.

Kiwix is available with full graphics in English (40-46 GB), but its last data update was in February 2014. There is still some developer involvement in the software.

November 25, 2014

In reply to arma

OP here: I actually meant downloading in bulk. And Aarddict [1] seems like a nice way to browse Wikipedia, other Wikimedia Foundation content, and other information.

[1] http://aarddict.org/

November 25, 2014

In reply to arma

OP here: You're right, by the way; Tor is essential for finding such information anonymously. But if you already know what you want to download, it's best to fetch it over the clearnet (so as not to burden the Tor network too much): downloading in bulk is already a great way to protect your privacy and anonymity, so there's no real need to use Tor for the download.

It's also very nice because you can read whenever and wherever you want: once it's downloaded, there's no need to be connected to the Internet at all times.

By the way, Outernet [1] is a very nice project that attempts to spread information in bulk this way.

[1] https://www.outernet.is/en/

Sounds good. Though I would worry a little bit if you're the only person in the world downloading Wikipedia this way. And I guess I would also worry about the scaling side if millions of people start doing it. But this is a fine discussion topic for someplace other than this blog post. :)

I'm going to kill this thread here, since it's too easy for everybody to use it to advertise their own latest thing, and that's not what this post is for.

November 25, 2014

The anonymous multi-user account login on trac.torproject.org doesn't work any more.

November 27, 2014

In reply to arma

Looks like someone already changed the login, as it's no longer working.

Can't something be done about this? It has been an ongoing issue for the past 6 months. Some knuckleheads keep changing it... eff them!

November 25, 2014

You should still make Tor change its network fingerprint, specifically its packet length distribution, to evade end-to-end correlation. This is a must, and I can't stress this enough.

That would be woefully inadequate against a determined adversary, since neither the length nor the inter-arrival obfuscation mechanisms seek to defend against end-to-end correlation attacks.

To be specific, the actual amount of traffic isn't obfuscated all that much (only the tail end of each burst gets padding applied, so the amount of data sent is still exposed to within roughly ±1500 bytes), and neither is the timing of each burst, since for the most part the algorithm does not schedule writes when no data is pending.

It is also worth noting that the inter-arrival obfuscation in all pluggable transports is currently disabled by default for performance reasons, as none of the censors appear to be looking at those kinds of statistics.
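The tail-end padding described in this comment can be sketched as follows. This is a hypothetical model, not the actual pluggable-transport code: a burst is split into MTU-sized writes (an MTU of 1500 bytes is an assumption), and only the final, partial write is padded to a random length. However large the burst, the observed total stays within one MTU of the true size, which is why this kind of padding does little against volume correlation.

```python
import random

MTU = 1500  # assumed maximum payload per write, in bytes

def padded_write_sizes(burst_bytes: int, rng: random.Random) -> list[int]:
    """Split one burst into MTU-sized writes, padding only the final write.

    Hypothetical model of tail-of-burst length obfuscation: full-size writes
    go out unchanged, and only the last, partial write is padded to a random
    length no larger than the MTU.
    """
    full_writes, tail = divmod(burst_bytes, MTU)
    sizes = [MTU] * full_writes
    if tail:
        sizes.append(rng.randint(tail, MTU))
    return sizes

rng = random.Random(0)
for true_size in (10_000, 250_000, 1_000_000):
    observed = sum(padded_write_sizes(true_size, rng))
    # The padding added to a burst is always less than one MTU.
    print(f"sent {true_size:>9,} bytes, observer sees {observed:>9,} (+{observed - true_size})")
```

Padding the whole burst (or every write) would hide more, at a proportionally higher bandwidth cost.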

November 25, 2014

In reply to arma

I think total evasion is not possible, but a solution could decrease the chances of correlation to an acceptably low level, or increase the computing power needed to find a match beyond any realistic scenario, at least for passive adversaries.

November 25, 2014

Is there anyone among the Tor developers who can understand Russian?

arma

November 25, 2014

In reply to Anonymous (not verified)

I believe the helpdesk does not have a dedicated Russian speaker currently, but it couldn't hurt to try mailing them anyway and see if they can find one for you.

November 25, 2014

Yet it's actually a really simple problem to deal with.

The remailer network realized ages ago that dummy traffic and timing shifts were essential.

But for some reason the Tor devs stubbornly refuse to hear this, and instead spend lots of time tweaking pointless extras while adding the disclaimer "our product isn't NSA-proof", rather than actually fixing (or greatly helping to fix) the problem.

I've heard it said that it's too complex to introduce. But why does it need a complex solution? Why can't every node add a little random latency and chuck some dummy traffic out there on the network? Too much bandwidth? I think it's worth it, don't you?

Alas, and as usual, it's more complicated than that.

For starters, the Tor devs actually *were* the designers and developers of the most advanced deployed remailer:
http://freehaven.net/anonbib/#minion-design
http://mixminion.net/

Batching traffic is indeed essential to long-term intersection attack resistance, but "a little random latent time" and "some dummy traffic" probably will have zero real impact on the effectiveness of the attacks.

See
http://freehaven.net/anonbib/#e2e-traffic
for one of our early papers on the topic, and anonbib has many more papers after that on the topic too.

So in short, nobody has a handle on how effective the attacks can be, and therefore nobody has a handle on whether a given amount of batching will help. But it looks pretty clear that a small amount of batching won't help much -- especially since Tor transports *flows*, and mixnets transport *messages*. A lot of the problem in the Tor case is that flows of different sizes look different. And to fix that, as well as making them all start and end at the same time, we're talking an enormous amount of overhead. More research remains.

November 25, 2014

The Tor project delivers protection only to those people whose countries cannot become global adversaries to them, such as China, Iran, Russia, etc., which is its original purpose. If you connect from a Five Eyes country and your exit node is in a Five Eyes country as well, Tor is not secure IMO, and you can become a target at any time depending on what you are doing. Intelligence agencies are not only setting up rogue Tor nodes; they are collecting ISP logs of Tor users, and they surely monitor Internet routing points between ISPs looking for Tor traffic, just as Sambuddho's paper suggests. Nobody can tell for sure how difficult it would be to match logs from different giant databases, but governments could just listen passively at significant points hoping for positive matches, and I am pretty sure some of those searches can be optimized a lot. The Tor project not only fails to address those threats but could indeed have been intended to serve that dual purpose from the beginning. Time to throw some Monte Carlo noise into the Tor network.

A small nit: it seems to me that intelligence agencies don't need to set up their own attacking Tor relays, since they can as easily watch a relay that's set up by an honest person. Same result, less work and risk for them.

And no, the Tor design was not made intentionally weak to correlation attacks ("dual purpose") -- rather, nobody in the world knows how to build a system for traffic flows that vary widely in size and that want to get through within a few minutes, and where bandwidth is scarce.

Ultimately Tor has been successful because it's not too slow. I'd love to make it more secure without hurting the 'not too slow' part too much. Somebody should figure out how to do that.

November 26, 2014

In reply to arma

Snowden has shown that if something is possible, then it is already happening:
http://news.slashdot.org/story/14/11/26/0125238/new-snowden-docs-show-g…
Downplaying the risk of global adversaries in heavily infiltrated networks and systems is not going to do the Tor project any good. IMO a more realistic threat model should be built that takes worse scenarios into account, for instance the fact that Western intelligence agencies have full access to all traffic at any time, from ISP logs to Internet routing points, satellites, cables, comm. towers, and soon our arse too with the Internet of Things. Also, if they have the computing power and algorithms to monitor, process and store such amounts of data, then it is very likely that they have the computing power and algorithms to correlate it. There should also be some kind of hard forensic data to look at every time there is a security breach in the network, instead of just saying "hey, trust us!".

November 25, 2014

I think you could avoid end-to-end correlation if you chopped up the content from one site and delivered it to the end user over two separate guards.

So, off the top of my head: the Tor client requests the site content, tags the connection, and adds two encrypted return paths for the middle node.

The middle node chops up the content from the exit node, adds checksums and delivers it back over two guards.
At this point the middle node could also randomly append 5-10% binary chaff to the streams to obfuscate their size.

The Tor client listens on the two predetermined guards for the tagged streams and reassembles them into the requested site content.
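The split-and-chaff proposal above can be sketched as follows. Everything here is hypothetical: the chunk size (498 bytes, roughly one relay cell payload), the random assignment of chunks to the two return paths, and the SHA-256 checksums are assumptions, and in a real design the chaff marker would sit inside the encryption so an observer could not strip it. For simplicity the chaff is appended as one chunk at the end of each path. It shows the mechanics only and says nothing about how well the idea resists correlation. Requires Python 3.9+ for `Random.randbytes`.

```python
import hashlib
import random
from dataclasses import dataclass

CHUNK = 498      # hypothetical chunk size, roughly one relay cell payload
CHAFF_SEQ = -1   # marks chaff; in a real design this flag would be encrypted

@dataclass
class Chunk:
    seq: int        # position of the piece in the original content
    payload: bytes
    digest: bytes   # checksum the client uses to verify the piece

def make_chunk(seq: int, payload: bytes) -> Chunk:
    return Chunk(seq, payload, hashlib.sha256(payload).digest())

def split_over_two_guards(content: bytes, rng: random.Random,
                          chaff_range=(0.05, 0.10)) -> tuple[list[Chunk], list[Chunk]]:
    """Middle-node side: scatter chunks over two return paths, then add chaff."""
    paths: tuple[list[Chunk], list[Chunk]] = ([], [])
    for seq, start in enumerate(range(0, len(content), CHUNK)):
        paths[rng.randrange(2)].append(make_chunk(seq, content[start:start + CHUNK]))
    for path in paths:
        real_bytes = sum(len(c.payload) for c in path)
        junk_len = int(real_bytes * rng.uniform(*chaff_range))
        path.append(make_chunk(CHAFF_SEQ, rng.randbytes(junk_len)))
    return paths

def reassemble(path_a: list[Chunk], path_b: list[Chunk]) -> bytes:
    """Client side: drop chaff, verify checksums, restore the original order."""
    real = [c for c in path_a + path_b if c.seq != CHAFF_SEQ]
    for c in real:
        if hashlib.sha256(c.payload).digest() != c.digest:
            raise ValueError(f"chunk {c.seq} failed its checksum")
    return b"".join(c.payload for c in sorted(real, key=lambda c: c.seq))

rng = random.Random(7)
page = rng.randbytes(100_000)
guard_a, guard_b = split_over_two_guards(page, rng)
seen_a = sum(len(c.payload) for c in guard_a)
seen_b = sum(len(c.payload) for c in guard_b)
print(seen_a, seen_b, seen_a + seen_b)
assert reassemble(guard_a, guard_b) == page
```

An observer who sees only one guard's stream gets a partial, padded volume; one who can see both streams (for example at the user's own ISP) can still add them up and recover the total to within the 5-10% chaff margin.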

Check out Conflux:
http://freehaven.net/anonbib/#pets13-splitting
for a very related design.

And for a much older version of the idea, which is oriented towards mix messages rather than traffic flows, see
http://freehaven.net/anonbib/#pet05-serjantov

So those ideas would help (how much?) against traffic volume attacks, but don't forget traffic timing attacks too.

Also, moving from one guard to two guards brings back some of the "users get routed" issues:
https://blog.torproject.org/blog/improving-tors-anonymity-changing-guar…

November 26, 2014

In reply to arma

"how much?"

If a user receives 440 KB and 630 KB over two guards at the time a monitored website has sent 1 MB, I think we have moved from "quite clear" to "reasonable doubt".

November 25, 2014

When you say end to end correlation do you mean looking at the amount of data? Like you would see that I send x kilobytes of data to a Tor entry node and then x kilobytes of data are received at a server from a Tor exit node, so it is then linked to me?

A solution to that would be junk data. Of course the amount of junk data has to be concealed, but the middle node would work for that since all traffic is encrypted. The only way a packet of data at the middle node can be linked to an entry node or exit node is by its size, not its destination or content. If a random amount of junk data (significantly large, to mask the amount of actual data efficiently) is added to a client side request before being sent to the entry node, then that junk data could be dumped (and perhaps replaced by a different amount of junk data) at the middle node (or maybe the entry node). Then the exit node would dump the other junk data before finally sending the request to the server. And when the exit node receives the data from the server to be sent to the client, the exit node adds junk data, which is then discarded by the middle node and replaced with a different amount of junk data, before finally being discarded by the entry node before the real (non-junk) data (such as the web page that was loaded) is given to the client (Tor user).

Junk data would just be random bits of information that mean nothing, just with a header that tells a node to discard it as junk data (and being encrypted, the header saying its junk data would not be visible to an attacker so the amount of junk data is not known).

Sorry if this wasn't entirely clear or worded as well as it could be, I'm not an expert on this stuff. Please at least try to understand the concept I've described and see if it can be implemented into Tor. It could make Tor much more secure.

Thank you everyone who develops and maintains Tor, it is an extremely valuable tool.

Yes -- and Tor even has support for exactly this, in the form of RELAY DROP cells:
https://gitweb.torproject.org/torspec.git/blob/HEAD:/tor-spec.txt#l1230
which the client can send to any of the three nodes in her path.
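A rough sketch of how RELAY_DROP cells change the volumes visible on each link, assuming a three-hop circuit in which, as described above, a drop cell is discarded by the hop it is addressed to. The function and link names are illustrative rather than Tor's code, and the number of drop cells below is an arbitrary placeholder, since how many to send and how often is exactly the open question.

```python
import random

LINKS = ("client->guard", "guard->middle", "middle->exit")

def link_volumes(real_cells: int, drop_targets: list[int]) -> dict[str, int]:
    """Count cells on each link of a three-hop circuit.

    drop_targets holds one entry per RELAY_DROP cell the client sends, naming
    the hop it is addressed to (1 = guard, 2 = middle, 3 = exit). A drop cell
    crosses every link up to its target hop and is discarded there.
    """
    volumes = {}
    for i, link in enumerate(LINKS):
        # Link i is crossed by drop cells addressed to hop i+1 or beyond.
        volumes[link] = real_cells + sum(1 for hop in drop_targets if hop > i)
    # Drop cells never leave the Tor network, so the destination sees only
    # real data (counted here in cell-sized units for simplicity).
    volumes["exit->server"] = real_cells
    return volumes

rng = random.Random(42)
# Arbitrary placeholder: how many drop cells to send, and to which hops.
drops = [rng.choice((1, 2, 3)) for _ in range(rng.randint(50, 400))]
print(link_volumes(1000, drops))
```

The gap between what an observer sees entering at the guard and leaving at the exit is exactly the number of drop cells, so its size and timing decide how much this helps.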

But now the research question is: how many should she send? And how often? This area is poorly studied, especially when you consider *timing attacks* as well as just volume attacks.

Now, there's a related area that's gotten a lot of attention recently, where timing attacks seem less critical than volume attacks, called website fingerprinting. For background, see
https://blog.torproject.org/blog/critique-website-traffic-fingerprintin…
and then
"A Critical Evaluation of Website Fingerprinting Attacks"
http://freehaven.net/anonbib/#ccs2014-critical

November 26, 2014

In reply to arma

Arma, you keep saying things (undoubtedly true) such as:
"how many ...? how often...? area poorly studied,..."

Then, since you also said Tor _already_ has support in principle for doing such things, WHY don't you start experimenting in the real world? Start enabling these features for all Tor users, with possibly adjustable parameters (decoy cell sizes, added delays, etc.), and we'll be able to study the overall effect of the new strategies and tune the parameters to achieve the "best" compromise between efficiency and effectiveness.

November 26, 2014

In reply to arma

Thank you for the reply. To solve the timing attack issue, I think the same concept could be used. Whereas for volume attacks the solution is to add random amounts of junk data to change the volume, for timing attacks the solution is to add delays of random length. Between any two nodes, and also between the user and the entry node, there would be a delay of a few seconds. Yes, it would make Tor much slower (it might take 10 times as long to load a web page), but it would be worth it to close a potential security hole.

November 25, 2014

It would be very helpful if a succinct breakdown of all traffic correlation solution proposals were written and presented on the main website, where everyone could easily find it, with periodic updates as new proposals are submitted. It should include the most prominent proposals, a simplified description of each approach (if possible), reasons why parts of each work, reasons why parts of each do not work, and anything else you pros might deem necessary for a researcher to understand. The idea is to expose as many people as possible to this information, so that even a relative layman with a basic understanding of computer networking could get a brief idea of the issue, the proposed ideas, and the reasons they fail to solve the problem, without having to sift through dozens of research papers (though each of them could be linked from the document). I believe that the more minds are working on the problem, the more likely it is that a solution will arise in this decade. Even a technical novice could find a solution conceptually, and conceptual solutions can then be converted into technical ones by the knowledgeable. Two and a half million users with varying reasons for privacy might (depending on their threat model) be strongly inclined to solve the Achilles heel of the Tor network.

Yes, this is a great idea. For one example, it would fit well with the Oakland "Systematization of Knowledge" series:
http://www.ieee-security.org/TC/SP2014/cfp.html

"Following the success of the previous year’s conferences, we are also soliciting papers focused on systematization of knowledge (SoK). The goal of this call is to encourage work that evaluates, systematizes, and contextualizes existing knowledge. These papers can provide a high value to our community but may not be accepted because of a lack of novel research contributions. Suitable papers include survey papers that provide useful perspectives on major research areas, papers that support or challenge long-held beliefs with compelling evidence, or papers that provide an extensive and realistic evaluation of competing approaches to solving specific problems. Submissions are encouraged to analyze the current research landscape: identify areas that have enjoyed much research attention, point out open areas with unsolved challenges, and present a prioritization that can guide researchers to make progress on solving important challenges."

Now all that remains is for somebody to do it. :)

November 25, 2014

Can anyone imagine a way in which a web-of-trust-like system might be used to mitigate malicious relay contamination of the network? Freenet developer xor is currently working on this. Check it out. If any Tor dev takes this into serious consideration, please let the community know. It might close a major attack vector. Help him out if it's feasible.

The more general name for this issue is Sybil-resistance.

Tor has a leg up compared to systems like Freenet, because Tor's directory authority design is more centralized:
https://lists.torproject.org/pipermail/tor-talk/2014-November/035772.ht…

In any case, without more hints about how to read about this "Freenet developer xor", it is unlikely to go anywhere here.

You run Tor to provide cover for those who need it the most, such as victims of crime, or your neighbor who wants to look up information on an illness without letting their insurance company know first. You should run Tor to give humanity the capacity for some sort of freedom in the digital age.

EVERYONE has something to hide. Don't be stupid. Take, for example, homosexuality in homophobic societies. In Egypt the state is cracking down on homosexuals, and for absolutely no reason: they pose no threat to the political establishment there.
http://www.independent.co.uk/news/world/africa/grindr-urges-lgbt-commun…
This is where Tor comes in handy; here lies ONE answer, out of many, to your question.

Really? After ALL the Snowden revelations about what the NSA, GCHQ, and the rest of the gang are doing? And we still have people asking this question??

November 25, 2014

Well, we all do know for certain that they are listening, watching, collecting and analyzing...

Yes, and we keep learning about new ways in which they're watching, where before we thought surely they wouldn't be illegally doing *that* too...

(In fact, the more centralized privacy designs out there -- VPNs, proxies, etc -- are in worse shape than Tor against many of these attacks, since there are far fewer places that the attacker needs to watch in order to be able to launch the attacks.)

This "correlation attack" research area is important, but it's just that: a research area. Nobody knows whether attacks like this work in practice, or how best to defend against them if they do. Tor is strong exactly because of this thriving research community of professors around the world who investigate these questions:
http://freehaven.net/anonbib/

November 26, 2014

In reply to arma

Please accept a big thank you for all the work you have done.

I am operating a couple of (non-exit) relays and also want to contribute exit nodes. Will there be a meeting for Tor relay operators at 31C3 again? How can I find it?

Thanks.

There should be a separate release specifically for running relays and bridges (of all kinds) on Windows, Mac, and Linux. They could revive Vidalia for this task, strip out everything that isn't related to running bridges and relays, and then release it, so they wouldn't have to start from scratch.
This would make it much easier to run bridges and relays, expanding the network.
It's such a shame that there isn't a release specifically for that.

My question wasn't so much aimed at the technical side (though helpful), but rather about possible legal implications and best practices when operating an exit relay, in my specific country.

So I hope to be able to check in with people who run exits, to pick their brains.

From the technical side, I found it rather astonishing that some people run relays with very little regard for good operational security. I don't fault them; I think they want to do a good thing and are doing the best they can.

But I mean knowing what capabilities and resources are out there in the hands of our not-so-friendly spy agencies... time to step up operational security.

I cringe every time I see a high-bandwidth relay running Windows, or one with its IPMI interface exposed to the world...

November 26, 2014

In reply to arma

I hate how you keep belittling the threats, and every time a new attack is discovered you repeat the same mantras: "we don't really know", "nobody really knows", and "there is no evidence".
Especially after the OpenSSL bug, you said there's no evidence it was used. Are you kidding me? That bug's very nature IS to leave no evidence, so whether it was used or not, THERE WOULD BE NO EVIDENCE!!!! Not to mention how you start using diplomatic language and avoid admitting defeat at the hands of the NSA.
"it's just that: a research area" No, it's not just a research area. These threats are real. The NSA has tapped whole countries, and you still think this is just a research area.
"Nobody knows whether attacks like this work in practice" Well, probably not you, but the NSA and company know very well that it works; that's why they bugged the whole fucking planet.

And you always dodge the question by attacking proxies and VPNs. Who mentioned those?

Many threats have been known for a long time, like end-to-end correlation, but you still haven't adopted even half a solution (which is ScrambleSuit between clients and guard relays). Why? Because you're afraid it's going to "slow down" the network. Well, guess what! Surprise, surprise: Tor users do NOT use Tor for speed!!!!! And ScrambleSuit is NOT as slow or memory-hungry as you think it is. ScrambleSuit is our savior, and you still haven't accepted it.

November 26, 2014

What about using Tor all the time to send and receive traffic, for example by constantly playing YouTube videos?
Could this help against traffic correlation attacks?
I don't know English well; I hope my question is clear.
Thanks in advance

November 26, 2014

Excuse me, I'm a beginner at this, and please also excuse me for not writing in my own language. But does it make any difference to use an encrypted IP address before you enter Tor? If so, should one use it afterwards instead?
I started thinking about this when someone wrote that the NSA can get you anyway.

Mr Walking stick

What is an "encrypted IP address"? It sounds like you are listening to some for-profit snake-oil company that is trying to mislead you about how the Internet works. :(

November 27, 2014

Okay, Tor isn't perfect; we all knew that. But does that mean that you, the Tor developers, should just give up? Of course not!! Make life as difficult as you can for TLAs/LE; after all, all that we are talking about, ultimately, is "bits and bytes" here. Western civilization has continued on just fine, in spite of strong encryption and anonymity software being widely available. I say that the benefits outweigh the costs; after all, we (at least, we Americans) do not throw the 2nd Amendment under the bus just because some lunatic abuses it. Ditto for the 1st Amendment.

So, where does that leave us, for now, at least? How about using Tor bridges, or better yet, obfuscated bridges? Easy to do with the new Tor Browser software! Also, why not access the Tor network via an anonymous Wi-Fi hotspot, perhaps, using Tails? In any case, "Don't give up!" Continue to develop your software to its fullest and brightest potential! It's not only your right, but at this point, your duty, as well. If not you, then who?

Thanks for the kind words! And don't worry, we aren't going to give up. Tor is as much a movement and community as it is a particular set of software.

As for your particular suggestions: using bridges or obfuscated bridges might help if your adversary doesn't realize that he should log traffic flows to/from those bridges. For more discussion there, see the threads around
https://blog.torproject.org/blog/being-targeted-nsa#comment-64375

Doing something in front of Tor, like switching to a wifi hotspot, could be a good idea -- but take care that your opsec approaches like this don't accidentally add in some new vulnerability.

December 02, 2014

In reply to arma

Regarding your helpful suggestion of using a Wi-Fi hotspot:

Is that only because the traffic between the ISP and the Tor guard relay will show the hotspot's IP rather than your own?

Or does it require multiple users of the hotspot whose destinations overlap, like Facebook, Twitter, Google, etc., rather than users connecting to unique sites (like blog.torproject.org)?

Or, if you can point to a link that explores these issues and save yourself precious time, thank you.

November 28, 2014

Never forget Iranian cyber power! They even hacked Twitter a few years ago, and they were able to decode Tor in the same year.

The Iranian cyber police claim that they are capable of detecting at least 60% of cyber crime, while this figure is 20 to 30 percent in developed countries. So I think there is something more than the IP address involved. It could be hardware information such as MAC addresses, ISP logs at Internet routing points, satellites, cables, etc.

I believe it's the ISP logs at Internet routing points! And I think each user has their own algorithm.

A) Decoding Tor, i.e. recognizing that a flow on the Internet is Tor, is not the same as breaking Tor's anonymity. So yes, periodically Iran figures out how to recognize and block Tor flows, and then we fix that (and that's the arms race that "pluggable transports" aims to win), but none of the moves by Iran have involved breaking the anonymity that Tor provides (learning which websites a given user visits, and learning which users visit a given website).

B) Their numbers (like 20-30% of cyber crimes detected in developed countries) sound like nonsense to me. So I would assume that they're saying these things to change your behavior, not because they're factually correct.

December 01, 2014

As I thought, they can only do something bad at the exit nodes.
Tor is still the best of the best!

December 02, 2014

Brand new to Tor, but I really like it. Anything that can be done to keep the tail from wagging the dog is GREAT news to me. Here is my question: can you mirror a node? If so, then put in many mirrors, with each node able to add a little salt or take some away. This could also add small random amounts of time; I wait a little while for most web pages to load anyway. Example: node one has 10 mirrors, node two has 15 mirrors, node three has 7, and all the data is broken down and sent not only to nodes one, two, and three but to all of the mirrors as well. Like I said, I am new to Tor. Thanks to everyone at Tor.

December 04, 2014

How do I start using the TOR system? And where do I register here?

December 05, 2014

Tor needs more people to run relays
for stronger privacy.
Don't just post about Tor;
help Tor any way you can.