Quick Summary of recent traffic correlation using netflows

Here’s what you need to know about the recent research study on traffic correlation attacks:

While it’s great to see more research on traffic correlation attacks, this is not a new area of research. This is one study on the subject, conducted in a controlled environment, using one readily available traffic monitoring technology to analyze Tor traffic. The researcher has clarified in the media that the figure applied to 81.4 percent of their experiments, not to “81 percent of all Tor traffic” as has been reported elsewhere.

The Tor network provides anonymity by routing the user’s information through multiple servers (usually three) so that it is hard to detect the person’s physical location.

Tor protects users by:
1) encryption to ensure privacy of data within the Tor network,
2) authentication so clients know they're talking to the relays they meant to talk to, and
3) signatures to make sure all clients know the same set of relays.
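The layered-encryption idea behind this design can be illustrated with a toy sketch. This is not Tor's actual cryptography (real Tor uses TLS links, authenticated handshakes, and AES counter-mode onion layers); here a hash-derived XOR keystream stands in for each cipher layer, and the function names are purely illustrative:

```python
import hashlib

def keystream(key: bytes, n: int) -> bytes:
    """Derive n pseudo-random bytes from key (toy stand-in for a real cipher)."""
    out, counter = b"", 0
    while len(out) < n:
        out += hashlib.sha256(key + counter.to_bytes(4, "big")).digest()
        counter += 1
    return out[:n]

def xor_layer(data: bytes, key: bytes) -> bytes:
    """One encryption layer; XOR is symmetric, so this also removes the layer."""
    return bytes(a ^ b for a, b in zip(data, keystream(key, len(data))))

def onion_wrap(message: bytes, hop_keys: list) -> bytes:
    """Client side: wrap the message once per hop, innermost (exit) layer
    first, so the guard's layer ends up outermost."""
    cell = message
    for key in reversed(hop_keys):   # exit, then middle, then guard
        cell = xor_layer(cell, key)
    return cell

def onion_unwrap(cell: bytes, hop_keys: list) -> bytes:
    """Each relay in turn strips only its own layer: guard, middle, exit."""
    for key in hop_keys:
        cell = xor_layer(cell, key)
    return cell
```

Each relay can remove only its own layer, so no single relay sees both who is talking and what is being said.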

In theory it may be possible to track Tor users by linking up their entry and exit points on the network but it is generally very difficult to do so. The Tor network design, however, does not protect against a targeted attack by a global passive adversary (such as the NSA) intent on figuring out whom to investigate through watching and measuring Tor traffic going into and out of the network and correlating the information on both sides. We encourage you to learn more about what Tor does provide.

Tor is used by 2.5 million people a day including the general public, journalists, companies, activists, military, and law enforcement and is a very safe, reliable way to protect your privacy on the Internet.


November 24, 2014


Thanks for that clarification!

People really need to understand the threat model Tor has in mind. Even if Tor doesn't provide bullet-proof security against attackers like the NSA, there are so many other threats out in the wild. Ordinary criminals, tracking and advertising companies, corrupt local government agencies - Tor lets you sleep well in the face of those daily risks of the Internet.


November 26, 2014



Don't forget ISPs, especially those (nearly all?) that sell your data to the highest bidder (often after charging grandiose prices for lousy service).

Unfortunately, the threat of tampering by exit nodes, coupled with the paucity of sites that employ properly implemented, full HTTPS encryption and authentication, and the reality that many sites block access to Tor IPs, all contribute to creating a situation in which the usefulness of Tor is extremely limited.


December 01, 2014



I would very much like to know what needs to change to make Tor provide bullet-proof security against attackers like the NSA. I do not believe the NSA is more powerful or resourceful than, for example, the Chinese government. I am sure great minds are working together to help Tor provide even more privacy and security. The world needs it.


November 24, 2014


Do you know how you can have absolute privacy / anonymity, while reading Wikipedia articles for example? Just download it, and read it offline.

(The OP may have their own answer).

As a start, for using Wikipedia locally on computers, look up
WikiTaxi or Kiwix.

WikiTaxi data is updated monthly in English. It is text-only (10-17 GB) and has some UX limitations. Development seems to have stopped.

Kiwix is available with full graphics in English (40-46 GB), but its last data update was Feb 2014. There is still some developer involvement in the software.

OP here: You're right, by the way; Tor is essential for finding such information anonymously. But if you know what you want to download, it's best to download it over the clearnet (so as not to burden the Tor network), because downloading in bulk is already a great way to protect your privacy / anonymity, so there's no real need for Tor for the download itself.

Also very nice because you can read whenever, and wherever you want because once downloaded, there's no need to be hooked up to the internet at all times.

By the way Outernet [1] is a very nice project that attempts to spread information in bulk this way.

[1] https://www.outernet.is/en/

Sounds good. Though I would worry a little bit if you're the only person in the world downloading wikipedia this way. And I guess I would also worry about the scaling side if millions of people start doing it. But this is a fine discussion topic for someplace other than this blog post. :)

Please recommend some *.onion websites.

I'm going to kill this thread here, since it's too easy for everybody to use it to advertise their own latest thing, and that's not what this post is for.

The anonymous multi-user account login on trac.torproject.org doesn't work any more.

There, I've fixed it for now.


Such is life when we try to maintain a communal account.

Looks like someone already changed the login, as it's no longer working.

Can't something be done about this? It has been an ongoing issue for the past six months. Some knuckleheads keep changing it... eff them!

You should still make Tor change its network fingerprint, specifically its packet length distribution, to evade end-to-end correlation. This is a must, and I can't stress this enough.

Do you have a concrete design in mind?

I ask because all of the designs proposed so far don't really evade end-to-end correlation. So you end up with higher overhead but not necessarily any better security.

For more details see the various papers and blog posts cited in

Something similar to scramblesuit?

That would be woefully inadequate against a determined adversary, since neither the length nor the inter-arrival obfuscation mechanisms seek to defend against end-to-end correlation attacks.

To be specific, the actual amount of traffic isn't obfuscated all that much (only the tail end of each burst gets padding applied, so the amount of data sent is still exposed to within roughly ±1500 bytes), and neither is the timing of each burst, since the algorithm mostly does not schedule writes when there is no data pending.
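A toy sketch of why tail-only padding leaks the total volume (the MTU value and function name are illustrative, not ScrambleSuit's actual code): each burst is rounded up to the next full packet, so two bursts that differ by more than one packet remain clearly distinguishable.

```python
MTU = 1500  # typical packet payload ceiling; an assumption for illustration

def pad_burst(nbytes: int, mtu: int = MTU) -> int:
    """Pad only the final partial packet of a burst up to a full packet.
    The observed size is ceil(nbytes / mtu) * mtu, so the true size is
    exposed to within one packet (~1500 bytes)."""
    return -(-nbytes // mtu) * mtu  # ceiling division, then scale back up

# A 90 KB burst and a 400 KB burst still look nothing alike after padding:
small, large = pad_burst(90_000), pad_burst(400_000)
```

The point is that the padding hides at most the last partial packet, not the overall traffic volume that an end-to-end correlator actually uses.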

It is also worth noting that inter-arrival obfuscation in all pluggable transports is currently disabled by default for performance reasons, as none of the censors appear to be looking at those kinds of statistics.

I think total evasion is not possible, but a solution could decrease the chances of correlation to an acceptably low level, or increase the computing power needed to find a match beyond any realistic scenario, at least for passive adversaries.

Is there anyone among the Tor developers who can understand Russian?

I believe the helpdesk does not have a dedicated Russian speaker currently, but it couldn't hurt to try mailing them anyway and see if they can find one for you.

"Question: If TOR were to be designed again, from the start, what would be different?"

So far it looks like a bunch of people trying to recreate the results of the papers on

I encourage you all to go check those out.


Wait, that's not HTTPS ?! WHAT??!!!

Ha, too funny!

What part of "Limitations of End-to-End Encryption" didn't you get? :)

Yet it's actually a really simple problem to deal with.

The remailer network realized ages ago that dummy traffic and timing shifts were essential.

But for some reason the Tor devs just stubbornly refuse to hear this, and instead spend lots of time tweaking pointless extras while adding the disclaimer "our product isn't NSA-proof" rather than actually fixing (or greatly helping to fix) the problem.

I've heard it said that it's too complex to introduce. But why does it need a complex solution? Why can't every node add a little random latency and send some dummy traffic out onto the network? Too much bandwidth? I think it's worth it, don't you?

Alas, and as usual, it's more complicated than that.

For starters, the Tor devs actually *were* the designers and developers of the most advanced deployed remailer:

Batching traffic is indeed essential to long-term intersection attack resistance, but "a little random latent time" and "some dummy traffic" probably will have zero real impact on the effectiveness of the attacks.

for one of our early papers on the topic, and anonbib has many more papers after that on the topic too.

So in short, nobody has a handle on how effective the attacks can be, and therefore nobody has a handle on whether a given amount of batching will help. But it looks pretty clear that a small amount of batching won't help much -- especially since Tor transports *flows*, and mixnets transport *messages*. A lot of the problem in the Tor case is that flows of different sizes look different. And to fix that, as well as making them all start and end at the same time, we're talking an enormous amount of overhead. More research remains.

The Tor project delivers protection only to those people whose countries cannot become global adversaries to them, like China, Iran, Russia, etc., which is its original purpose. If you connect from a Five Eyes country and your exit node is in a Five Eyes country as well, Tor is not secure IMO, and you can become a target at any time depending on what you are doing. Intelligence agencies are not only setting up rogue Tor nodes; they are collecting ISP logs of Tor users, and it is certain that they monitor Internet routing points between ISPs looking for Tor traffic, just as Sambuddho's paper suggests. Nobody can tell for sure how difficult it would be to match logs from different giant databases, but governments could just use passive listening at significant points hoping for positive matches, and I am pretty sure some of those searches can be optimized a lot. The Tor project not only fails to address those threats but indeed could have been intended to work with that dual purpose from the beginning. Time to throw some Monte Carlo noise into the Tor network.

A small nit: it seems to me that intelligence agencies don't need to set up their own attacking Tor relays, since they can as easily watch a relay that's set up by an honest person. Same result, less work and risk for them.

And no, the Tor design was not made intentionally weak to correlation attacks ("dual purpose") -- rather, nobody in the world knows how to build a system for traffic flows that vary widely in size and that want to get through within a few minutes, and where bandwidth is scarce.

Ultimately Tor has been successful because it's not too slow. I'd love to make it more secure without hurting the 'not too slow' part too much. Somebody should figure out how to do that.

Snowden has shown that if something is possible, then it is already happening:
Downplaying the risk of global adversaries in heavily infiltrated networks and systems is not going to do the Tor project any good. IMO a more realistic threat model should be built taking into account worst-case scenarios, for instance the fact that Western intelligence agencies have full access to all traffic at any time, from ISP logs to Internet routing points, satellites, cables, comm towers, and soon our arses too with the Internet of Things. If they have the computing power and algorithms to monitor, process, and store such an amount of data, then it is very likely that they have the computing power and algorithms to correlate it. There should also be some kind of forensic hard data to look at every time there is a security breach in the network, instead of just saying "hey, trust us!".

"Investigators said they began to suspect the couple after discovering an Internet Protocol address was accessing the Silk Road 2.0 site"
How did they discover the ip?

I think you could avoid end-to-end correlation if you chop up content from one site and deliver it over to separate guards to the end user.

So from the top of my head, the tor client requests site content, tags the connection and adds two encrypted return paths for the middle node.

The middle node chops up the content from the exit node, adds checksums and delivers it back over two guards.
At this point the middle node could also randomly append 5-10% binary chaff to the streams to obfuscate the size.

The tor client listens on the predetermined two guards for the tagged streams and assembles them back to the requested site content.
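A minimal sketch of the split-and-chaff idea described above (function names, the dict-based "stream" format, and the chaff fraction are all hypothetical; real Tor cells and cryptography are omitted):

```python
import hashlib
import os
import random

def split_with_chaff(content: bytes, chaff_frac: float = 0.10):
    """Hypothetical middle-node side: split content into two streams,
    tag each with its real length and a checksum, and append random
    chaff bytes so the wire sizes no longer match the payload sizes."""
    half = len(content) // 2
    streams = []
    for part in (content[:half], content[half:]):
        chaff_len = random.randint(0, max(1, int(len(part) * chaff_frac)))
        streams.append({
            "length": len(part),                           # real payload length
            "checksum": hashlib.sha256(part).hexdigest(),  # integrity check
            "data": part + os.urandom(chaff_len),          # payload + chaff
        })
    return streams[0], streams[1]

def reassemble(s1: dict, s2: dict) -> bytes:
    """Hypothetical client side: strip chaff, verify checksums, rejoin."""
    out = b""
    for s in (s1, s2):
        part = s["data"][: s["length"]]
        if hashlib.sha256(part).hexdigest() != s["checksum"]:
            raise ValueError("corrupted stream")
        out += part
    return out
```

In a real design the length and checksum headers would themselves be inside the encryption, so only the client and middle node can tell chaff from payload.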

Check out Conflux:
for a very related design.

And for a much older version of the idea, which is oriented towards mix messages rather than traffic flows, see

So those ideas would help (how much?) against traffic volume attacks, but don't forget traffic timing attacks too.

Also, moving from one guard to two guards brings back some of the "users get routed" issues:

"how much?"

If a user receives 440KB and 630KB over two guards at the same time a monitored web site has sent 1MB, I think we've moved from quite clear to reasonable doubt.

When you say end to end correlation do you mean looking at the amount of data? Like you would see that I send x kilobytes of data to a Tor entry node and then x kilobytes of data are received at a server from a Tor exit node, so it is then linked to me?
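To make the question concrete, here is a toy illustration (the byte counts are made up) of what an observer watching both ends could compute: per-second traffic volumes at the entry and exit sides line up almost perfectly, while unrelated traffic does not.

```python
from statistics import mean

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Hypothetical per-second byte counts. Tor's encryption hides the content,
# but the volume pattern passes through the network nearly unchanged.
entry_bytes = [120, 4000, 50, 9000, 300, 7000, 60, 100]   # seen at client's ISP
exit_bytes  = [115, 3980, 48, 8950, 310, 6990, 55, 102]   # seen at the server
unrelated   = [5000, 5000, 100, 80, 6000, 40, 9000, 20]   # some other user

match_score    = pearson(entry_bytes, exit_bytes)   # close to 1.0
nonmatch_score = pearson(entry_bytes, unrelated)    # much lower
```

That high score for the matching pair is exactly the linkage the attack exploits.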

A solution to that would be junk data. Of course the amount of junk data has to be concealed, but the middle node would work for that, since all traffic is encrypted. The only way a packet of data at the middle node can be linked to an entry node or exit node is by its size, not its destination or content. If a random amount of junk data (significantly large, to mask the amount of actual data efficiently) is added to a client-side request before being sent to the entry node, then that junk data could be dumped (and perhaps replaced by a different amount of junk data) at the middle node (or maybe the entry node). The exit node would then dump the remaining junk data before finally sending the request to the server. And when the exit node receives the data from the server to be sent to the client, the exit node adds junk data, which is then discarded by the middle node and replaced with a different amount of junk data, which the entry node finally discards before the real (non-junk) data (such as the web page that was loaded) is given to the client (Tor user).

Junk data would just be random bits of information that mean nothing, with a header that tells a node to discard it as junk (and, being encrypted, the header saying it's junk data would not be visible to an attacker, so the amount of junk data is not known).
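A minimal sketch of that header idea (cell format and names are hypothetical; in a real system the header byte would sit inside the per-hop encryption, so an outside observer only sees total cell counts):

```python
import os
import random

DATA, JUNK = 0, 1  # one-byte cell header: real payload vs. discardable junk

def pad_with_junk(payload_cells, max_junk: int = 8):
    """Sender side: interleave a random number of junk cells with the
    real ones. Relative order of the real cells is preserved."""
    cells = [bytes([DATA]) + c for c in payload_cells]
    for _ in range(random.randint(1, max_junk)):
        pos = random.randint(0, len(cells))
        cells.insert(pos, bytes([JUNK]) + os.urandom(16))
    return cells

def strip_junk(cells):
    """Receiving node: read the header byte, discard junk, keep payloads."""
    return [c[1:] for c in cells if c[0] == DATA]
```

The open question raised in the replies below still applies: how much junk, and on what schedule, actually defeats a correlator rather than just adding overhead.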

Sorry if this wasn't entirely clear or worded as well as it could be, I'm not an expert on this stuff. Please at least try to understand the concept I've described and see if it can be implemented into Tor. It could make Tor much more secure.

Thank you everyone who develops and maintains Tor, it is an extremely valuable tool.

Yes -- and Tor even has support for exactly this, in the form of RELAY DROP cells:
which the client can send to any of the three nodes in her path.

But now the research question is: how many should she send? And how often? This area is poorly studied, especially when you consider *timing attacks* as well as just volume attacks.
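As a sketch of that design space (this is not an implemented Tor policy; the rate parameter is exactly the unstudied part), one candidate answer to "how many, how often" is a randomized schedule, e.g. drop cells sent at Poisson-distributed times:

```python
import random

def drop_cell_schedule(duration_s: float, rate_per_s: float):
    """One possible policy sketch: emit padding (RELAY DROP style) cells
    at Poisson-distributed times with a given average rate. Choosing
    rate_per_s well, against both volume and timing attacks, is the
    open research question."""
    times, t = [], 0.0
    while True:
        t += random.expovariate(rate_per_s)  # exponential inter-arrival gap
        if t >= duration_s:
            return times
        times.append(t)
```

A memoryless schedule like this at least avoids the obvious pitfall of padding at fixed intervals, which an attacker could simply filter out.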

Now, there's a related area that's gotten a lot of attention recently, where timing attacks seem less critical than volume attacks, called website fingerprinting. For background, see
and then
"A Critical Evaluation of Website Fingerprinting Attacks"

Arma, you keep saying such things (undoubtedly true) as:
"how many...? how often...? area poorly studied..."

Then, since you also said Tor _already_ has support for doing such things, WHY don't you start experimenting in the real world? Start enabling these features for all Tor users, with adjustable parameters, decoy cell sizes, added delays, etc., and we'll be able to study the overall effect of the new strategies and the proper tuning of parameters to achieve the best compromise between efficiency and effectiveness.

Thank you for the reply. To solve the timing attacks issue, I think the same concept could be used. Whereas with volume attacks the solution is to include random amounts of junk data to change the volume, the solution to timing attacks is to add delays of a random length. Between any two nodes, and also between the user and entry node, there would be a delay of a few seconds. Yes, it would make Tor much slower; it might take 10 times as long to load a web page, but it would be worth it to close a potential security hole.
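A minimal sketch of that per-hop delay idea (function name and the delay bounds are arbitrary illustrations, not studied parameters): each node holds a cell for a random time before forwarding, decoupling arrival times from departure times.

```python
import random

def delayed_forward_time(arrival_s: float,
                         min_delay: float = 0.5,
                         max_delay: float = 3.0) -> float:
    """Sketch: a node receiving a cell at arrival_s forwards it after a
    uniformly random hold. The [0.5, 3.0] second bounds are made up."""
    return arrival_s + random.uniform(min_delay, max_delay)

# Over a three-hop path the total added latency is the sum of three draws,
# which is also the cost the commenter acknowledges (much slower pages):
total_added = sum(random.uniform(0.5, 3.0) for _ in range(3))
```

As the replies above note, whether delays this small actually defeat a correlator, rather than just slowing everyone down, is unproven.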

It would be very helpful if a succinct breakdown of all traffic correlation solution proposals were written and presented where everyone can easily find it on the main website, with periodic updates as new proposals are submitted. It should include the most prominent proposals, a simplification of their approach (if possible), reasons why parts of each work, reasons why parts of each do not work, and anything else you pros might deem necessary for a researcher to understand. The idea behind this is to expose as many people to this information as possible, so that even a relative layman with a basic understanding of computer networking could get a brief idea of the issue, the proposed ideas, and the reasons they fail to solve the problem, without having to sift through dozens of research papers (though each of them could be linked from the document). I believe the more minds working on the problem, the more likely a solution will arise in this decade. Even a technical novice could find a solution conceptually; conceptual solutions can then be converted into technical ones by the knowledgeable. Two and a half million users with varying reasons for privacy might (depending on their threat model) be strongly inclined to solve the Achilles' heel of the Tor network.

Yes, this is a great idea. For one example, it would fit well with the Oakland "Systemization of Knowledge" series:

"Following the success of the previous year’s conferences, we are also soliciting papers focused on systematization of knowledge (SoK). The goal of this call is to encourage work that evaluates, systematizes, and contextualizes existing knowledge. These papers can provide a high value to our community but may not be accepted because of a lack of novel research contributions. Suitable papers include survey papers that provide useful perspectives on major research areas, papers that support or challenge long-held beliefs with compelling evidence, or papers that provide an extensive and realistic evaluation of competing approaches to solving specific problems. Submissions are encouraged to analyze the current research landscape: identify areas that have enjoyed much research attention, point out open areas with unsolved challenges, and present a prioritization that can guide researchers to make progress on solving important challenges."

Now all that remains is for somebody to do it. :)

Can anyone imagine a way in which a web-of-trust-like system might be used to mitigate malicious relay contamination of the network? Freenet developer xor is working on this currently. Check it out. If any Tor dev takes this into serious consideration, please let the community know. It might close a major attack vector. Help him out if it's feasible.

The more general name for this issue is Sybil-resistance.

Tor has a leg up compared to systems like Freenet, because Tor's directory authority design is more centralized:

In any case, without more hints about how to read about this "Freenet developer xor", it is unlikely to go anywhere here.

Why don't you add another layer of TLS over Tor and make it look like TLS?

Tor already speaks TLS for its link encryption. So it looks like TLS because it *is* TLS.

But it's TLS as spoken by OpenSSL, which is subtly different from TLS as spoken by (say) LibNSS.

For much more on this topic, see our 28c3 talk "How governments have tried to block Tor":

As a normal, ordinary person with nothing to hide, why should I use Tor?

For an introduction to the topic, you might like my video from Internet Days in Sweden a few years back -- it's at point 'h' on
or you can download it directly at