How to handle millions of new Tor clients

[tl;dr: if you want your Tor to be more stable, upgrade to a Tor Browser Bundle with Tor 0.2.4.x in it, and then wait for enough relays to upgrade to today's 0.2.4.17-rc release.]

Around August 20, we started to see a sudden spike in the number of Tor clients. By now it's unmistakable: there are millions of new Tor clients and the numbers continue to rise:

[Figure: Tor users in summer 2013]

Where do these new users come from? My current best answer is a botnet.

Some people have speculated that the growth in users comes from activists in Syria, Russia, the United States, or some other country that has good reason to have activists and journalists adopting Tor en masse lately. Others have speculated that it's due to massive adoption of the Pirate Browser (a Tor Browser Bundle fork that discards most of Tor's security and privacy features), but we've talked to the Pirate Browser people and the downloads they've seen can't account for this growth. The fact is, with a growth curve like this one, there's basically no way that there's a new human behind each of these new Tor clients. These Tor clients got bundled into some new software which got installed onto millions of computers pretty much overnight. Since no large software or operating system vendors have come forward to tell us they just bundled Tor with all their users, that leaves me with one conclusion: somebody out there infected millions of computers and as part of their plan they installed Tor clients on them.

It doesn't look like the new clients are using the Tor network to send traffic to external destinations (like websites). Early indications are that they're accessing hidden services — fast relays see "Received an ESTABLISH_RENDEZVOUS request" many times a second in their info-level logs, but fast exit relays don't report a significant growth in exit traffic. One plausible explanation (assuming it is indeed a botnet) is that it's running its Command and Control (C&C) point as a hidden service.

My first observation is "holy cow, the network is still working." I guess all that work we've been doing on scalability was a good idea. The second observation is that these new clients actually aren't adding that much traffic to the network. Most of the pain we're seeing is from all the new circuits they're making — Tor clients build circuits preemptively, and millions of Tor clients means millions of circuits. Each circuit requires the relays to do expensive public key operations, and many of our relays are now maxed out on CPU load.

There's a possible dangerous cycle here: when a client tries to build a circuit but it fails, it tries again. So if relays are so overwhelmed that they each drop half the requests they get, then more than half the attempted circuits will fail (since all the relays on the circuit have to succeed), generating even more circuit requests.
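To make the arithmetic behind that cycle concrete, here is a minimal sketch. It assumes a three-hop circuit, that each relay accepts a create request independently with the same probability, and that a client simply retries until one circuit succeeds. These are simplifying assumptions for illustration, not measurements from the live network.

```python
def circuit_success_probability(relay_accept_prob: float, hops: int = 3) -> float:
    """Probability that every relay on the path accepts the create request."""
    return relay_accept_prob ** hops


def expected_attempts(relay_accept_prob: float, hops: int = 3) -> float:
    """Mean number of tries until one circuit succeeds (geometric distribution)."""
    return 1.0 / circuit_success_probability(relay_accept_prob, hops)


if __name__ == "__main__":
    p = 0.5  # each relay drops half the requests it gets
    print(circuit_success_probability(p))  # 0.125
    print(expected_attempts(p))            # 8.0
```

Under these assumptions, relays that each drop half their requests turn one intended circuit into eight attempts on average, which is where the extra load comes from.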

So, how do we survive in the face of millions of new clients?

Step one was to see if there was some simple way to distinguish them from other clients, like checking if they're using an old version of Tor, and have entry nodes refuse connections from them. Alas, it looks like they're running 0.2.3.x, which is the current recommended stable.

Step two is to get more users using the NTor circuit-level handshake, which is new in Tor 0.2.4 and offers stronger security with lower processing overhead (and thus less pain to relays). Tor 0.2.4.17-rc comes with an added twist: we prioritize NTor create cells over the old TAP create cells that 0.2.3 clients send, which a) means relays will get the cheap computations out of the way first so they're more likely to succeed, and b) means that Tor 0.2.4 users will jump the queue ahead of the botnet requests. The Tor 0.2.4.17-rc release also comes with some new log messages to help relay operators track how many of each handshake type they're handling.
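The queue-jumping idea can be sketched as a priority queue. This is only an illustration in Python, not Tor's actual code: the class names, the priority values, and the `circuit_id` field are all hypothetical.

```python
import heapq
import itertools
from dataclasses import dataclass, field

# Lower number = served first. The values are illustrative, not Tor's.
PRIORITY = {"ntor": 0, "tap": 1}


@dataclass(order=True)
class PendingCreate:
    priority: int
    seq: int  # arrival order; keeps each handshake type first-in, first-out
    handshake: str = field(compare=False)
    circuit_id: int = field(compare=False)


class CreateQueue:
    """Hypothetical relay-side queue that serves NTor requests before TAP ones."""

    def __init__(self):
        self._heap = []
        self._counter = itertools.count()

    def push(self, handshake: str, circuit_id: int) -> None:
        item = PendingCreate(PRIORITY[handshake], next(self._counter),
                             handshake, circuit_id)
        heapq.heappush(self._heap, item)

    def pop(self) -> PendingCreate:
        return heapq.heappop(self._heap)

    def __len__(self) -> int:
        return len(self._heap)


if __name__ == "__main__":
    q = CreateQueue()
    for handshake, circ in [("tap", 1), ("ntor", 2), ("tap", 3), ("ntor", 4)]:
        q.push(handshake, circ)
    print([q.pop().circuit_id for _ in range(len(q))])  # [2, 4, 1, 3]
```

Both NTor requests are served before either TAP request, and arrival order is preserved within each type.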

(There's some tricky calculus to be done here around whether the botnet operator will upgrade his bots in response. Nobody knows for sure. But hopefully not for a while, and in any case the new handshake is a lot cheaper so it would still be a win.)

Step three is to temporarily disable some of the client-side performance features that build extra circuits. In particular, our circuit build timeout feature estimates network performance for each user individually, so we can tune which circuits we use and which we discard. First, in a world where successful circuits are rare, discarding some — even the slow ones — might be unwise. Second, to arrive at a good estimate faster, clients make a series of throwaway measurement circuits. And if the network is ever flaky enough, clients discard that estimate and go back and measure it again. These are all fine approaches in a network where most relays can handle traffic well; but they can contribute to the above vicious cycle in an overloaded network. The next step is to slow down these exploratory circuits in order to reduce the load on the network. (We would temporarily disable the circuit build timeout feature entirely, but it turns out we had a bug where things get worse in that case.)

Step four is longer-term: there remain some NTor handshake performance improvements that will make them faster still. It would be nice to get circuit handshakes on the relay side to be really cheap; but it's an open research question how close we can get to that goal while still providing strong handshake security.

Of course, the above steps aim only to get our head back above water for this particular incident. For the future we'll need to explore further options. For example, we could rate-limit circuit create requests at entry guards. Or we could learn to recognize the circuit building signature of a bot client (maybe it triggers a new hidden service rendezvous every n minutes) and refuse or tarpit connections from them. Maybe entry guards should demand that clients solve captchas before they can build more than a threshold of circuits. Maybe we rate limit TAP handshakes at the relays, so we leave more CPU available for other crypto operations like TLS and AES. Or maybe we should immediately refuse all TAP cells, effectively shutting 0.2.3 clients out of the network.
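As one illustration of the rate-limiting option, here is a minimal token-bucket sketch. The class names, the per-client keying, and the rate and burst values are assumptions made for the example; nothing here reflects an existing Tor feature or a decided design.

```python
import time


class TokenBucket:
    """Allow up to `burst` requests at once, refilled at `rate` tokens per second."""

    def __init__(self, rate, burst, clock=time.monotonic):
        self.rate = rate
        self.burst = burst
        self.clock = clock
        self.tokens = burst
        self.last = clock()

    def allow(self) -> bool:
        now = self.clock()
        # Refill for the elapsed time, capped at the burst size.
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


class GuardRateLimiter:
    """Hypothetical entry-guard limiter: one bucket per client identifier."""

    def __init__(self, rate, burst, clock=time.monotonic):
        self.rate = rate
        self.burst = burst
        self.clock = clock
        self.buckets = {}

    def allow_create(self, client_id) -> bool:
        bucket = self.buckets.get(client_id)
        if bucket is None:
            bucket = TokenBucket(self.rate, self.burst, self.clock)
            self.buckets[client_id] = bucket
        return bucket.allow()


if __name__ == "__main__":
    limiter = GuardRateLimiter(rate=1.0, burst=2.0)
    print([limiter.allow_create("client-a") for _ in range(3)])
```

The same bucket could in principle meter only TAP handshakes, leaving NTor requests unthrottled. Picking rates that don't hurt legitimate clients is the hard part, and this sketch doesn't address it.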

In parallel, it would be great if botnet researchers would identify the particular characteristics of the botnet and start looking at ways to shut it down (or at least get it off of Tor). Note that getting rid of the C&C point may not really help, since it's the rendezvous attempts from the bots that are hurting so much.

And finally, I still maintain that if you have a multi-million node botnet, it's silly to try to hide it behind the 4000-relay Tor network. These people should be using their botnet as a peer-to-peer anonymity system for itself. So I interpret this incident as continued exploration by botnet developers to try to figure out what resources, services, and topologies integrate well for protecting botnet communications. Another facet of solving this problem long-term is helping them to understand that Tor isn't a great answer for their problem.

Anonymous

September 05, 2013

With above in mind, shouldn't the AV companies be able to add detection/fix to the db updates?
If they do, it will be interesting to follow the TOR stats for a few days.

Don't you think the malware used to create this botnet has already been added to the AVs' signature-based malware detection? I mean, this botnet has existed since 2009. The reason there are currently more than a million zombie computers from this botnet on the Tor network is either insufficiently protected (or entirely unprotected) machines, or the use of advanced Trojans that circumvent AV technology. In both cases, putting your hope in AV companies is not going to solve the problem for the Tor network. AV helps those who run it properly, but it won't fix the problems of a network that is used by a massive number of badly protected zombies. I think Tor can better protect itself against these threats. I like the good old human test as part of a defense system. Kudos to the guys at Fox-IT for sharing their knowledge. If you know what is threatening your network, you can surely protect it better. Generally.

I downloaded the sample and scanned it using VirusTotal, VirSCAN and Metascan. More than 20 of the included antivirus products did not detect the file as malware. I then used a malware submission index (a list of submission pages and/or email addresses) and sent a copy of the file to each of those vendors. Hopefully that helps somewhat.

Anonymous

September 05, 2013

Someone should convince the botnet owner to make all the bots run as Tor relays/exit nodes!

Anonymous

September 05, 2013

Maybe one could integrate something like an "is it a human?" check at the beginning of a session, similar to the captcha for comments...
Having done this, one could give preference to users who run the new version with the integrated check and who pass it.

Not that user friendly but helpful until there is another solution...

Anonymous

September 05, 2013

How about the botnet owner makes all nodes relays, contributing to the network in addition to using it?

Random idea for long-term: before the handshake, a server could give the client a task to solve, expensive to compute by the client and cheap to verify by the server (e.g. hash). Once solved, they can proceed with handshake. That task can be proportionally complex depending on server load, so a server could manage the load it receives from clients asking for circuits.
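A minimal sketch of such a client puzzle, in the style of hashcash, is below. The function names, the SHA-256 choice, the 8-byte nonce encoding, and the load-to-difficulty mapping are all assumptions made for illustration; none of this is part of Tor.

```python
import hashlib
import itertools
import os


def leading_zero_bits(digest: bytes) -> int:
    """Count the zero bits at the front of a digest."""
    value = int.from_bytes(digest, "big")
    return len(digest) * 8 - value.bit_length()


def make_challenge() -> bytes:
    """Server side: a fresh random challenge per request."""
    return os.urandom(16)


def _digest(challenge: bytes, nonce: int) -> bytes:
    return hashlib.sha256(challenge + nonce.to_bytes(8, "big")).digest()


def solve(challenge: bytes, difficulty_bits: int) -> int:
    """Client side: brute-force a nonce (about 2**difficulty_bits hashes on average)."""
    for nonce in itertools.count():
        if leading_zero_bits(_digest(challenge, nonce)) >= difficulty_bits:
            return nonce


def verify(challenge: bytes, nonce: int, difficulty_bits: int) -> bool:
    """Server side: a single hash, so checking stays cheap."""
    return leading_zero_bits(_digest(challenge, nonce)) >= difficulty_bits


def difficulty_for_load(load: float, max_bits: int = 20) -> int:
    """Scale difficulty with a server load fraction clamped to [0, 1]."""
    load = min(max(load, 0.0), 1.0)
    return round(load * max_bits)


if __name__ == "__main__":
    challenge = make_challenge()
    bits = difficulty_for_load(0.5)  # 10 bits at half load
    nonce = solve(challenge, bits)
    print(verify(challenge, nonce, bits))  # True
```

The server does one hash to verify while the client does many to solve, and the difficulty rises with load. The sketch leaves out challenge expiry and replay protection.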

Some people own computers from the last century. With this scheme they would not be able to use the Tor network. I don't think it will solve the problem.

You are right about the botnet owner. More relays are so welcome!

Anonymous

September 05, 2013

Would the multi-million node botnet be less of a problem if, say, 10% of them became relays? Ignoring the ethical dilemma and public relations disaster of accepting hijacked resources into the Tor network, of course...

Anonymous

September 05, 2013

ps If botnets run relays, you can add a new class of user to the "Who uses Tor" page, not to mention a new class of legal problems to solve...

Anonymous

September 05, 2013

Can I download the 0.2.4.17-rc release later? Or must I run Tor as a relay to download it?

Anonymous

September 05, 2013

For the convenience of us "advanced" users who still do not compile their own, would you please include a win32-binary 'tor.exe' along with its checksum, among the downloadables ?

I mean, I can't find it yet under: www.torproject.org/dist/win32/

:=(

Cheers and Thanks

--
Noino

Hey Noino,

There's usually around a 24h lag between package creation and release. If you want to have earlier access to packages, you can sign up to the tor-qa mailing list and get them about a day before they are released. I send them there first and generally give our testers 24h to let me know if they find any problems. The list is here:

https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-qa

That said, the 0.2.4.17-rc packages are now available on the website.

Anonymous

September 05, 2013

Please take a look in litecoin.
Seems that a huge botnet is mining litecoin.

forum.litecoin.net/index.php/topic,5693.0.html

Two things:

One, take a look at this graph - if this mining "pool" (not really) is associated with the Tor client jump, then it's likely the periodicity of this graph provides useful data with which to infer the deeper causative relationship:

https://forum.litecoin.net/index.php/topic,5693.msg44523.html#msg44523

-

Two, if this is a mining botnet custom-coded for just that, then "c&c" becomes a much different issue. Basically it involves passing hash values back to be validated - or if they are locally validated on the individual zombie machines, then only passing "up" to the network controller the successful hash results. Which would not make much network traffic - but could result in a lot of parsimonious sessions as hashes (small bits) are checked against a central resource and a result (very small bits) is passed back to the zombie machine.

In any case, folks with more firsthand in admin of specific mining toolsets likely can clarify the precise mechanics. What's clear is that it would exhibit qualitatively different characteristics than the usual botnet: it's not pushing DDoS packets, for example, so there's no flood of traffic being generated - ever.

-

Our $0,02 is that this isn't .gov - it doesn't match the fingerprint, nor any likely-scenario disinfo fingerprints. The question to ask is what sort of botnet activity would find substantial benefits to having some comms natively routed through Tor hidden services. Obviously, DDoS doesn't match that profile... but it seems probable that something does. Narrow down, via logical interpolation (which is to say, assume botnet operators are smart - because they are) to the sorts of activities that would be well-served by inclusion of Tor in their topological model, and we're likely closer to finding the needle in a much smaller haystack.

What benefits from a bunch of nodes, talking via Tor hidden services but not saying much? What kinds of comms would be particularly well-protected against traffic analysis attacks by burying them within the Tor network itself? What techniques do botnet researchers use to uncover botnet admin infrastructure that would be less viable if the botnet itself was masked within Tor hidden services?

Smart people do things for good reasons - we may not yet know those reasons, but we know they exist. Someone smart made this design decision and has implemented it in a real-world botnet; surely, she had good reasons for doing so. Understand the reasons, and we understand eventually the activity to which she puts her botnet. Q.E.D.

-

~ pj | http://cryptostorm.is

==-
It should be obvious. The botnet operator is hiding his command/ctl server behind tor so it can't be found and followed to him personally.

-faye kane ♀ girl brain

Is it so obvious? If you control three million computers owned by random people throughout the world, why not use them to build your own anonymity network and hide your server in there, instead of hiding in a relatively tiny network of 4000 volunteers?

Anonymous

September 05, 2013

The "hashcash" (Google it) approach is pointless because a botnet client has the same resources as a normal user.

Anonymous

September 05, 2013

The real answer is to remove hidden services from the default tor software. Hidden services can be delivered through an add-on to the tor network.

Anonymous

September 05, 2013

As I've said in a lot of places, you need to look beyond the overall users graph and look at the geographic distribution as described by, for example:

https://metrics.torproject.org/users.html?graph=userstats-relay-country…

The overall pattern does not look like a botnet to me. It's way too even and consistent over a large range of countries including ones with tiny numbers of computers, while omitting some very strange choices. (China, Israel for example)

China isn't showing bots because the Tor clients there, without pluggable transports, can't reach the Tor network.

So, the botnet's (assuming it is a botnet of course) nodes in China are being censored. Same with Iran.

But Israel?

And again, it seems way too even otherwise. Proportionate to internet users, you'd expect a huge signal from Russia, and not much from the UK and Vatican city, say, but that's not the case.

Perhaps it was a Tor research project. But I'm sure that someone would have already mentioned that, even if nothing has yet been published. Also, 40K clients would be nontrivial for a research project, even using VMs. And conversely, infecting and cleaning 40K random machines would also be nontrivial.

This spike in Israeli Tor users only shows up in the beta estimates. Perhaps it's just an artifact in the beta estimates.

40K is indeed small relative to many millions. Even so, creating 40K clients during 2013-02-15 through 2013-02-19, and removing them again during 2013-02-23 through 2013-02-25, was an impressive feat.

Perhaps it was a pilot test, or another much smaller botnet.

Anonymous

September 05, 2013

Problem with some of these defenses is that they would deny legitimate research on the live net. Or even something as simple as polling onions for uptime.

Anonymous

September 05, 2013

FBI-Skynet-
The raid on FreedomHost (Aug 5), when they left a weaponized exploit for Firefox users running Windows systems: maybe the code has morphed into a Tor-net with a little help. Timing is everything. My 2 cents.

Seems unlikely that it's related, especially since it looks like this botnet has been around in some form since 2009.

That said, I admit that these days every time I call something a crazy conspiracy theory, it's increasingly turning out to be true. :)

Anonymous

September 05, 2013

If the botnet admin decides to push a button to make his bots start sending and receiving data, can he take the whole Tor network down?