10 years of collecting Tor directory data

by karsten | May 15, 2014

Today is the 10th anniversary of collecting Tor directory data!

As the 2004 Tor design paper says, "As of mid-May 2004, the Tor network consists of 32 nodes (24 in the US, 8 in Europe), and more are joining each week as the code matures."

In fact, we still have the original relay lists from back then. The first archived Tor directory dates back to May 15, 2004. It starts with the following lines, which are almost human-readable:

signed-directory
published 2004-05-15 07:30:57
recommended-software 0.0.6.1,0.0.7pre1-cvs
running-routers moria1 moria2 tor26 incognito jap dizum
cassandra metacolo poblano ned TheoryOrg Tonga
peertech hopey tequila triphop moria4 anize rot52
randomtrash
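
To show just how human-readable these lines are, here is a minimal Python sketch that pulls the three fields out of the excerpt above. It is an illustration only, not the parser Tor or the metrics tools use: the function name `parse_directory_header` is hypothetical, and it assumes that `running-routers` is a single space-separated line in the archived file (it is wrapped over several lines above).

```python
from datetime import datetime

# The excerpt quoted above, with the running-routers entry joined onto
# one line (an assumption about how the archived file stores it).
SAMPLE = """signed-directory
published 2004-05-15 07:30:57
recommended-software 0.0.6.1,0.0.7pre1-cvs
running-routers moria1 moria2 tor26 incognito jap dizum cassandra metacolo poblano ned TheoryOrg Tonga peertech hopey tequila triphop moria4 anize rot52 randomtrash
"""


def parse_directory_header(text):
    """Return the published time, recommended versions and running routers.

    Hypothetical helper: it only understands the three keywords that
    appear in the excerpt and ignores every other line.
    """
    header = {}
    for line in text.splitlines():
        # Each line is a keyword, optionally followed by a space and arguments.
        keyword, _, rest = line.partition(" ")
        if keyword == "published":
            header["published"] = datetime.strptime(rest, "%Y-%m-%d %H:%M:%S")
        elif keyword == "recommended-software":
            header["recommended_software"] = rest.split(",")
        elif keyword == "running-routers":
            header["running_routers"] = rest.split()
    return header


header = parse_directory_header(SAMPLE)
print(header["published"].isoformat())
print(len(header["running_routers"]), "nicknames listed in the excerpt")
```

The excerpt as quoted lists 20 nicknames on its `running-routers` line.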

As of today, May 15, 2014, there are about 4,600 relays in the Tor network and another 3,300 bridges. In these 10 years, we have collected a total of 212 GiB of bz2-compressed tarballs containing Tor directory data. That's more than 600 GiB of uncompressed data. And of course, the full archive is publicly available for download.

Here's a small selection of what people do with this fine archive:

If people want to use the Tor directory archive for their research or for building new applications, or want to help out with the projects listed above, don't hesitate to contact us!

Happy 10th birthday, Tor directory archive!

Comments

Please note that the comment area below has been archived.

May 16, 2014

Yes, it does indeed call for a joyous celebration....but...wait...

Of the Tor directory data, how many bridges, nodes and relays are run by the NSA, FBI, CIA, GCHQ (UK) and the other 3 "eyes"? Or is such information top secret?

May 16, 2014

In reply to karsten

17 relays and 5 bridges hosted by the NSA? When was that? What about this year?

Have Tor developers done anything to prevent NSA, FBI, CIA and GCHQ from running rogue relays and bridges?

Karsten's response was a joke (but a funny one). He responded to the "which relays are run by nsa" by showing you which relays have the substring "nsa" in their nickname. It is funny on the surface because of course anybody could put, or not put, that substring in their nickname. And funny underneath because it underlines how hard it is to actually answer a question like that for sure.

But to give more concrete answers, as far as we know there have never been any relays or bridges run by the NSA. But that shouldn't make you happy, because the attack I worry about is having their surveillance cover somebody else's perfectly honest relay. Why should they run their own relays when they can watch yours, and get almost all of the benefits with fewer risks? But here I am repeating all of my statements from the gchq-quick-ant blog post, so I'm going to stop doing that here and invite you again to go read all of them there.

(Oh, and there *were* some relays run on amazon ec2 by gchq. Most of them were tiny and only for a week. Go check out the remation 2 documents.)

May 16, 2014

So there you go.
I mean, these guys are getting paid, so you do not get something for nothing these days.

tim

As I understand it, the US, Canada, Europe, Australia and NZ are trying their best to change and upgrade their economies from manufacturing to IT-based.

Let's face it: most manufacturing jobs are outsourced to China, Vietnam, Bangladesh, Brazil, etc., which provide jobs to millions of people.

Those manufacturing jobs that are still in the US, Europe, Australia, etc. are few and far between. Take, for example, how many jobs manufacturers of aircraft, military drones, satellites, etc. can offer to job seekers.

The US has since realized that the only way to create more jobs for its citizens is to heighten the sense of insecurity among the world's population leading to the need for more online surveillance.

The US plans to recruit more than 6,000 people next year. The excuse is cyber security.

May 16, 2014

Could you put a graph on the Tor Metrics Portal showing the number of Pluggable Transport users out of all bridge users, and the number of users for each individual PT (obfs3, fte and flash proxy)?

There are graphs on the number of bridge users by pluggable transport on the metrics website: https://metrics.torproject.org/users.html#userstats-bridge-transport. (And there's an open ticket to add another graph for all pluggable transport users combined, so that should be available at some point.) In the meantime, if you want to play with the numbers yourself, here's the CSV file: https://metrics.torproject.org/stats.html#clients

But don't trust those numbers too much, because they may not be as accurate as you'd hope. See this FAQ at the bottom of the page: "Q: Why are there so few bridge users that are not using the default OR protocol or that are using IPv6? A: Very few bridges report data on transports or IP versions yet, and by default we consider requests to use the default OR protocol and IPv4. Once more bridges report these data, the numbers will become more accurate."
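
If you do want to play with the numbers, a rough Python sketch along these lines could compute each transport's share of bridge users for one day. Everything specific in it is an assumption for illustration: the column names (`date`, `node`, `transport`, `clients`), the transport labels, and the rows in `SAMPLE_CSV`, which are made-up values and not real metrics data. Check the header of the CSV you actually download before relying on it.

```python
import csv
import io
from collections import defaultdict

# Made-up rows for illustration only; these are NOT real user counts.
# The column names and transport labels are assumptions about the CSV.
SAMPLE_CSV = """date,node,transport,clients
2014-05-14,bridge,<OR>,100
2014-05-14,bridge,obfs3,30
2014-05-14,bridge,fte,10
2014-05-14,bridge,flashproxy,10
2014-05-14,relay,,5000
"""


def transport_shares(csv_text, date):
    """Return {transport: fraction of bridge users} for one date.

    Hypothetical helper: rows that are not bridge rows, or that carry
    no transport value, are skipped.
    """
    totals = defaultdict(float)
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["date"] != date or row["node"] != "bridge":
            continue
        if not row["transport"]:
            continue
        totals[row["transport"]] += float(row["clients"])
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {name: count / grand_total for name, count in totals.items()}


shares = transport_shares(SAMPLE_CSV, "2014-05-14")
for name, share in sorted(shares.items()):
    print(f"{name}: {share:.1%}")
```

Given the caveat above about how few bridges report transport data, any shares computed this way are rough at best.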

May 17, 2014

Wasn't there a FF security update yesterday? Is there gonna be a new TBB release/update soon?

May 18, 2014

Just a thought: With almost complete data collection, “forever” storage, and quantum exploitation on the horizon, I think it would be only prudent to continually upgrade Tor encryption to the strongest available. And maybe audit its implementations.