Strength in Numbers: Measuring Diversity in the Tor Network

by irl | December 11, 2018

This post is one in a series of blogs to complement our 2018 crowdfunding campaign, Strength in Numbers. Anonymity loves company and we are all safer and stronger when we work together. Please contribute today, and your gift will be matched by Mozilla.

The Tor network is an ecosystem made up of thousands of volunteer-run relays distributed across the world that enable users to make private connections to services on the internet. Just as in nature, greater diversity among the members of the Tor network's ecosystem increases the sustainability of the Tor network, the biggest benefit of which is to ensure users are more secure and better protected from traffic correlation attacks.

At the Tor Metrics portal, we archive historical data about the Tor ecosystem, collect data from the public Tor network and related services, including statistics on network diversity, and assist in developing novel approaches to safe, privacy-preserving data collection.

With additional funding and support, Tor Metrics could provide even more important information about Tor network diversity than it already does. We created the graphs in this post through a one-off analysis for the purpose of demonstrating how valuable this information can be when visualized and easily accessible.

Diversity can come from variety in the operating systems that relays run (e.g., Linux or FreeBSD), their computer architecture (e.g., PC or Raspberry Pi), their geographical location (e.g., the country or continent), or their hosting provider. Although we only have a snapshot view of the latest statistics for provider or country, we do track the number of relays by their operating system over time.

In the past, the Tor Metrics portal did provide a graph for country statistics over time, but maintaining the graph became too expensive, due to development time, storage costs, computing resources, and ongoing maintenance.

If we had this back on our Metrics portal, we could always have these statistics for the top 5 countries running relays:

Relays by countries

We can see from this graph that, since 2015, Germany has surpassed the United States in number of relays. We also see that, since 2014, France has seen a larger rise in relays than the Netherlands and Russia. This graph looks at absolute numbers of individual relays but, in fact, this is only part of the picture.

When you use Tor, a path through the network is chosen for you by your client. Each relay in this path is chosen according to its "consensus weight fraction," which is based on the bandwidth of each relay so that load is balanced across all the available relays. Low-bandwidth relays have a lower probability of being chosen, while high-bandwidth relays have a higher probability. In the graph above, we treat all relays as equal, so a more accurate view would be based on the "consensus weight fraction" per country:

Consensus weight fraction per country

This graph looks at relative values as opposed to absolute values, so we no longer see the overall upward trends, but instead see over time how likely it is that a relay from each country will be chosen for use by clients. Looking at the most recent values, France now appears as more likely to be chosen than the United States even though we saw in the previous graph that the United States has more relays. This is because individual relays in the United States cannot handle as much traffic as relays in France. Even though the United States has over twice as many relays as the Netherlands, it has roughly the same total relay capacity and probability of selection. We also see that Russia drops out of the top 5 and is replaced by Sweden.
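The gap between counting relays and summing their consensus weight can be illustrated with a minimal sketch. The relay records below are invented for illustration and are not real Tor Metrics data; the function names are hypothetical, and the sketch assumes that a country's fraction is simply its share of the total consensus weight.

```python
from collections import defaultdict

# Hypothetical relay records: (country code, consensus weight).
# These values are made up for illustration only.
RELAYS = (
    [("us", 100)] * 5    # many low-capacity relays
    + [("nl", 250)] * 2  # fewer relays, same total weight as "us"
    + [("fr", 500)] * 2  # few high-capacity relays
)


def relay_counts(relays):
    """Count relays per country, treating every relay as equal."""
    counts = defaultdict(int)
    for country, _weight in relays:
        counts[country] += 1
    return dict(counts)


def consensus_weight_fractions(relays):
    """Return each country's share of the total consensus weight."""
    total = sum(weight for _country, weight in relays)
    if total == 0:
        return {}
    sums = defaultdict(int)
    for country, weight in relays:
        sums[country] += weight
    return {country: weight / total for country, weight in sums.items()}


counts = relay_counts(RELAYS)
fractions = consensus_weight_fractions(RELAYS)
for country in sorted(counts):
    print(f"{country}: {counts[country]} relays, "
          f"{fractions[country]:.0%} of consensus weight")
```

In this toy data, "us" has more than twice as many relays as "nl" but the same share of consensus weight, which is the kind of pattern the second graph makes visible.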

We can also go a step further. Different relays are better for different positions in a circuit. If a relay is quite stable, it is a good choice for the first relay, or "Guard," which will remain the first relay in a circuit over a long period. If a relay allows exit connections to the internet, then it is better used for those connections as the final relay, or "Exit," as opposed to using its bandwidth to only relay traffic to another.

Guard and Exit weight fraction by country

Interestingly, there is little difference between the probability that a relay will be selected for the "Guard" position vs. the "Exit" position for the latest values in the top 5 countries, but this has not always been the case.
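A per-position breakdown like this one can be approximated by restricting the weight sum to relays that are eligible for a position. The sketch below is a simplification under stated assumptions: the records and the `position_fractions` helper are hypothetical, eligibility is modeled as a plain "Guard" or "Exit" label on each relay, and the additional position-dependent weighting that the real network applies is ignored.

```python
# Hypothetical relay records: (country code, consensus weight, flags).
# Values are invented for illustration; they are not real Tor data.
FLAGGED_RELAYS = [
    ("de", 400, {"Guard"}),
    ("de", 200, {"Exit"}),
    ("fr", 300, {"Guard", "Exit"}),
    ("nl", 100, set()),
    ("nl", 100, {"Exit"}),
]


def position_fractions(relays, flag):
    """Share of weight per country among relays carrying `flag`.

    Simplified: a real client also applies position-dependent
    weights, which this sketch does not model.
    """
    eligible = [(c, w) for c, w, flags in relays if flag in flags]
    total = sum(w for _c, w in eligible)
    if total == 0:
        return {}
    sums = {}
    for country, weight in eligible:
        sums[country] = sums.get(country, 0) + weight
    return {country: weight / total for country, weight in sums.items()}


guard = position_fractions(FLAGGED_RELAYS, "Guard")
exit_ = position_fractions(FLAGGED_RELAYS, "Exit")
for country in sorted(set(guard) | set(exit_)):
    print(f"{country}: guard {guard.get(country, 0):.1%}, "
          f"exit {exit_.get(country, 0):.1%}")
```

Comparing the two dictionaries per country shows whether a country's relays are more likely to appear at the start or at the end of a circuit.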

Ongoing, robust network measurement is essential in order to monitor the health and diversity of the Tor network, to respond to censorship events, to adapt Tor Browser and other apps to changing network conditions, and to validate changes that are made to the software that runs the Tor network.

As stated above, though, we had to remove this function from the Tor Metrics portal when it became prohibitively expensive. We would like to add these graphs back to enable easy tracking of geographical diversity for relays. If you would like to help bring back these graphs, please consider donating to support this work. A recurring donation would help us to keep these visualizations and other Tor Metrics services running for years to come.

Analyzing a live anonymity system must be performed with great care so that our users' privacy is not put at risk, and our numbers are valuable to the researchers, relay operators, and developers who help us keep the network strong.


Plus, every donation now through the end of 2018 will be matched by Mozilla. There is strength in numbers. Join us.

Comments

Please note that the comment area below has been archived.

December 11, 2018


Fantastic progress and an encouraging report!

I am one of the users who first tried to draw attention to the need for diversity many years ago so it has been very rewarding to see the effort people have been putting into this very important project!

December 12, 2018


A day in the life of top European Tor circuits

Germany, Germany, Germany
Germany, France, Netherlands
Germany, Netherlands, France
Germany, France, France
Germany, Netherlands, Netherlands
Germany, Germany, Germany
Germany, France, France
Germany, Netherlands, Netherlands
Germany, Germany, Germany

And maybe, maybe after renewing circuits a few more times, maybe finally another country in sight.
On certain websites (webmail services) you will almost never succeed; it just sticks to these neighbor-country combinations.

Just coincidence, over and over again, or an example of good neighbors with very good international Tor network cooperation?
Looking at these circuits every day, it is quite common (as in the example above) to have all three in a row, or a row with two of these three countries.
Or, when renewing the circuit, the exit and middle nodes simply swap places.

Would it be possible to influence the redirections to another node?
If so, then we have a secret service surveillance network within the Tor network, based on nodes that are owned by these services.
That is interesting metadata (for them).

Wouldn't it be a better idea to spread these connections?
Divide them over continents, never use an entry node in the country you are in, and avoid neighbor-country connections.
Borders are disappearing, at least in the digital world, so there is a big chance that you are communicating over 3 nodes in a row that are owned by cooperating owners.

Network-related question:
did anyone else get hacked last Saturday when visiting the lemonde.fr website with Tor Browser?

The browser crashed after I enabled JavaScript, and Tor Browser was left broken: the Tor network startup function no longer worked, totally kaputt.
This never happened in many years. It looked like maybe a JavaScript and SVG based attack?
Only a complete reinstall of the browser fixed it; after that, visiting that .fr website without JavaScript enabled was finally possible.

Chapeau to the hacking guys.
Next time, try to chapeau to Rousseau again!
I was just trying to read the news, a quite important basic (but apparently disappearing) right.

December 14, 2018


Everyone is talking about "diversity" yet the vast majority of nodes are from a few EU countries and the US which closely cooperate and can easily share surveillance data.

I think this is a serious problem.
What is the reason for the complete lack of nodes from Asia, South America or even Africa?
Many countries there have very robust IT infrastructure, yet to me it feels as if the diversity of the Tor network has actually decreased over the years as capacity has increased.
Let's face it, most nodes are not run by your Average Joe individual but by organizations nowadays, which is not astonishing given the bandwidth they provide. Still, I'd say something similar should be possible in different regions, at least to some degree. You hold Tor summits around the globe, so maybe focusing on gaining relay operators outside of the EU / US would not be the worst idea?

One issue is that the servers that measure the bandwidth a relay can provide are located in Europe and the US, so relays running further away tend to be assigned a lower consensus weight and are used less. Another issue is the lack of existing communities in some locations to help get relays running.

We are currently looking at improving the bandwidth measurement system by replacing the existing software with new software, and we do hold relay operator meetups co-located with events that a few Tor people are attending, such as our recent meeting in Mexico.