Strength in Numbers: Measuring Diversity in the Tor Network
This post is one in a series of blogs to complement our 2018 crowdfunding campaign, Strength in Numbers. Anonymity loves company and we are all safer and stronger when we work together. Please contribute today, and your gift will be matched by Mozilla.
The Tor network is an ecosystem made up of thousands of volunteer-run relays distributed across the world that enable users to make private connections to services on the internet. Just as in nature, greater diversity among the members of the Tor network's ecosystem makes the network more sustainable. The biggest benefit is that users are more secure and better protected from traffic correlation attacks.
At the Tor Metrics portal, we archive historical data about the Tor ecosystem, collect data from the public Tor network and related services, and assist in developing novel approaches to safe, privacy-preserving data collection. This includes statistics on network diversity.
With additional funding and support, Tor Metrics could provide even more important information about Tor network diversity than it already does. We created the graphs in this post through a one-off analysis for the purpose of demonstrating how valuable this information can be when visualized and easily accessible.
Diversity can come from variety in the operating systems that relays run (e.g., Linux or FreeBSD), their computer architecture (e.g., PC or Raspberry Pi), their geographical location (e.g., the country or continent), or their hosting provider. Although we only have a snapshot view of the latest statistics by provider or country, we do track the number of relays by operating system over time.
In the past, the Tor Metrics portal did provide a graph for country statistics over time, but maintaining the graph became too expensive, due to development time, storage costs, computing resources, and ongoing maintenance.
If we had this back on our Metrics portal, we could always have these statistics for the top 5 countries running relays:
We can see from this graph that since 2015, Germany has surpassed the United States in number of relays. We also see that since 2014, France has seen a larger rise in relays than the Netherlands and Russia. This graph looks at absolute numbers of individual relays but, in fact, this is only part of the picture. When you use Tor, a path through the network is chosen for you by your client. Each relay in this path is chosen according to its "consensus weight fraction," which is based on the bandwidth of each relay so that load is balanced across all the available relays. Low-bandwidth relays have a lower probability of being chosen, while high-bandwidth relays have a higher probability. In the graph above, we treat all relays as equal, so a more accurate view would be based on the "consensus weight fraction" per country:
This graph looks at relative values as opposed to absolute values, so we no longer see the overall upward trends, but instead see over time how likely it is that a relay from each country will be chosen for use by clients. Looking at the most recent values, France now appears as more likely to be chosen than the United States even though we saw in the previous graph that the United States has more relays. This is because individual relays in the United States cannot handle as much traffic as relays in France. Even though the United States has over twice as many relays as the Netherlands, it has roughly the same total relay capacity and probability of selection. We also see that Russia drops out of the top 5 and is replaced by Sweden.
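To make the difference between counting relays and summing consensus weight concrete, here is a minimal sketch in Python. The relay list and its weights are made-up sample values, not real Tor Metrics data, and the function names are hypothetical. The numbers are chosen so that one country has twice as many relays as another but the same share of total weight.

```python
from collections import defaultdict

# Hypothetical sample data: (country code, consensus weight).
# These values are invented for illustration only.
relays = [
    ("us", 100), ("us", 100), ("us", 100), ("us", 100),
    ("fr", 300), ("fr", 300),
    ("nl", 200), ("nl", 200),
]

def relay_counts(relays):
    """Absolute number of relays per country (every relay counts equally)."""
    counts = defaultdict(int)
    for country, _weight in relays:
        counts[country] += 1
    return dict(counts)

def weight_fractions(relays):
    """Share of the total consensus weight held by each country."""
    total = sum(weight for _country, weight in relays)
    sums = defaultdict(int)
    for country, weight in relays:
        sums[country] += weight
    return {country: s / total for country, s in sums.items()}

counts = relay_counts(relays)
fractions = weight_fractions(relays)
for country in sorted(counts):
    print(f"{country}: {counts[country]} relays, {fractions[country]:.1%} of weight")
```

In this toy data, "us" has the most relays, yet "fr" holds the largest share of weight, and "us" and "nl" hold equal shares despite the two-to-one difference in relay count.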
We can also go a step further. Different relays are better for different positions in a circuit. If a relay is quite stable, it is a good choice for the first relay, or "Guard," which will remain the first relay in a circuit over a long period. If a relay allows exit connections to the internet, then it is better used for those connections as the final relay, or "Exit," as opposed to using its bandwidth to only relay traffic to another.
Interestingly, there is little difference between the probability that a relay will be selected for the "Guard" position vs. the "Exit" position for the latest values in the top 5 countries, but this has not always been the case.
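A rough way to picture per-position probabilities is to restrict the calculation to relays that are eligible for a position and normalize their weights. The sketch below does only that. It is a simplification under stated assumptions: the sample relays, weights, and the `position_probability` helper are hypothetical, and real Tor clients also apply position-dependent bandwidth weights from the consensus, which this sketch ignores.

```python
# Hypothetical sample relays; weights and flags are invented for illustration.
relays = [
    {"country": "de", "weight": 400, "flags": {"Guard"}},
    {"country": "de", "weight": 100, "flags": {"Exit"}},
    {"country": "us", "weight": 200, "flags": {"Guard", "Exit"}},
    {"country": "fr", "weight": 300, "flags": {"Exit"}},
]

def position_probability(relays, flag):
    """Per-country share of weight among relays carrying the given flag.

    Simplified: ignores the position-dependent bandwidth weights
    that Tor itself uses during path selection.
    """
    eligible = [r for r in relays if flag in r["flags"]]
    total = sum(r["weight"] for r in eligible)
    if total == 0:
        return {}
    by_country = {}
    for r in eligible:
        share = r["weight"] / total
        by_country[r["country"]] = by_country.get(r["country"], 0.0) + share
    return by_country

guard = position_probability(relays, "Guard")
exit_ = position_probability(relays, "Exit")
for country in sorted(set(guard) | set(exit_)):
    print(country, round(guard.get(country, 0.0), 3), round(exit_.get(country, 0.0), 3))
```

Comparing the two results per country shows how a country can be prominent in one position and nearly absent in another, which is the kind of gap the Guard and Exit graphs are meant to reveal.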
Ongoing, robust network measurement is essential to monitor the health and diversity of the Tor network, to respond to censorship events, to adapt Tor Browser and other apps to changing network conditions, and to validate changes made to the software that runs the Tor network.
As stated above, though, we had to remove this function from the Tor Metrics portal when it became prohibitively expensive. We would like to add these graphs back to enable easy tracking of geographical diversity for relays. If you would like to help bring back these graphs, please consider donating to support this work. A recurring donation would help us keep these visualizations and other Tor Metrics services running for years to come.
Analyzing a live anonymity system must be performed with great care so that our users' privacy is not put at risk, and our numbers are valuable to the researchers, relay operators, and developers who help us keep the network strong.
Plus, every donation now through the end of 2018 will be matched by Mozilla. There is strength in numbers. Join us.
Fantastic progress and an encouraging report!
I am one of the users who first tried to draw attention to the need for diversity many years ago so it has been very rewarding to see the effort people have been putting into this very important project!