How Bandwidth Scanners Monitor The Tor Network
The Tor network is made up of thousands of volunteer-run relays around the world, and millions of people rely on it for privacy and freedom online every day. To monitor the Tor network's performance, detect attacks on it, and better distribute load across the network, we employ what we call Tor bandwidth scanners. The bandwidth scanners are run by the directory authorities (dirauths).
Tor relays report their own bandwidth based on the traffic they have sent and received. But this reported bandwidth is not verified by other relays. Bandwidth scanners help verify relay bandwidths. They also provide some initial traffic to new relays, so those relays can report a useful amount of bandwidth.
Torflow was the first Tor bandwidth scanner, started in 2011. Over time, it has become more difficult to install and to maintain, because the libraries it was built with are no longer maintained. In 2018, we started to develop "Simple Bandwidth Scanner" (sbws) using more modern and maintained libraries. Right now, out of nine dirauths, six are bandwidth authorities, which means they run bandwidth scanners. (There is also one bridge authority. It doesn't do bandwidth scanning.)
sbws chooses two relays and builds a path between them. One relay is the target of the sbws measurement. The other is a random relay that's faster than the target relay. The scanner downloads data from a web server through this path between the relays. It measures the bandwidth as the amount of data downloaded divided by the time the download took. Every hour, the scanner filters out invalid measurements, aggregates the valid ones, and scales them. Finally, it writes a bandwidth file with all the relays' bandwidths. The directory authorities read this file and vote on the relays' bandwidths.
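The measurement and hourly aggregation steps above can be sketched roughly as follows. This is a minimal illustration, not sbws's actual code: the function names, the validity rule (positive byte count and duration), and the use of a plain mean for aggregation are all assumptions made for the example, and the scaling step is left out.

```python
from statistics import mean


def measured_speed(bytes_downloaded, seconds):
    """Bandwidth of a single download, in bytes per second."""
    return bytes_downloaded / seconds


def aggregate(measurements):
    """Drop invalid measurements and average the remaining speeds.

    `measurements` is a list of (bytes_downloaded, seconds) pairs.
    Treating non-positive values as invalid and averaging with a mean
    are assumptions for this sketch, not sbws's documented rules.
    """
    valid = [
        measured_speed(b, s)
        for b, s in measurements
        if b > 0 and s > 0
    ]
    # No valid measurement means the relay could not be measured.
    return mean(valid) if valid else None


# Hypothetical measurements for one relay: the second one is invalid.
samples = [(1_000_000, 2.0), (0, 1.5), (3_000_000, 4.0)]
print(aggregate(samples))
```

Returning `None` for a relay with no valid measurements keeps "not measured" distinct from "measured at zero".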
Torflow divides the network into partitions depending on relay bandwidth. As a result, some relays can end up stuck in a low-bandwidth partition. Unlike Torflow, sbws does not divide relays into partitions, so relays can't get stuck in a slow partition.
To reach a consensus about a relay's bandwidth, as reported by the scanners, tor uses the median of at least three of their votes. Right now, there is only one directory authority running sbws, and five run Torflow. We plan to have three authorities running sbws by the end of April, so we'll start to see the effects of the changes to sbws soon. If all goes well, we'll eventually want all dirauths to switch from Torflow to sbws.
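The median rule above can be sketched as follows. This is an illustration only, not tor's implementation: the function name and the vote values are hypothetical, and picking the lower of the two middle values when the number of votes is even is an assumption of this sketch.

```python
from statistics import median_low

# Minimum number of bandwidth votes needed, as stated in the text.
MIN_VOTES = 3


def consensus_bandwidth(votes):
    """Return the median of the scanners' votes for one relay.

    Returns None when fewer than MIN_VOTES votes are available.
    Using the low median for an even number of votes is an assumption.
    """
    if len(votes) < MIN_VOTES:
        return None
    return median_low(votes)


# Hypothetical votes from three bandwidth authorities for one relay.
print(consensus_bandwidth([100, 300, 200]))
```

A median means that a single scanner reporting an extreme value cannot move the result on its own.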
The latest version of sbws reports all relays that it has seen, including ones that it could not measure. This will help us to diagnose issues and anomalies in the relays, the network, and the software itself. It will also help to answer relay operators' questions about their relays' consensus weight and bandwidth.
In the next few months, we will start archiving the bandwidth files from sbws and Torflow using CollecTor. Once the directory authorities start running Tor version 0.4.0.4-alpha or later, CollecTor can ask them for their bandwidth files. This will increase transparency while preserving anonymity, since the reported bandwidth values are aggregated from multiple measurements.
Before this change, it was possible to know the bandwidths that were reported by a scanner in a vote, but we could not know which bandwidth file corresponded to which vote. The bandwidth file headers can also help to debug bandwidth file and vote issues.
We wrote a specification for the bandwidth file format. This way, others can develop parsers to obtain metrics, or develop compatible bandwidth scanners.
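As a rough idea of what such a parser might look like, here is a minimal sketch. It assumes a simplified layout: a timestamp on the first line, `key=value` header lines, a terminator line, and then one line of space-separated `key=value` pairs per relay. The sample data, the function name, and the terminator default are assumptions for illustration. The specification remains the authoritative description of the format.

```python
def parse_bandwidth_file(text, terminator="====="):
    """Parse a bandwidth file laid out as assumed in this sketch.

    Returns (timestamp, headers, relays), where `headers` is a dict
    and `relays` is a list of dicts, one per relay line.
    """
    lines = text.strip().splitlines()
    timestamp = int(lines[0])
    headers = {}
    relays = []
    in_header = True
    for line in lines[1:]:
        if in_header:
            if line == terminator:
                in_header = False
            else:
                key, _, value = line.partition("=")
                headers[key] = value
            continue
        # Relay line: space-separated key=value pairs.
        relays.append(dict(pair.split("=", 1) for pair in line.split()))
    return timestamp, headers, relays


# Made-up sample for illustration, not real measurement data.
SAMPLE = """\
1523911758
version=1.2.0
software=sbws
=====
node_id=$AAAA bw=760 nick=relayA
node_id=$BBBB bw=1200 nick=relayB
"""

timestamp, headers, relays = parse_bandwidth_file(SAMPLE)
print(timestamp, headers["software"], [r["bw"] for r in relays])
```

Values are kept as strings here, so a caller that wants metrics would convert fields such as `bw` to integers itself.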
There are still several engineering and research improvements that can be made.
So far, sbws scales the raw bandwidth measurements in the same way as Torflow. Scaling is needed in order to balance the load in the network.
The measurement and scaling of bandwidth weights should achieve an equilibrium goal. For instance, the user should experience consistent performance, regardless of the relays that their tor client has randomly chosen.
sbws is decentralized in the sense that there will be several instances of it running, but each of these instances is a single point of failure. Any shared servers or DNS infrastructure are also single points of failure.
Tor needs a minimum of three bandwidth authorities, and we have six bandwidth authorities running right now. We hope that sbws will be easy for directory authority operators to deploy, so we might have seven or eight authorities running a bandwidth scanner in the future.
sbws is still vulnerable to denial-of-service attacks and traffic manipulation, as explained in the 2018 research post.
We would be grateful to anyone who could help to improve the scanner. We encourage you to open tickets, base your implementations on sbws, develop a compatible external application programming interface, or extend the existing bandwidth file format. If implementations use similar code and data formats, it will be easier for Tor to use them, maintain them, and generate metrics from them.
If you want to dive deeper into some of the current bandwidth data, take a look at:
- How long it takes to download files over Tor
- Total consensus bandwidth across directory authorities
- Total available bandwidth
- Consensus health graphs
- Consensus health overlap
Before version 3 of the Tor directory protocol, each of the directory authorities was indeed a single point of failure. Now, though, they vote on the state of the network, which means that adding directory authorities can increase trust rather than introduce more single points of failure. Even when all the directory authorities are offline, the network can still survive for a time using other relays as caches of the network status. We would like to increase the number of bridge authorities in the future, and work is ongoing to allow that. However, the bridge authority does not need to be online for users to use bridges; it is only needed for users to learn about new bridges if they do not already know of any, or if the ones they are using are blocked.