How to Do Effective and Impactful Tor Research
As we mentioned in our previous post about Tor research topics, Tor greatly benefits from the research community. When researchers work closely with the design and development of deployed systems, this not only results in better research, but also better systems. For project maintainers, research that identifies vulnerabilities, creates new solutions to existing problems, and verifies proposed designs helps improve projects and make them safer for end users. TLS 1.3 is one recent example of where a symbiotic research/practitioner relationship has improved the protocol's design and safety.
However, it is all too common that good research ideas don't make their way into practice. Within Tor, we have found that integrating new research findings is neither seamless nor predictable, and good ideas are often lost or deemed incompatible without significantly more analysis and research.
The purpose of this post is to discuss what good research needs to do in order to ensure it has the best chance of being adopted by Tor or any other large software project.
We have structured this post in terms of an ordered list of goals for research. Each successive goal is more difficult to accomplish than the previous one. At the end of this post, we will look at a positive example of excellent research that successfully accomplished all of these goals and give overall takeaways.
So, here is your list of goals, in order of increasing difficulty, when conducting relevant research:
- Motivate your work!
- Standardize your results!
- Ensure others can reproduce your results easily!
- Analyze all the consequences!
- Consider current and future design compatibility!
- Write quality code!
- Be excellent to each other!
Experienced researchers will immediately notice that a lot of published research, even at top-tier venues, only makes it two steps deep into this list. The farther you get down this list with your research topic, the more likely it is to ultimately get adopted or otherwise effect change in Tor (or any other privacy or software project that you care about).
#1. Motivate your work!
Your minimum responsibility as a researcher is to motivate why your work is novel and relevant.
One positive example of where research was successfully deployed in the wider Tor network was with NTor, the Authenticated Key Exchange (AKE) that is currently in use in Tor. The NTor paper identified vulnerabilities in the competing proposed AKEs, one of which allowed for impersonation attacks on behalf of a server connecting to a client, and proposed a solution that safely enables both mutual and one-way authentication for use in relay-to-relay and client-to-relay authentication.
The NTor work example illustrates a key point as to why some research is adopted quickly and some is not. Not only did it improve on Tor's aging RSA-based handshake with an ECDH variant, but it also showed weaknesses in previous proposed handshakes. It went still further to provide performance data and security proofs, further motivating its adoption. The takeaway is that it is important to motivate your research as much as you can. This includes showing how it is better than the previous approach, how the old approach is broken/insecure, and why the new approach would be an easy replacement or transition.
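To make the shape of such a handshake concrete, here is a minimal sketch of an ntor-style one-way-authenticated key exchange. It is an illustration of the structure only: it substitutes a toy finite-field Diffie-Hellman group for the curve25519 used by the real protocol, and omits the authentication tag and full key-derivation step, so none of these parameters or helper names come from Tor itself.

```python
import hashlib
import secrets

# Toy DH group. A Mersenne prime keeps the sketch readable and runnable
# with only the standard library; it is far too small for real cryptography.
P = 2**127 - 1
G = 5

def keygen():
    x = secrets.randbelow(P - 2) + 1     # private exponent
    return x, pow(G, x, P)               # (private, public)

def kdf(*parts):
    # Stand-in key derivation: hash all transcript values together.
    return hashlib.sha256(b"|".join(str(p).encode() for p in parts)).digest()

# Server's long-term identity keypair; B is known to clients in advance.
b, B = keygen()

# Client sends a fresh ephemeral public key X.
x, X = keygen()

# Server replies with ephemeral Y and derives the session key from BOTH the
# ephemeral-ephemeral and the ephemeral-identity DH shares, binding the key
# to the server's identity (one-way authentication).
y, Y = keygen()
server_key = kdf(pow(X, y, P), pow(X, b, P), X, Y, B)

# Client derives the same key from public values only; an impersonator who
# does not hold b cannot compute the g^(xb) share.
client_key = kdf(pow(Y, x, P), pow(B, x, P), X, Y, B)

assert client_key == server_key
```

The key point the sketch captures is why the identity share matters: a man-in-the-middle can substitute its own ephemeral key, but without the server's long-term secret it cannot produce a matching session key.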
For many attack papers, the attack itself can serve as the motivation, the experiment, and the novel result. It is unfortunately very easy to stop research as soon as you have an attack on a popular system like Tor, and have this result get accepted at a top-tier conference. And in some cases, that can be just fine. It is very hard to come up with a complete defense to a novel attack by yourself, especially if solving it would require systemic changes to the system you are attacking.
However, you do still have responsibilities in this case. If you are conducting attacks on Tor, please read our ethical research information, and ideally contact our research safety board, and ensure you are conducting your research safely, and following responsible disclosure.
You also have an opportunity in the attack case that is rarely available in other situations, via the following goal.
#2. Standardize your results!
The most highly cited research typically sets the frame for an entire topic area. It establishes the concepts, experiments, metrics, reference datasets, and sometimes even reference implementations. If you are not the first paper in your area of interest, and no such standardization exists, it is often a publication-worthy effort simply to review prior work while re-evaluating it under a rigorous, standardized experimentation and measurement model. However, doing this takes extreme care, especially if you have your own work in mind. If no prior work has standardized metrics or methods for evaluation, it is all too easy to fall into the trap of using a custom (particularly small) dataset or other tailored evaluation mechanism that allows flaws in your research to be swept under the rug. These flaws may be missed in peer review, and may still allow you to get a publication, but they will not withstand the test of time.
The NTor work was also a good example of doing this right. The authors first provided proofs of the previous Tor handshake under limited circumstances, and then produced a second paper which used a similar formal methodology to evaluate competing proposals, as well as their own design. Since the other work did not come with strong proofs, NTor not only became our handshake of choice, but also set the standard for developing similar cryptographic handshakes in the future.
If you are the first work in an area, or if you have a particularly novel attack on Tor, you should consider what the solution requirements are for a proper fix. It is often difficult to come up with a complete solution, especially for novel attacks and research areas. But can you come up with a partial, straw-man solution and find some way to measure its efficacy in a standardized way? This is your chance to be the seminal research paper that gets reams of citations because it not only discovered a new attack, but because it also systematized and standardized how all future defenses were measured. If you just write the first attack paper in an area, it will eventually be forgotten once other attacks come along. If you set the standard by which defenses are measured, your work will be cited by all subsequent attack and defense papers. Don't waste this opportunity.
Obviously this effort requires standardized datasets, test scenarios, evaluation criteria, and reproducible experiments. Which brings us to our next point.
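As a toy illustration of what a standardized evaluation can look like, the sketch below fixes a seed, a dataset split, and a single agreed metric for a strawman nearest-centroid "fingerprinting" classifier. Every name and number here is invented for illustration; the point is only that two groups running the same harness against different classifiers would produce directly comparable accuracy figures.

```python
import random
import statistics

# Hypothetical standardized setup: one fixed seed, one fixed split, one
# agreed metric (accuracy). The "traces" are synthetic stand-ins.
SEED = 2024
random.seed(SEED)

def make_trace(site):
    # Each "site" has a characteristic mean packet size plus noise.
    return [random.gauss(500 + 100 * site, 50) for _ in range(20)]

dataset = [(make_trace(site), site) for site in range(3) for _ in range(30)]
random.shuffle(dataset)
train, test = dataset[:60], dataset[60:]

# Strawman classifier: nearest mean-packet-size centroid.
centroids = {}
for site in range(3):
    sizes = [statistics.mean(t) for t, s in train if s == site]
    centroids[site] = statistics.mean(sizes)

def classify(trace):
    m = statistics.mean(trace)
    return min(centroids, key=lambda s: abs(centroids[s] - m))

accuracy = sum(classify(t) == s for t, s in test) / len(test)
print(f"accuracy={accuracy:.2f}")
```

Because the seed, split, and metric are pinned, the number is reproducible byte-for-byte, and swapping in a different `classify` function is the only change a competing evaluation needs to make.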
#3. Ensure others can reproduce your results easily!
In nearly all fields, academic research is facing a crisis of scientific reproducibility. The "publish or perish" model encourages researchers to cut as many corners as they can in order to generate as many publications as they can. It requires a lot of work to provide enough data and implementation detail to ensure the independent repeatability of the results of a paper. This means that if this standard is not enforced by the publication process, ease of scientific reproducibility will inevitably fall by the wayside. Those that short-cut this standard can out-publish those that do not, all other things being equal. The rational-actor result of this situation is exactly what we have now: a crisis of scientific reproducibility.
Reproducibility can be impeded by lack of implementation source code, private or ephemeral datasets, black-box testing of rapidly-evolving systems, poor documentation, lack of explicitly-defined assumptions, and insufficient research rigor. Insufficient rigor is an issue for Tor specifically, because outcomes can change when exposed to the size and complexity of the live Tor network: discrepancies due to small datasets or insufficiently controlled experiments can cause wide variance between research results and production results.
Furthermore, implicit assumptions that are not well-documented are also problematic and often hard to discover until deep in the implementation cycle. For example, developing an experiment where nodes are assumed to always be available without explicitly documenting this assumption can be difficult to back out from when attempting to produce similar results on a live network where nodes can intermittently fail at any moment.
To improve on this, we recommend significant peer review to assess if all necessary information and artifacts are provided so that an independent party can later reproduce the paper's results. When reviewing research, on Tor or any production system, explicitly assess whether the results could be independently reproduced and verified, and at what cost and engineering difficulty. Factors that would help in this are artifacts and information such as code samples, test vectors, relevant datasets, and explicitly-identified dependencies such as external libraries and even operating systems where tests were performed.
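One lightweight habit that helps reviewers and future implementors alike is to emit a manifest alongside every result that pins the RNG seed and records the environment the experiment ran in. The helper below is hypothetical, not an existing tool; it uses only the Python standard library and is a sketch of the idea rather than a complete provenance system.

```python
import hashlib
import json
import platform
import random

# Hypothetical experiment: an explicit, seeded RNG (no hidden global state)
# so the same seed always reproduces the same result.
SEED = 42

def run_experiment(seed):
    rng = random.Random(seed)
    samples = [rng.random() for _ in range(1000)]
    return sum(samples) / len(samples)

def experiment_manifest(seed, result):
    return {
        "seed": seed,
        "result": result,
        "python": platform.python_version(),
        "platform": platform.platform(),
        # Hash of the experiment's bytecode, so code drift between the
        # published result and a later re-run is detectable.
        "code_sha256": hashlib.sha256(
            run_experiment.__code__.co_code).hexdigest(),
    }

result = run_experiment(SEED)
manifest = experiment_manifest(SEED, result)
print(json.dumps(manifest, indent=2))

# Re-running with the same seed reproduces the result exactly.
assert run_experiment(SEED) == result
```

Publishing a manifest like this next to each figure in a paper makes "could an independent party reproduce this?" a checkable question rather than a guess.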
We also recommend that these artifacts remain public after the research is complete and the paper is published. Tor's development planning often happens several months or even years out, so it is important that sufficient information from the time of the research, such as reference implementations, remains available long after the researchers have moved on.
We will also be following up in the next few weeks with a "peer review checklist" that we hope researchers will use when publishing/reviewing research on Tor to ensure that sufficient analysis has been performed.
#4. Analyze all the consequences!
When proposing changes to Tor, either to core network protocols, how users interact with various interfaces, or how to protect against certain attacks, it is important that these proposed changes are evaluated holistically and tradeoffs are explicitly documented. Failure to do so will move the cost of this analysis onto the core development team, meaning that many questions will be left unanswered. This often results in an "analysis paralysis" where we lack sufficient information to weigh certain tradeoffs against others.
The following is a non-exhaustive list of tradeoffs relevant to Tor:
- Security/Tor’s threat model
- Performance (per relay and for the entire network)
- Scalability (does this scheme scale to thousands of nodes?)
- User experience
- Network load
- Engineering complexity (will it take the entire team a year to build?)
- Engineering maintainability (will this require a lot of custom hard-to-maintain functionality?)
- Protocol complexity
- Failure cases (nodes should be able to intermittently fail, etc)
- Corner cases
- Extreme cases (DDOS, sudden influx of new clients, the law of truly large numbers, etc)
- Compatibility with upgrades/downgrades and multi-version deployments
- Related transition plans that consider things like fragmenting the userbase and potentially reducing anonymity as a result
- Consideration with how this planned change interacts with other future planned changes to Tor
- Continued support for Tor's user base, which includes thousands of relays, millions of users, and wide application usage, from file sharing to mobile browsing
While this list is not complete, researchers can improve analysis by considering tradeoffs holistically. This means not just considering how a proposed solution solves a previously-unaddressed threat such as that of a global network adversary, but also addressing how it fares against other considerations such as performance, scalability, and the requirement of zero network downtime. We ask that researchers explicitly weigh these considerations against each other and holistically evaluate whether the proposed solution's increased positive properties outweigh any downsides.
One example of where this tradeoff can get particularly tricky is in the traffic analysis arena. As a simple example: if we switch to using two guard nodes from one, and use traffic splitting across these two guards, we get what looks like a number of benefits against traffic analysis. If the split patterns are unpredictable to an outside observer of only one of these paths, website traffic fingerprinting and other traffic analysis attacks should become much harder. But, how does this trade off against adding an additional observer for the second guard? How do we even measure that tradeoff?
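A back-of-the-envelope simulation can at least make the two sides of this tradeoff explicit. The model below is deliberately crude and every number in it is illustrative: cells are split uniformly at random across two guards, a single observer's coverage of the stream is measured, and the increased exposure to guard compromise is computed for an assumed adversary fraction.

```python
import random

# Toy model of the two-guard traffic-splitting question.
rng = random.Random(7)

stream = list(range(200))                      # cell sequence numbers
guard_a = [c for c in stream if rng.random() < 0.5]
seen_a = set(guard_a)
guard_b = [c for c in stream if c not in seen_a]

# A single observer's view is an incomplete, randomly subsampled
# subsequence of the true traffic pattern...
coverage_single = len(guard_a) / len(stream)

# ...but the chance that at least one of the client's guards is hostile
# roughly doubles: for a small adversary fraction f, 1-(1-f)^2 ~ 2f.
f = 0.05                                       # assumed adversary fraction
p_one_guard = f
p_two_guards = 1 - (1 - f) ** 2

print(coverage_single, p_one_guard, p_two_guards)
```

Even this crude model shows the open question in the text: the defense benefit (lower per-observer coverage) and the cost (higher compromise probability) are measured in different units, and deciding how to weigh one against the other is exactly the analysis the Tor Project needs from researchers.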
Another example is in the datagram transport arena. Switching to a datagram transport would obviously mitigate congestion attacks, OOM-based DoS, and a host of other performance-related issues. But without considering the effects on anonymity via vectors like drop-based side channels and the switch to measurably low (and bounded) latencies, as well as how to adapt our cryptography for a lossy datagram model, the Tor Project has no way to decide if it should begin work on the idea.
If this point hasn't convinced you that it would be wise to consult us about any extensive modifications that you would like to see adopted, the next one will :).
#5. Consider current and future design compatibility!
It is not uncommon that research findings propose solutions which are a fundamental departure from how Tor is currently designed and implemented. One hypothetical example of such a fundamental departure is that of Tor moving from onion routing to mixnet protocols. Doing so would fundamentally change core Tor protocols, such as requiring an entirely different design and implementation of Tor's data transport protocols. Furthermore, the service that Tor offers to end users would also change radically from supporting low-latency applications such as streaming video to only higher-latency use cases.
While proposing radical changes to Tor isn't itself an issue, every change to Tor has a cost associated with it. The higher the cost of moving to a new design, and the more opaque the framework for analyzing other costs and benefits, the more difficult it is for us to make the change.
Examples of radical changes to Tor include:
- Significant departure from Tor's current/future architecture
- Changes to the services that Tor is able to provide (low latency, protection against local adversaries, etc)
- Incompatibility with day-to-day operations of the Tor network, such as providing network connectivity to users at a global scale
One option for research that identifies a fundamental yet highly beneficial change to Tor would be to offer a "transition plan" that identifies how Tor could evolve incrementally to this larger design. By identifying incremental changes and even a minimum viable product, this creates a path forward and enables implementors to introduce changes on a gradual basis, allowing for improved testability and analysis. Furthermore, incremental changes help provide backwards compatibility and an upgrade/downgrade path. In a distributed network such as Tor, it is difficult to roll breaking changes across the network. Therefore, providing a smooth transition path between current and future state will help ease friction to evolve to a new improved design while providing service on a live global network.
Even changes to Tor that are not in themselves extreme can end up colliding with (or simply needing to interact with) future plans to change the Tor protocol. For this reason, it is a good idea to keep an eye on our pending design proposals, and chat with us on IRC about the status of any that may be related to your work.
#6. Write quality code!
If you want to go above and beyond, and help write a production-quality implementation of your idea so as to ensure it is rapidly adopted by Tor, then it is also important to pay attention to the engineering issues involved in merging your proposed changes.
For any project maintainer, the bigger and more complex a code change is, the harder it is to review and test before merging. This is also the case with research implementations. Even for well-motivated, well-analyzed, well-designed, easily reproducible work that is more-or-less production ready, when we receive a 10,000 line patch, it results in some possibly-unanswerable questions such as:
- How to sufficiently review the patch - is there a succinct Tor proposal describing your changes to check against; are the commits well structured, etc
- How to know the patch does what is expected - can we reproduce your results on the Tor test network (and the live network) easily?
- How to know that this patch performs at scale?
- How to reconcile the patch with more recent changes to the codebase?
Overall, we want to emphasize that receiving a 10,000 line patch after a successful research implementation isn't necessarily a guarantee that the patch will be merged. Reviewing that much code with little context is a serious challenge, and more so if the implementation quality is not high.
Concrete examples of things that contribute to merge difficulty include large patches without sufficient documentation or tests, code drift (not keeping the branch up to date with the project mainline branch), interwoven and cyclic dependencies, and fragile or brittle code that is error-prone.
While in the near future we will be publishing a guide to submitting large patches to Tor, some recommendations in the meantime include:
- Have a transition as well as a revert mechanism - can we turn your feature on and off if we want to evaluate it on the live network, or if it causes problems?
- Write tests! These serve to both document and prove your code functions as expected
- Write code that is maintainable for the next 5-10 years
- Fight code drift - continuously rebase your branch onto the project's master branch.
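As a sketch of the first recommendation above, a feature shipped behind a runtime flag can be switched on for evaluation and reverted instantly without rebuilding. The class and flag names below are hypothetical; Tor itself gates features through mechanisms such as consensus parameters and torrc options, so this is only a minimal illustration of the pattern.

```python
# Hypothetical runtime feature flag: new behaviour ships disabled by
# default, and the legacy code path stays intact as the fallback.
class FeatureFlags:
    def __init__(self):
        self._flags = {}

    def register(self, name, default=False):
        self._flags[name] = default

    def enable(self, name):
        self._flags[name] = True

    def disable(self, name):
        self._flags[name] = False

    def is_enabled(self, name):
        return self._flags.get(name, False)

flags = FeatureFlags()
flags.register("new_scheduler", default=False)   # off until evaluated

def schedule(cells):
    if flags.is_enabled("new_scheduler"):
        return sorted(cells)          # stand-in for the new behaviour
    return cells                      # legacy path, untouched

assert schedule([3, 1, 2]) == [3, 1, 2]   # default: legacy behaviour
flags.enable("new_scheduler")
assert schedule([3, 1, 2]) == [1, 2, 3]   # evaluate the new behaviour
flags.disable("new_scheduler")            # instant revert
assert schedule([3, 1, 2]) == [3, 1, 2]
```

Keeping the legacy path callable until the new one has been evaluated on the live network is what makes the on/off switch a genuine revert mechanism rather than just a build option.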
#7. Be excellent to each other!
As software is built by people, it is important to acknowledge the human element in the process of integrating research findings into practice. Because the "publish or perish" model leaves researchers little time after a publication to see the outcome through, post-research but pre-implementation tasks tend to fall through the cracks.
For example, lack of domain knowledge within the development team can make the "throw over the wall" process of publishing a paper in isolation difficult to see through to a successful implementation. All too often, questions or concerns from the team go unaddressed because lines of communication aren't clear. This becomes worse if the paper lacks important details or doesn't consider edge or extreme cases.
While seemingly simple, we recommend that researchers work closely with us and develop clear lines of communication. It can be helpful to have a single champion within the Tor development team for a proposed change, where that person will both understand the research thoroughly and will be able to serve as the conduit for communication. Attending in-person meetings and holding working sessions with the team is both welcomed and useful to transfer knowledge.
Positive example and overall takeaways
Besides NTor, another recent example of good research making its way into production in Tor is the recent work on "Kernel-Informed Socket Transport" (KIST). KIST uses feedback about kernel congestion to inform how much data to transmit, and changed priority scheduling in a Tor agent from per-socket to across all circuits. Several aspects contributed to the successful integration of KIST: 1) the KIST implementors worked closely with Tor's development team, 2) KIST went through several rounds of iteration for improved testing and performance, and 3) sufficient development hours were dedicated to a complete implementation and code revision.
However, while KIST and NTor are two positive examples, we hope to have many more. We look forward to working with researchers who are dedicated to ensuring that their research has holistic and rigorous analysis, thorough consideration of trade-offs, and clearly-documented artifacts to ensure ease of reproducibility. On the engineering and implementation side, we hope to see more patches with maintainable code, sufficient documentation, and sufficient knowledge transfer between researchers and project maintainers.