Ethical Tor Research: Guidelines
[Edit: this blog post was a preliminary version of our ideas, and you should look at the Tor Research Safety Board page now.]
1. Goals of this document
- Describe, in general terms, how to conduct responsible research on Tor and similar privacy tools.
- Develop guidelines that researchers can use to evaluate their proposed research plans.
- Produce a (non-exhaustive) list of specific types of unacceptable activity.
- Develop a “due diligence” process for research that falls within the scope of “potentially dangerous” activities. This process may require notifying, and getting feedback from, the Tor network or other third parties.
2. General principles
Experimentation does not justify endangering people. Just as in medicine, there are experiments in privacy that can only be performed by creating an unacceptable degree of human harm. These experiments are not justified, any more than the gains to human knowledge would justify unethical medical research on human subjects.
Research on humans' data is human research. Over the last century, we have made enormous strides in what research we consider ethical to perform on people in other domains. For example, we have generally decided that it's ethically dubious to experiment on human subjects without their informed consent. We should make sure that privacy research is at least as ethical as research in other fields.
We should use our domain knowledge concerning privacy when assessing risks. Privacy researchers know that information which other fields consider non-invasive can be used to identify people, and we should take this knowledge into account when designing our research.
Finally, users and implementors must remember that "should not" does not imply "cannot." Guidelines like these can serve to guide researchers who are genuinely concerned with doing the right thing and behaving ethically; they cannot restrain the unscrupulous or unethical. Against invasions of privacy by such people, other mechanisms (like improved privacy software) are necessary.
3. Guidelines for research
- Only collect data that is acceptable to publish. If it would be inappropriate to share it with the world, it is invasive to collect it. In the case of encrypted or secret-shared data, it can be acceptable to assume that the keys or some shares are not published.
- Only collect as much data as is needed: practice data minimization.
- Whenever possible, use analysis techniques that do not require sensitive data, but which work on anonymized aggregates.
- Limit the granularity of the data. For example, you should almost certainly add "noise" (deliberate inaccuracies) to the data. Doing this well requires a working statistical background, but it helps to avoid harm to users.
- Make an explicit description of benefits and risks, and argue that the benefits outweigh the risks.
- In order to be sure that risks have been correctly identified, seek external review from domain experts. Frequently there are non-obvious risks.
- Consider auxiliary data when assessing the risk of your research. Data which is not damaging on its own can become dangerous when other data is also available. For example, data from exit traffic can be combined with entry traffic to deanonymize users.
- Respect people's own judgments concerning their privacy interests in their own data.
- It's a warning sign if you can't disclose details of your data collection in advance. If knowing about your study would cause your subjects to object to it, that's a good sign that you're doing something dubious.
- Use a test network when at all possible.
- If you can experiment either on a test network without real users, or on a live network, use the test network.
- If you can experiment either on your own traffic or on the traffic of strangers, use your own traffic.
- "It was easier that way" is not a justification for using live user traffic over test network traffic.
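As a minimal sketch of the guideline above on limiting granularity, the following Python adds Laplace-distributed noise to an aggregate count and then rounds the result to a coarse bin. The function names and the parameters `bin_size`, `epsilon`, and `sensitivity` are assumptions made for this illustration, not values or an interface taken from Tor. Choosing real values needs the statistical background mentioned above, and this sketch makes no formal privacy claim.

```python
import random


def laplace_noise(scale, rng):
    """Draw one sample from a zero-centred Laplace distribution.

    The difference of two independent exponential samples, each with
    mean `scale`, follows a Laplace distribution with that scale.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def obfuscate_count(true_count, bin_size, epsilon, sensitivity=1.0, rng=None):
    """Return a noisy, coarsened version of an aggregate count.

    All parameters are hypothetical knobs for this sketch:
      bin_size    -- reported values are multiples of this number
      epsilon     -- smaller values mean more noise
      sensitivity -- assumed maximum change one user can cause in the count
    """
    if bin_size <= 0 or epsilon <= 0 or sensitivity <= 0:
        raise ValueError("bin_size, epsilon and sensitivity must be positive")
    rng = rng or random.Random()

    # Step 1: add noise to the raw count before anything else sees it.
    noisy = true_count + laplace_noise(sensitivity / epsilon, rng)

    # Step 2: reduce granularity by rounding to the nearest bin boundary.
    binned = bin_size * round(noisy / bin_size)

    # Step 3: a count cannot be negative, so clamp at zero.
    return max(0, binned)
```

For example, `obfuscate_count(1000, bin_size=8, epsilon=0.5)` returns a non-negative multiple of 8 close to 1000, and the exact value differs from run to run. In keeping with data minimization, only the obfuscated value would be stored or reported, not the true count.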
4. Examples of unacceptable research activity
- It is not acceptable to run an HSDir, harvest onion addresses, and publish or connect to those onion addresses.
- Don't set up exit relays to sniff or tamper with exit traffic. Some broad measurements (relative frequency of ports; coarse-grained volume) may be acceptable depending on risk/benefit tradeoffs; fine-grained measurements are not.
- Don't set up relays that are deliberately dysfunctional (e.g., relays that terminate connections to specific sites).