This blog post is meant to generate a conversation about best practices for using cryptography and privacy by design to improve security and protect user data from well-resourced attackers and oppressive regimes.
The technology industry faces tremendous risks and challenges that it must defend itself against in the coming years. State-sponsored hacking and pressure for backdoors will both increase dramatically, even as soon as early 2017. Faltering diplomacy and faltering trade between the United States and other countries will also endanger the remaining deterrent against large-scale state-sponsored attacks.
Unfortunately, it is also likely that in the United States, current legal mechanisms, such as NSLs and secret FISA warrants, will continue to target the marginalized. This will include immigrants, Muslims, minorities, and even journalists who dare to report unfavorably about the status quo. History is full of examples of surveillance infrastructure being abused for political reasons.
Trust is the currency of the technology industry, and if it evaporates, so will the value of the industry itself. It is wise to get out ahead of this erosion of trust, which has already caused Americans to change online buying habits.
This trust comes from demonstrating the ability to properly handle user data in the face of extraordinary risk. The Tor Project has over a decade of experience managing risk from state and state-sized adversaries in many countries. We want to share this experience with the wider technology community, in the hopes that we can all build a better, safer world together. We believe that the future depends on transparency and openness about the strengths and weaknesses of the technology we build.
To that end, we decided to enumerate some general principles that we follow to design systems that are resistant to coercion, compromise, and single points of failure of all kinds, especially adversarial failure. We hope that these principles can be used to start a wider conversation about current best practices for data management and potential areas for improvement at major tech companies.
Ten Principles for User Protection
1. Do not rely on the law to protect systems or users.
2. Prepare policy commentary for quick response to crisis.
3. Only keep the user data that you currently need.
4. Give users full control over their data.
5. Allow pseudonymity and anonymity.
6. Encrypt data in transit and at rest.
7. Invest in cryptographic R&D to replace non-cryptographic systems.
8. Eliminate single points of security failure, even against coercion.
9. Favor open source and enable user freedom.
10. Practice transparency: share best practices, stand for ethics, and report abuse.
This is the principle from which the others flow. Whether it is foreign hackers, extra-legal entities like organized crime, or the abuse of power in one of the jurisdictions in which you operate, there are plenty of threats outside and beyond the reach of law that can cause harm to your users. It is wise not to assume that the legal structure will keep your users and their data safe from these threats. Only sound engineering and data management practices can do that.
It is common for technologists to take Principle 1 so far that they ignore the law, or at least ignore the political climate in which they operate. It is possible for the law and even for public opinion to turn against technology quickly, especially during a crisis where people do not have time to fully understand the effects of a particular policy on technology.
The technology industry should be prepared to counter bad policy recommendations with coherent arguments as soon as the crisis hits. This means spending time and devoting resources to testing the public's reaction to statements and arguments about policy in focus groups, with lobbyists, and in other demographic testing scenarios, so that we know what arguments will appeal to which audiences ahead of time. It also means having media outlets, talk show hosts, and other influential people ready to back up our position. It is critical to prepare early. When a situation becomes urgent, bad policy often gets implemented quickly, simply because "something must be done".
Excessive personally identifiable data retention is dangerous to users, especially the marginalized and the oppressed. Data that is retained is data that is at risk of compromise or future misuse. As Maciej Ceglowski suggests in his talk Haunted By Data, "First: Don't collect it. But if you have to collect it, don't store it! If you have to store it, don't keep it!"
With enough thought and the right tools, it is possible to engineer your way out of your ability to provide data about specific users, while still retaining the information that is valuable or essential to conduct your business. Examples of applications of this idea are Differential Privacy, PrivEx, the EFF's CryptoLog, and how Tor collects its user metrics. We will discuss this idea further in Principle 7; the research community is exploring many additional methods that could be supported and deployed.
For sensitive data that must be retained in a way that can be associated with an individual user, the ethical thing to do is to give users full control over that data. Users should have the ability to remove data that is collected about themselves, and this process should be easy. Users should be given interfaces that make it clear what type of data is collected about them and how, and they should be given easy ways to migrate, restrict, or remove this data if they wish.
Beyond issues with pseudonymity, the ability to anonymously access information via Tor and VPNs must also be protected and preserved. There is a disturbing trend for automated abuse detection systems to harshly penalize shared IP address infrastructure of all kinds, leading to loss of access.
The Tor Project is working with Cloudflare on both cryptographic and engineering-based solutions to enable Tor users to more easily access websites. We invite interested representatives from other tech companies to help us refine and standardize these solutions, and ensure that these solutions will work for them, too.
With recent policy changes in both the US and abroad, it is more important than ever to encrypt data in transit, so that it does not end up in the dragnet. This means more than just HTTPS. Even intra-datacenter communications should be protected by IPSec or VPN encryption.
As more of our data is encrypted in transit, requests for stored data will likely rise.
Companies can still be compelled to decrypt data that is encrypted with keys that they control. The only way to keep user data truly safe is to provide ways for users to encrypt that data with keys that only those users control.
A common argument against cryptographic solutions for privacy is that the loss of either features, usability, ad targeting, or analytics is in opposition to the business case for the product in question. We believe that this is because the funding for cryptography has not been focused on these needs. In the United States, much of the current cryptographic R&D funding comes from the US military. As Phillip Rogaway pointed out in Part 4 of his landmark paper, The Moral Character of Cryptographic Work, this has created a misalignment between what gets funded versus what is needed in the private sector to keep users' personal data safe in a usable way.
It would be a wise investment for companies that handle large amounts of user data to fund research into potential replacement systems that are cryptographically privacy preserving. It may be the case that a company can be both skillful and lucky enough to retain detailed records and avoid a data catastrophe for several years, but we do not believe it is possible to keep a perfect record forever.
The following are some areas that we think should be explored more thoroughly, in some cases with further research, and in other cases with engineering resources for actual implementations: Searchable encryption, Anonymous Credentials, Private Ad Delivery, Private Location Queries, Private Location Sharing, and PIR in general.
Well-designed cryptographic systems are extremely hard to compromise. Typically, the adversary looks for a way around the cryptography by either exploiting other code on the system, or by coercing one of the parties to divulge either key material or decrypted data. These attacks will naturally target the weakest point of the system - that is a single point of security failure where the fewest number of systems need to be compromised, and where the fewest number of people will notice. The proper engineering response is to ensure that multiple layers of security need to be broken for security to fail, and to ensure that security failure is visible and apparent to the largest possible number of people.
Sandboxing, modularization, vulnerability surface reduction, and least privilege are already established as best practices for improving software security. They also eliminate single points of failure. In combination, they force the adversary to compromise multiple hardened components before the system fails. Compiler hardening is another way to eliminate single points of failure in code bases. Even with memory unsafe languages, it is still possible for the compiler to add additional security layers. We believe that compiler hardening could use more attention from companies who contribute to projects like GCC and clang/llvm, so that the entire industry can benefit. In today's world, we all rely on the security of each other's software, sometimes indirectly, in order to do our work.
When security does fail, we want incidents to be publicly visible. Distributed systems and multi-party/multi-key authentication mechanisms are common ways to ensure this visibility. The Tor consensus protocol is a good example of a system that was deliberately designed such that multiple people must be simultaneously compromised or coerced before security will fail. Reproducible builds are another example of this design pattern. While these types of practices are useful when used internally in an organization, this type of design is more effective when it crosses organizational boundaries - so that multiple organizations need to be compromised to break the security of a system - and most effective when it also crosses cultural boundaries and legal jurisdictions.
We are particularly troubled by the trend towards the use of App Stores to distribute security software and security updates. When each user is personally identifiable to the software update system, that system becomes a perfect vector for backdoors. Globally visible audit logs like Google's General Transparency are one possible solution to this problem. Additionally, the anonymous credentials mentioned in Principle 7 provide a way to authenticate the ability to download an app without revealing the identity of the user, which would make it harder to target specific users with malicious updates.
The Four Software Freedoms are the ability to use, study, share, and improve software.
Open source software that provides these freedoms has many advantages when operating in a hostile environment. It is easier for experts to certify and verify security properties of the software; subtle backdoors are easier to find; and users are free to modify the software to remove any undesired operation.
The most widely accepted argument against backdoors is that they are technically impossible to deploy, because they compromise the security of the system if they are found. A secondary argument is that backdoors can be avoided by the use of alternative systems, or by their removal. Both of these arguments are stronger for open source than for closed source, precisely because of the Four Freedoms.
Unfortunately, not all software is open source. Even for proprietary software, the mechanisms by which we design our systems in order to prevent harm and abuse should be shared publicly in as much detail as possible, so that best practices can be reviewed and adopted more widely. For example, Apple is doing great work adopting cryptography for many of its products, but without specifications for how they are using techniques like differential privacy or iMessage encryption, it is hard to know what protections they are actually providing, if any.
Still, even when the details of their work are not public, the best engineers deeply believe that protecting their users is an ethical obligation, to the point of being prepared to publicly resign from their jobs rather than cause harm.
But, before we get to the point of resignation, it is important that we do our best to design systems that make abuse either impossible or evident. We should then share those designs, and responsibly report any instances of abuse. When abuse happens, inform affected organizations, and protect the information of individual users who were at risk, but make sure that users and the general public will hear about the issue with little delay.
Please Join Us
Ideally, this post will spark a conversation about best practices for data management and the deployment of cryptography in companies around the world.
We hope to use this conversation to generate a list of specific best practices that the industry is already undertaking, as well as to provide a set of specific recommendations based on these principles for companies with which we're most familiar, and whose products will have the greatest impact on users.
If you have specific suggestions, or would like to highlight the work of companies who are already implementing these principles, please mention them in the comments. If your company is already taking actions that are consistent with these principles, either write about that publicly, or contact me directly. We're interested in highlighting positive examples of specific best practices as well as instances where we can all improve, so that we all can work towards user safety and autonomy.
We would like to thank everyone at the Tor Project and the many members of the surrounding privacy and Internet freedom communities who provided review, editorial guidance, and suggestions for this post.
At The Tor Project, we make tools that help promote and protect the essential human rights of people everywhere. We have a set of guiding principles that make that possible, but for a long time, those principles were more or less unspoken. In order to ensure that project members build a Tor that reflects the commitment to our ideals, we've taken a cue from our friends at Debian and written the Tor Social Contract -- the set of principles that show who we are and why we make Tor.
Our social contract is a set of behaviors and goals: not just the promised results we want for our community, but the ways we seek to achieve them. We want to grow Tor by supporting and advancing these guidelines in the time we are working on Tor, while taking care not to undermine them in the rest of our time.
The principles can also be used to help recognize when people's actions or intents are hurting Tor. Some of these principles are established norms; things we've been doing every day for a long time; while others are more aspirational -- but all of them are values we want to live in public, and we hope they will make our future choices easier and more open. This social contract is one of several documents that define our community standards, so if you're looking for things that aren't here (e.g. something that might be in a code of conduct) bear in mind that they might exist, in a different document.
Social goals can be complex. If there is ever tension in the application of the following principles, we will always strive to place highest priority on the safety and freedom of any who would use the fruits of our endeavors. The social contract can also help us work through such tensions -- for example, there are times when we might have a need to use tools that are not completely open (contradicting point 2) but opening them would undermine our users' safety (contradicting point 6). Using such a tool should be weighed against how much it's needed to make our technologies usable (point 1). And if we do use such a tool, we must be honest about its capabilities and limits (point 5).
Tor is not just software, but a labor of love produced by an international community of people devoted to human rights. This social contract is a promise from our internal community to the rest of the world, affirming our commitment to our beliefs. We are excited to present it to you.
1. We advance human rights by creating and deploying usable anonymity and privacy technologies.
We believe that privacy, the free exchange of ideas, and access to information are essential to free societies. Through our community standards and the code we write, we provide tools that help all people protect and advance these rights.
2. Open and transparent research and tools are key to our success.
We are committed to transparency; therefore, everything we release is open and our development happens in the open. Whenever feasible, we will continue to make our source code, binaries, and claims about them open to independent verification. In the extremely rare cases where open development would undermine the security of our users, we will be especially vigilant in our peer review by project members.
3. Our tools are free to access, use, adapt, and distribute.
The more diverse our users, the less is implied about any person by simply being a Tor user. This diversity is a fundamental goal and we aim to create tools and services anyone can access and use. Someone's ability to pay for these tools or services should not be a determining factor in their ability to access and use them. Moreover, we do not restrict access to our tools unless access is superceded by our intent to make users more secure.
We expect the code and research we publish will be reviewed and improved by many different people, and that is only possible if everyone has the ability to use, copy, modify, and redistribute this information. We also design, build, and deploy our tools without collecting identifiable information about our users.
4. We make Tor and related technologies ubiquitous through advocacy and education.
We are not just people who build software, but ambassadors for online freedom. We want everybody in the world to understand that their human rights -- particularly their rights to free speech, freedom to access information, and privacy -- can be preserved when they use the Internet. We teach people how and why to use Tor and we are always working to make our tools both more secure and more usable, which is why we use our own tools and listen to user feedback. Our vision of a more free society will not be accomplished simply behind a computer screen, and so in addition to writing good code, we also prioritize community outreach and advocacy.
5. We are honest about the capabilities and limits of Tor and related technologies.
We never intentionally mislead our users nor misrepresent the capabilities of the tools, nor the potential risks associated with using them. Every user should be free to make an informed decision about whether they should use a particular tool and how they should use it. We are responsible for accurately reporting the state of our software, and we work diligently to keep our community informed through our various communication channels.
6. We will never intentionally harm our users.
We take seriously the trust our users have placed in us. Not only will we always do our best to write good code, but it is imperative that we resist any pressure from adversaries who want to harm our users. We will never implement front doors or back doors into our projects. In our commitment to transparency, we are honest when we make errors, and we communicate with our users about our plans to improve.
The Tor Project exists to provide privacy and anonymity for millions of people, including human rights defenders across the globe whose lives depend on it. The strong encryption built into our software is essential for their safety.
In an age when people have so little control over the information recorded about their lives, we believe that privacy is worth fighting for.
We therefore stand with Apple to defend strong encryption and to oppose government pressure to weaken it. We will never backdoor our software.
Our users face very serious threats. These users include bloggers reporting on drug violence in Latin America; dissidents in China, Russia, and the Middle East; police and military officers who use our software to keep themselves safe on the job; and LGBTI individuals who face persecution nearly everywhere. Even in Western societies, studies demonstrate that intelligence agencies such as the NSA are chilling dissent and silencing political discourse merely through the threat of pervasive surveillance.
For all of our users, their privacy is their security. And for all of them, that privacy depends upon the integrity of our software, and on strong cryptography. Any weakness introduced to help a particular government would inevitably be discovered and could be used against all of our users.
The Tor Project employs several mechanisms to ensure the security and integrity of our software. Our primary product, the Tor Browser, is fully open source. Moreover, anyone can obtain our source code and produce bit-for-bit identical copies of the programs we distribute using Reproducible Builds, eliminating the possibility of single points of compromise or coercion in our software build process. The Tor Browser downloads its software updates anonymously using the Tor network, and update requests contain no identifying information that could be used to deliver targeted malicious updates to specific users. These requests also use HTTPS encryption and pinned HTTPS certificates (a security mechanism that allows HTTPS websites to resist being impersonated by an attacker by specifying exact cryptographic keys for sites). Finally, the updates themselves are also protected by strong cryptography, in the form of package-level cryptographic signatures (the Tor Project signs the update files themselves). This use of multiple independent cryptographic mechanisms and independent keys reduces the risk of single points of failure.
The Tor Project has never received a legal demand to place a backdoor in its programs or source code, nor have we received any requests to hand over cryptographic signing material. This isn't surprising: we've been public about our "no backdoors, ever" stance, we've had clear public support from our friends at EFF and ACLU, and it's well-known that our open source engineering processes and distributed architecture make it hard to add a backdoor quietly.
From an engineering perspective, our code review and open source development processes make it likely that such a backdoor would be quickly discovered. We are also currently accelerating the development of a vulnerability-reporting reward program to encourage external software developers to look for and report any vulnerabilities that affect our primary software products.
The threats that Apple faces to hand over its cryptographic signing keys to the US government (or to sign alternate versions of its software for the US government) are no different than threats of force or compromise that any of our developers or our volunteer network operators may face from any actor, governmental or not. For this reason, regardless of the outcome of the Apple decision, we are exploring further ways to eliminate single points of failure, so that even if a government or a criminal obtains our cryptographic keys, our distributed network and its users would be able to detect this fact and report it to us as a security issue.
Like those at Apple, several of our developers have already stated that they would rather resign than honor any request to introduce a backdoor or vulnerability into our software that could be used to harm our users. We look forward to making an official public statement on this commitment as the situation unfolds. However, since requests for backdoors or cryptographic key material so closely resemble many other forms of security failure, we remain committed to researching and developing engineering solutions to further mitigate these risks, regardless of their origin.
We congratulate Apple on their commitment to the privacy and security of their users, and we admire their efforts to advance the debate over the right to privacy and security for all.
Journalists have been asking us for our thoughts about a recent pdf about a judge deciding that a defendant shouldn't get any more details about how the prosecutors decided to prosecute him. Here is the statement we wrote for them:
"We read with dismay the Western Washington District Court's Order on Defendant's Motion to Compel issued on February 23, 2016, in U.S. v. Farrell. The Court held "Tor users clearly lack a reasonable expectation of privacy in their IP addresses while using the Tor network." It is clear that the court does not understand how the Tor network works. The entire purpose of the network is to enable users to communicate privately and securely. While it is true that users "disclose information, including their IP addresses, to unknown individuals running Tor nodes," that information gets stripped from messages as they pass through Tor's private network pathways.
This separation of identity from routing is key to why the court needs to consider how exactly the attackers got this person's IP address. The problem is not simply that the attackers learned the user's IP address. The problem is that they appear to have also intercepted and tampered with the user's traffic elsewhere in the network, at a point where the traffic does not identify the user. They needed to attack both places in order to link the user to his destination. This separation is how Tor provides anonymity, and it is why the previous cases about IP addresses do not apply here.
The Tor network is secure and has only rarely been compromised. The Software Engineering Institute ("SEI") of Carnegie Mellon University (CMU) compromised the network in early 2014 by operating relays and tampering with user traffic. That vulnerability, like all other vulnerabilities, was patched as soon as we learned about it. The Tor network remains the best way for users to protect their privacy and security when communicating online."
The Tor Project has learned more about last year's attack by Carnegie Mellon researchers on the hidden service subsystem. Apparently these researchers were paid by the FBI to attack hidden services users in a broad sweep, and then sift through their data to find people whom they could accuse of crimes. We publicized the attack last year, along with the steps we took to slow down or stop such an attack in the future:
Here is the link to their (since withdrawn) submission to the Black Hat conference:
along with Ed Felten's analysis at the time:
We have been told that the payment to CMU was at least $1 million.
There is no indication yet that they had a warrant or any institutional oversight by Carnegie Mellon's Institutional Review Board. We think it's unlikely they could have gotten a valid warrant for CMU's attack as conducted, since it was not narrowly tailored to target criminals or criminal activity, but instead appears to have indiscriminately targeted many users at once.
Such action is a violation of our trust and basic guidelines for ethical research. We strongly support independent research on our software and network, but this attack crosses the crucial line between research and endangering innocent users.
This attack also sets a troubling precedent: Civil liberties are under attack if law enforcement believes it can circumvent the rules of evidence by outsourcing police work to universities. If academia uses "research" as a stalking horse for privacy invasion, the entire enterprise of security research will fall into disrepute. Legitimate privacy researchers study many online systems, including social networks — If this kind of FBI attack by university proxy is accepted, no one will have meaningful 4th Amendment protections online and everyone is at risk.
When we learned of this vulnerability last year, we patched it and published the information we had on our blog:
We teach law enforcement agents that they can use Tor to do their investigations ethically, and we support such use of Tor — but the mere veneer of a law enforcement investigation cannot justify wholesale invasion of people's privacy, and certainly cannot give it the color of "legitimate research".
Whatever academic security research should be in the 21st century, it certainly does not include "experiments" for pay that indiscriminately endanger strangers without their knowledge or consent.
1. Goals of this document.
- In general, to describe how to conduct responsible research on Tor and similar privacy tools.
- To develop guidelines for research activity that researchers can use to evaluate their proposed plan.
- Produce a (non-exhaustive) list of specific types of unacceptable activity.
- Develop a “due diligence” process for research that falls in the scope of “potentially dangerous” activities. This process can require some notification and feedback from the Tor network or other third parties.
2. General principles
Experimentation does not justify endangering people. Just as in medicine, there are experiments in privacy that can only be performed by creating an unacceptable degree of human harm. These experiments are not justified, any more than the gains to human knowledge would justify unethical medical research on human subjects.
Research on humans' data is human research. Over the last century, we have made enormous strides in what research we consider ethical to perform on people in other domains. For example, we have generally decided that it's ethically dubious to experiment on human subjects without their informed consent. We should make sure that privacy research is at least as ethical as research in other fields.
We should use our domain knowledge concerning privacy when assessing risks. Privacy researchers know that information which other fields consider non-invasive can be used to identify people, and we should take this knowledge into account when designing our research.
Finally, users and implementors must remember that "should not" does not imply "can not." Guidelines like these can serve to guide researchers who are genuinely concerned with doing the right thing and behaving ethically; they cannot restrain the unscrupulous or unethical. Against invasions like these, other mechanisms (like improved privacy software) are necessary.
3. Guidelines for research
- Only collect data that is acceptable to publish. If it would be inappropriate to share it with the world, it is invasive to collect it. In the case of encrypted or secret-shared data, it can be acceptable to assume that the keys or some shares are not published.
- Only collect as much data as is needed: practice data minimization.
- Whenever possible, use analysis techniques that do not require sensitive data, but which work on anonymized aggregates.
- Limit the granularity of the data. For example, "noise" (added data inaccuracies) should almost certainly be added. This will require a working statistical background, but helps to avoid harm to users.
- Make an explicit description of benefits and risks, and argue that the benefits outweigh the risks.
- In order to be sure that risks have been correctly identified, seek external review from domain experts. Frequently there are non-obvious risks.
- Consider auxiliary data when assessing the risk of your research. Data which is not damaging on its own can become dangerous when other data is also available. For example, data from exit traffic can be combined with entry traffic to deanonymize users.
- Respect people's own judgments concerning their privacy interests in their own data.
- It's a warning sign if you can't disclose details of your data collection in advance. If knowing about your study would cause your subjects to object to it, that's a good sign that you're doing something dubious.
- Use a test network when at all possible.
- If you can experiment either on a test network without real users, or on a live network, use the test network.
- If you can experiment either on your own traffic or on the traffic of strangers, use your own traffic.
- "It was easier that way" is not justification for using live user traffic over test network traffic.
4. Examples of unacceptable research activity
- It is not acceptable to run an HSDir, harvest onion addresses, and publish or connect to those onion addresses.
- Don't set up exit relays to sniff, or tamper with exit traffic. Some broad measurements (relative frequency of ports; large-grained volume) may be acceptable depending on risk/benefit tradeoffs; fine-grained measures are not.
- Don't set up relays that are deliberately dysfunctional (e.g., terminate connections to specific sites).