This blog post is meant to generate a conversation about best practices for using cryptography and privacy by design to improve security and protect user data from well-resourced attackers and oppressive regimes.
The technology industry faces tremendous risks and challenges that it must defend itself against in the coming years. State-sponsored hacking and pressure for backdoors will both increase dramatically, even as soon as early 2017. Faltering diplomacy and faltering trade between the United States and other countries will also endanger the remaining deterrent against large-scale state-sponsored attacks.
Unfortunately, it is also likely that in the United States, current legal mechanisms, such as NSLs and secret FISA warrants, will continue to target the marginalized. This will include immigrants, Muslims, minorities, and even journalists who dare to report unfavorably about the status quo. History is full of examples of surveillance infrastructure being abused for political reasons.
Trust is the currency of the technology industry, and if it evaporates, so will the value of the industry itself. It is wise to get out ahead of this erosion of trust, which has already caused Americans to change online buying habits.
This trust comes from demonstrating the ability to properly handle user data in the face of extraordinary risk. The Tor Project has over a decade of experience managing risk from state and state-sized adversaries in many countries. We want to share this experience with the wider technology community, in the hopes that we can all build a better, safer world together. We believe that the future depends on transparency and openness about the strengths and weaknesses of the technology we build.
To that end, we decided to enumerate some general principles that we follow to design systems that are resistant to coercion, compromise, and single points of failure of all kinds, especially adversarial failure. We hope that these principles can be used to start a wider conversation about current best practices for data management and potential areas for improvement at major tech companies.
Ten Principles for User Protection
1. Do not rely on the law to protect systems or users.
2. Prepare policy commentary for quick response to crisis.
3. Only keep the user data that you currently need.
4. Give users full control over their data.
5. Allow pseudonymity and anonymity.
6. Encrypt data in transit and at rest.
7. Invest in cryptographic R&D to replace non-cryptographic systems.
8. Eliminate single points of security failure, even against coercion.
9. Favor open source and enable user freedom.
10. Practice transparency: share best practices, stand for ethics, and report abuse.
Principle 1 is the one from which the others flow. Whether it is foreign hackers, extra-legal entities like organized crime, or the abuse of power in one of the jurisdictions in which you operate, there are plenty of threats outside and beyond the reach of law that can cause harm to your users. It is wise not to assume that the legal structure will keep your users and their data safe from these threats. Only sound engineering and data management practices can do that.
It is common for technologists to take Principle 1 so far that they ignore the law, or at least ignore the political climate in which they operate. It is possible for the law and even for public opinion to turn against technology quickly, especially during a crisis where people do not have time to fully understand the effects of a particular policy on technology.
The technology industry should be prepared to counter bad policy recommendations with coherent arguments as soon as the crisis hits. This means spending time and devoting resources to testing the public's reaction to statements and arguments about policy in focus groups, with lobbyists, and in other demographic testing scenarios, so that we know what arguments will appeal to which audiences ahead of time. It also means having media outlets, talk show hosts, and other influential people ready to back up our position. It is critical to prepare early. When a situation becomes urgent, bad policy often gets implemented quickly, simply because "something must be done".
Excessive personally identifiable data retention is dangerous to users, especially the marginalized and the oppressed. Data that is retained is data that is at risk of compromise or future misuse. As Maciej Ceglowski suggests in his talk Haunted By Data, "First: Don't collect it. But if you have to collect it, don't store it! If you have to store it, don't keep it!"
With enough thought and the right tools, it is possible to engineer your way out of your ability to provide data about specific users, while still retaining the information that is valuable or essential to conduct your business. Examples of this idea in practice include Differential Privacy, PrivEx, the EFF's CryptoLog, and the way Tor collects its user metrics. We will discuss this idea further in Principle 7; the research community is exploring many additional methods that could be supported and deployed.
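To make the differential privacy idea concrete, here is a minimal sketch of the Laplace mechanism applied to a single count. It is an illustration under stated assumptions, not how Tor, PrivEx, or CryptoLog actually work: the function name noisy_count, the epsilon value, and the example count are all hypothetical.

```python
import random


def noisy_count(true_count: int, epsilon: float, rng: random.Random) -> float:
    """Return true_count plus Laplace noise calibrated to epsilon."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    # Adding or removing one user changes a count by at most 1 (sensitivity 1),
    # so the Laplace scale is 1 / epsilon.
    # The difference of two independent exponentials with rate epsilon
    # is Laplace-distributed with scale 1 / epsilon.
    noise = rng.expovariate(epsilon) - rng.expovariate(epsilon)
    return true_count + noise


# Hypothetical usage: publish a daily user count instead of per-user records.
rng = random.SystemRandom()
published = noisy_count(true_count=1284, epsilon=0.5, rng=rng)
```

A smaller epsilon adds more noise and gives stronger protection to any one user. The aggregate stays useful for business purposes, while the published number says very little about whether a specific person was counted.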
For sensitive data that must be retained in a way that can be associated with an individual user, the ethical thing to do is to give users full control over that data. Users should have the ability to remove data that is collected about themselves, and this process should be easy. Users should be given interfaces that make it clear what type of data is collected about them and how, and they should be given easy ways to migrate, restrict, or remove this data if they wish.
Beyond issues with pseudonymity, the ability to anonymously access information via Tor and VPNs must also be protected and preserved. There is a disturbing trend for automated abuse detection systems to harshly penalize shared IP address infrastructure of all kinds, leading to loss of access.
The Tor Project is working with Cloudflare on both cryptographic and engineering-based solutions to enable Tor users to more easily access websites. We invite interested representatives from other tech companies to help us refine and standardize these solutions, and ensure that these solutions will work for them, too.
With recent policy changes in both the US and abroad, it is more important than ever to encrypt data in transit, so that it does not end up in the dragnet. This means more than just HTTPS. Even intra-datacenter communications should be protected by IPsec or VPN encryption.
As more of our data is encrypted in transit, requests for stored data will likely rise.
Companies can still be compelled to decrypt data that is encrypted with keys that they control. The only way to keep user data truly safe is to provide ways for users to encrypt that data with keys that only those users control.
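One common building block for user-controlled keys is to derive the key on the user's own device from a passphrase, so that the service only ever stores a salt and ciphertext. The sketch below shows just that derivation step with PBKDF2 from Python's standard library. The function name, iteration count, and passphrase are assumptions for illustration, and the encryption itself (which should use a vetted authenticated cipher from an established library) is deliberately not shown.

```python
import hashlib
import secrets


def derive_user_key(passphrase: str, salt: bytes, iterations: int = 600_000) -> bytes:
    """Derive a 32-byte key from a passphrase. Intended to run on the user's device."""
    return hashlib.pbkdf2_hmac(
        "sha256", passphrase.encode("utf-8"), salt, iterations, dklen=32
    )


# Hypothetical usage: the salt is not secret and can be stored by the service;
# the passphrase and the derived key never leave the user's device.
salt = secrets.token_bytes(16)
key = derive_user_key("correct horse battery staple", salt)
```

Under this design, a company that is compelled to hand over stored data can only produce the salt and the ciphertext, because it never held the key.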
A common argument against cryptographic solutions for privacy is that the loss of features, usability, ad targeting, or analytics is in opposition to the business case for the product in question. We believe that this is because the funding for cryptography has not been focused on these needs. In the United States, much of the current cryptographic R&D funding comes from the US military. As Phillip Rogaway pointed out in Part 4 of his landmark paper, The Moral Character of Cryptographic Work, this has created a misalignment between what gets funded and what is needed in the private sector to keep users' personal data safe in a usable way.
It would be a wise investment for companies that handle large amounts of user data to fund research into potential replacement systems that are cryptographically privacy preserving. It may be the case that a company can be both skillful and lucky enough to retain detailed records and avoid a data catastrophe for several years, but we do not believe it is possible to keep a perfect record forever.
The following are some areas that we think should be explored more thoroughly, in some cases with further research, and in other cases with engineering resources for actual implementations: Searchable encryption, Anonymous Credentials, Private Ad Delivery, Private Location Queries, Private Location Sharing, and PIR in general.
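For readers unfamiliar with PIR (private information retrieval), the simplest textbook construction uses two servers that hold the same database and are assumed not to collude. The sketch below illustrates that classic XOR-based scheme. It is a teaching example rather than a deployable design: all function names are hypothetical, records are assumed to have equal length, and practical PIR systems use far more efficient techniques.

```python
import secrets
from typing import List, Tuple


def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def make_queries(n: int, index: int) -> Tuple[List[int], List[int]]:
    """Build two bit vectors that differ only at the wanted index."""
    query_a = [secrets.randbelow(2) for _ in range(n)]
    query_b = list(query_a)
    query_b[index] ^= 1
    return query_a, query_b


def server_answer(database: List[bytes], query: List[int]) -> bytes:
    """Each server XORs together the records selected by its bit vector."""
    answer = bytes(len(database[0]))
    for record, bit in zip(database, query):
        if bit:
            answer = xor_bytes(answer, record)
    return answer


def private_lookup(database: List[bytes], index: int) -> bytes:
    """Simulate the client talking to two non-colluding servers."""
    query_a, query_b = make_queries(len(database), index)
    answer_a = server_answer(database, query_a)  # would run on server A
    answer_b = server_answer(database, query_b)  # would run on server B
    # Every record selected by both queries cancels out, leaving record `index`.
    return xor_bytes(answer_a, answer_b)
```

Each server sees only a uniformly random bit vector, so neither one learns which record was requested unless the two compare notes.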
Well-designed cryptographic systems are extremely hard to compromise. Typically, the adversary looks for a way around the cryptography by either exploiting other code on the system, or by coercing one of the parties to divulge either key material or decrypted data. These attacks will naturally target the weakest point of the system: a single point of security failure where the fewest systems need to be compromised, and where the fewest people will notice. The proper engineering response is to ensure that multiple layers of security need to be broken for security to fail, and to ensure that security failure is visible and apparent to the largest possible number of people.
Sandboxing, modularization, vulnerability surface reduction, and least privilege are already established as best practices for improving software security. They also eliminate single points of failure. In combination, they force the adversary to compromise multiple hardened components before the system fails. Compiler hardening is another way to eliminate single points of failure in code bases. Even with memory-unsafe languages, it is still possible for the compiler to add additional security layers. We believe that compiler hardening could use more attention from companies that contribute to projects like GCC and Clang/LLVM, so that the entire industry can benefit. In today's world, we all rely on the security of each other's software, sometimes indirectly, in order to do our work.
When security does fail, we want incidents to be publicly visible. Distributed systems and multi-party/multi-key authentication mechanisms are common ways to ensure this visibility. The Tor consensus protocol is a good example of a system that was deliberately designed such that multiple people must be simultaneously compromised or coerced before security will fail. Reproducible builds are another example of this design pattern. While these types of practices are useful when used internally in an organization, this type of design is more effective when it crosses organizational boundaries, so that multiple organizations need to be compromised to break the security of a system, and most effective when it also crosses cultural boundaries and legal jurisdictions.
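As a rough illustration of the reproducible builds pattern, a client could refuse to install a release unless several independent builders report the same digest for it. The sketch below is a simplified assumption of how such a check might look, not a description of Tor's actual release process. The function name, the builder names, and the threshold are hypothetical, and a real system would also verify each builder's signature on its report, which is omitted here.

```python
import hashlib
from typing import Dict


def release_is_corroborated(
    artifact: bytes, attestations: Dict[str, str], threshold: int
) -> bool:
    """Return True if at least `threshold` builders report this artifact's SHA-256."""
    digest = hashlib.sha256(artifact).hexdigest()
    matching = [
        builder
        for builder, reported in attestations.items()
        if reported.lower() == digest
    ]
    return len(matching) >= threshold


# Hypothetical usage: three builders in different organizations rebuilt the
# same source, and the client requires at least two of them to agree.
artifact = b"example release contents"
reports = {
    "builder-org-a": hashlib.sha256(artifact).hexdigest(),
    "builder-org-b": hashlib.sha256(artifact).hexdigest(),
    "builder-org-c": hashlib.sha256(b"something else").hexdigest(),
}
ok = release_is_corroborated(artifact, reports, threshold=2)
```

With a threshold of two, an attacker has to compromise or coerce at least two builders, ideally in different organizations and jurisdictions, before a tampered binary would be accepted.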
We are particularly troubled by the trend towards the use of App Stores to distribute security software and security updates. When each user is personally identifiable to the software update system, that system becomes a perfect vector for backdoors. Globally visible audit logs like Google's General Transparency are one possible solution to this problem. Additionally, the anonymous credentials mentioned in Principle 7 provide a way to authenticate the ability to download an app without revealing the identity of the user, which would make it harder to target specific users with malicious updates.
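Globally visible audit logs are commonly built on Merkle trees, as in Certificate Transparency (RFC 6962). The sketch below computes a Merkle tree root over a list of log entries using the RFC 6962 hashing rules. It assumes that construction for illustration only and does not describe the internals of any particular product. The entry names are hypothetical.

```python
import hashlib
from typing import List


def merkle_root(entries: List[bytes]) -> bytes:
    """Merkle tree hash over log entries, following the RFC 6962 definition."""
    n = len(entries)
    if n == 0:
        return hashlib.sha256(b"").digest()
    if n == 1:
        # Leaf hashes are prefixed with 0x00.
        return hashlib.sha256(b"\x00" + entries[0]).digest()
    # Split at the largest power of two strictly smaller than n.
    k = 1
    while k * 2 < n:
        k *= 2
    left = merkle_root(entries[:k])
    right = merkle_root(entries[k:])
    # Interior node hashes are prefixed with 0x01.
    return hashlib.sha256(b"\x01" + left + right).digest()


# Hypothetical usage: every published update is appended to the log, and the
# root is compared across many observers.
log = [b"update-1.0", b"update-1.1", b"update-1.2"]
root = merkle_root(log)
```

Because the root commits to every entry, an update served to only one targeted user would either be missing from the public log or would change the root that everyone else sees.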
The Four Software Freedoms are the ability to use, study, share, and improve software.
Open source software that provides these freedoms has many advantages when operating in a hostile environment. It is easier for experts to certify and verify security properties of the software; subtle backdoors are easier to find; and users are free to modify the software to remove any undesired operation.
The most widely accepted argument against backdoors is that they are technically impossible to deploy safely, because they compromise the security of the system if they are found. A secondary argument is that backdoors can be avoided by the use of alternative systems, or by their removal. Both of these arguments are stronger for open source than for closed source, precisely because of the Four Freedoms.
Unfortunately, not all software is open source. Even for proprietary software, the mechanisms by which we design our systems in order to prevent harm and abuse should be shared publicly in as much detail as possible, so that best practices can be reviewed and adopted more widely. For example, Apple is doing great work adopting cryptography for many of its products, but without specifications for how they are using techniques like differential privacy or iMessage encryption, it is hard to know what protections they are actually providing, if any.
Still, even when the details of their work are not public, the best engineers deeply believe that protecting their users is an ethical obligation, to the point of being prepared to publicly resign from their jobs rather than cause harm.
But, before we get to the point of resignation, it is important that we do our best to design systems that make abuse either impossible or evident. We should then share those designs, and responsibly report any instances of abuse. When abuse happens, inform affected organizations, and protect the information of individual users who were at risk, but make sure that users and the general public will hear about the issue with little delay.
Please Join Us
Ideally, this post will spark a conversation about best practices for data management and the deployment of cryptography in companies around the world.
We hope to use this conversation to generate a list of specific best practices that the industry is already undertaking, as well as to provide a set of specific recommendations based on these principles for companies with which we're most familiar, and whose products will have the greatest impact on users.
If you have specific suggestions, or would like to highlight the work of companies who are already implementing these principles, please mention them in the comments. If your company is already taking actions that are consistent with these principles, either write about that publicly, or contact me directly. We're interested in highlighting positive examples of specific best practices as well as instances where we can all improve, so that we all can work towards user safety and autonomy.
We would like to thank everyone at the Tor Project and the many members of the surrounding privacy and Internet freedom communities who provided review, editorial guidance, and suggestions for this post.