At the beginning of July, a few of us gathered in Washington DC for the first hidden service hackfest. Our crew was comprised of core Tor developers and researchers who were in the area; mostly attendees of PETS. The aim was to push hidden service development forward and swiftly arrive at decisions that were too tiresome and complex to make over e-mail.
Since we were mostly technical folks, we composed technical proposals and prioritized development, and spent less time with organizational or funding tasks. Here is a snapshot of the work that we did during those 5 days:
- The first day, we discussed current open topics on hidden services and tasks we should be doing in the short-to-medium-term future.
Our list of tasks included marketing and fundraising ones like "Re-branding hidden services" and "Launch crowdfunding campaign", but we spent most of the first day discussing Proposal 224 aka the "Next Generation Hidden Services" project.
- Proposal 224 is our master plan for improving hidden services in fundamental ways: The new system will be faster, use better cryptography, have more secure onion addresses, and offer advanced security properties like improved DoS resistance and keeping identity keys offline. It's heavy engineering work, and we are still fine-tuning the design, so implementation has not started yet.
While discussing how we would implement the system, we decided that we would need to write most of the code for this new protocol from scratch, instead of hooking into the old and rusty hidden service code. To move this forward, we spent part of the following days splitting the proposal into individual modules and figuring out how to refactor the current data structures so that the new protocol can coexist with the old protocol.
- One open design discussion on proposal 224 has been an earlier suggestion of merging the roles of "hidden service directory" and "introduction point" on the hidden service protocol. This change would improve the security and performance as well as simplify the relevant code, and reduce load on the network. Because it changes the protocol a bit, it would be good to have it specified precisely. For this reason, we spent the second and third days writing a proposal that defines how this change works.
- Another core part of proposal 224 is the protocol for global randomness calculation. That's a system where the Tor network itself generates a fresh, unpredictable random value everyday; basically like the NIST Randomness Beacon but decentralized.
Proposal 225 specifies a way that this can be achieved, but there are still various engineering details that need to be ironed out. We spent some time discussing the various ways we can implement the system and the engineering decisions we should take, and produced a draft Tor proposal that specifies the system.
- We also discussed guard discovery attacks, and the various defenses that we could deploy. The fact that many core Tor people were present helped us decide rapidly which various parameters and trade-offs that we should pick. We sketched a proposal and posted it to the [tor-dev] mailing list and it has already received very helpful feedback.
- We also took our old design for "Direct Onion Services" and revised it into a faster and far more elegant protocol. These types of services trade service-side location privacy for improved performance, reliability, and scalability. They will allow sites like reddit to offer their services faster on hidden services while respecting their clients anonymity. During the last days of the hackfest, we wrote a draft proposal for this new design.
- We did more development on OnioNS, the Onion Name System, which allows a hidden service operator to register a human memorable name (e.g. example.tor) that can be used instead of the regular onion address. In the last days of the hackfest we prepared a proof-of-concept demo wherein a domain name was registered and then the Tor Browser successfully loaded a hidden service under that name. That was a significant step for the project.
- We also discussed hidden service statistics and how the two statistics we implemented a few months ago have been very useful. To improve their reliability (since currently only about 3% of the network reports them), we decided to enable them by default in the future.
We also discussed systems for collecting additional statistics in a privacy-preserving manner, using Secure Multiparty Computation or other similar techniques.
- We talked about rebranding the "Hidden Services" project to "Onion Services" to reduce "hidden"/"dark"/"evil" name connotations, and improve terminology. In fact, we've been on this for a while, but we are still not sure what the right name is. What do you think?
- To improve user education, we explored various concepts for a graphical animation explaining hidden services similar in concept to the Tor animation from a few months ago.
And that's only part of what we did. We also wrote code for various tickets, reviewed even more code and really learned how to use Ricochet.
All in all, we managed to fit more things than we hoped into those few days and we hope to do even more focused hackfests in the near future. Email us if you are interested in hosting a hackfest!
If you'd like to get involved with hidden service development, you can contact the hackfest team. Our nicks on IRC OFTC are armadev, asn, dgoulet, kernelcorn, mrphs, ohmygodel, robgjansen, saint, special, sysrqb, and syverson.
Until next time!
The Tor Project is driven by ideas. We believe in the right to privacy for every person on the planet. Our community—paid and volunteer—brainstorms projects that embody those ideas, like decentralized hidden messaging systems or ingenious new ways to get uncensored Internet access to people in China.
On our public wikis, we make lists of what we need to build these projects—and then we approach potential sponsors with these lists. If we’re lucky, a sponsor will pay to do the project. If not, we may make it for free.
This is true whether the potential sponsor is a government agency or anyone else.
Because of this system, some projects, like hidden services, need more funding, and we are seeking individual contributions to make this technology stronger. One day we hope to build it into many more programs—for instance, phone apps--to make them private and secure by default.
Our diverse, international community includes thousands of men and women inspired by the ideals we share. They work to support Tor and create important tools based on Tor, like Tails and Orbot (there are at least a dozen of these). Our group includes visionaries who think and talk publicly about the Internet and the future of privacy; among them: @nickm_tor, @ioerror and @RogerDingledine. @aaronsw was one of us.
We will accept no back doors to our software, ever. You can watch @ioerror talk about this at last year’s 31c3 talk in Hamburg. We believe in and build free, open source software—free as in freedom. Tor’s source code is online for everyone to see.
We are proud of our people, our work, and our ideals. We are a human rights organization. We are inventors. Our community is a workshop for the future of privacy tools; maybe even for the future of privacy.
The Tor community is open to newcomers; we hope you will join us.
Hidden Services have received a lot of attention in 2015, and Tor is at the center of this conversation. Hidden Services are a Tor technology that allows users to connect to services (blogs, chats, and many other things) with neither the user nor the site giving up identifying information.
In fact, anything you can build on the internet, you can build on hidden services. But they're better--they give users things that normal networking doesn't authentication and confidentiality are built in; anonymity is built in. An internet based on hidden services would be an internet with Tor built in--a feature that users could take for granted. Think of what this might mean to millions of users in countries like China, Iran, or the UK. Yet currently, only about 4% of Tor's traffic comes from hidden services.
So we at Tor have been considering how we might meet the challenge of making them more widely available. In this post, we will briefly discuss the role of hidden services before we explore the idea of using crowdfunding to pay for bold, long-term tech initiatives that will begin to fulfill the promise of this technology.
Hidden Services are a critical part of Tor's ecosystem
Hidden Services provide a means for Tor users to create sites and services that are accessible exclusively within the Tor network, with privacy and security features that make them useful and appealing for a wide variety of applications.
For example, hidden services are currently used by activists and journalists to publish blogs--in anonymity and free from retaliation. They are used by NGOs to securely receive information on government corruption and injustice from concerned citizens. Newspapers such as the Washington Post, and human rights groups such as Amnesty International use them to receive leaked information. They are used by people looking for the latest cat facts, companies that want to secure the path of their clients or by people chatting securely and anonymously -- including at-risk journalists talking to sources.
In addition, developers use hidden services as a building block to incorporate Tor's security and anonymity features into totally separate products. The potential of hidden services is huge, and much of it is yet to be explored.
Next Steps for Hidden Services
We want to make this technology available to the wider public as these services will play a key role in the future of secure communications. This means that we must increase the uses for hidden services, bring them to mobile platforms for anonymous mobile apps, and vastly increase the number of people who use them.
Since our goal is wider use, it is imperative that we build them to be more secure, easier to set up, better performing, and more usable. Clearly, the questions that we answer in early deployment efforts will inform how we answer the deeper questions pertaining to massive worldwide deployment.
We must engage a large number of people to bring hidden services to the next level. Until now, hidden services development largely relied on the volunteer work of developers in their spare time. This will not be sufficient if we are to make the leap to transformative hidden services.
We are currently evaluating funding strategies that will support our Hidden Service initiatives in the short-, intermediate- and long-term. In order to fit the requirements more conservative large funders have, so we can fully sponsor the Next Generation Hidden Services, we must put preliminary pieces in place. And for that we will reach out to crowdfunding. To do this right, we need your feedback.
Crowdfunding allows us to engage the broader community in grasping the opportunity that this new technology promises. We are confident that we can deliver significant advancements in the hidden services field in the short-term, and that many small donors who understand their context will be eager to contribute. We intend to begin by prioritizing the improvement of the security, usability, and performance of the current hidden services system.
Further, we want to make sure we support the efforts of community projects and that the community is participating in shaping the evolution of hidden services. For example, it would be important to assist and improve the Tor integration of projects such as SecureDrop, Pond, Ahmia and Ricochet. We are in the unique position to be able to shape the Tor protocol to make these projects easier to use and better performing, and we would like to identify ways to promote broader deployment of these projects.
Identifying, prioritizing and meeting future challenges will require engagement throughout the greater community. For instance, as changes and enhancements are introduced, we hope to speak with the best bug hunters, cryptographers and privacy experts and ask them to audit our code and designs. Non-technical users could help us evaluate the usability of our improvements.
For this crowdfunding campaign we have identified a few possible ideas-- but the point of this post is to ask you for yours. Here are three projects that we have come up with so far:
An application that Hidden Service operators could use to learn more about the activity of their Hidden Service. The operator would have access to information on user activity, security information, etc., and will receive important system-generated updates, including log messages
A way to set up public hidden services with improved performance but reduced server-side anonymity. Basically, hidden services that don't care about anonymity but still want to protect their clients with Tor's cryptography and anonymity, will be able to run faster since they don't need to protect their own anonymity. This is an optional feature that suits the needs of large sites like Facebook and reddit, and will make their hidden services faster while also reducing the traffic they cause to the network. Also by optimizing for performance in this specialized feature, we can optimize for security even more in the default hidden services configuration.
Tor has been at the center of hidden services from the beginning. We have big lists of changes we need to do to the Tor protocol to increase the security of hidden services against cryptanalysis, DoS and deanonymization attacks. We also want to improve guard security, allow operators to store their cryptographic keys offline and enable scaling of hidden services to new levels. This is a big project but we hope to start crunching through it as part of this crowdfunding campaign.
Your Idea for Hidden Services?
Long story short, we are looking for feedback!
Also, we are curious about which crowdfunding platforms you prefer and why.
In the following weeks, we will update you on our progress, incorporating feedback we receive from the community. We hope to make this process as transparent and public as possible!
EDIT: The "Unhidden Services" paragraph was expanded and changed to "Fast-but-not-hidden Services". The previous name was too scary and the description not sufficient to show the potential of the project. Please send us better names for this feature!
We are starting a project to study and quantify hidden services traffic. As part of this project, we are collecting data from just a few volunteer relays which only allow us to see a small portion of hidden service activity (between 2% and 5%). Extrapolating from such a small sample is difficult, and our data are preliminary.
We've been working on methods to improve our calculations, but with our current methodology, we estimate that about 30,000 hidden services announce themselves to the Tor network every day, using about 5 terabytes of data daily. We also found that hidden service traffic is about 3.4% of total Tor traffic, which means that, at least according to our early calculations, 96.6% of Tor traffic is *not* hidden services. We invite people to join us in working to research methodologies and develop systems for better understanding Tor hidden services.
Over the past months we've been working on hidden service statistics. Our goal has been to answer the following questions:
- "Approximately how many hidden services are there?"
- "Approximately how much traffic of the Tor network is going to hidden services?"
We chose the above two questions because even though we want to understand hidden services, we really don't want to harm the privacy of Tor users. From a privacy perspective, the above two questions are relatively easy questions to answer because we don't need data from clients or the hidden services themselves; we just need data from hidden service directories and rendezvous points. Furthermore, the measurements reported by each relay cannot be linked back to specific hidden services or their clients.
Our first move was to research various ways we could collect these statistics in a privacy-preserving manner. After days of discussions on obfuscating statistics, we began writing a Tor proposal with our design, as well as code that implements the proposal. The code has since been reviewed and merged to Tor! The statistics are currently disabled by default so we asked volunteer relay operators to explicitly turn them on. Currently there are about 70 relays publishing measurements to us every 24 hours:
So as of now we've been receiving these measurements for over a month, and we have thought a lot about how to best use the reported measurements to derive interesting results. We finally have some preliminary results we would like to share with you:
How many hidden services are there?
All in all, it seems that every day about 30000 hidden services announce themselves to the hidden service directories. Graphically:
By counting the number of unique hidden service addresses seen by HSDirs, we can get the approximate number of hidden services. Keep in mind that we can only see between 2% and 5% of the total HSDir space, so the extrapolation is, naturally, messy.
How much traffic do hidden services cause?
Our preliminary results show that hidden services cause somewhere between 400 to 600 Mbit of traffic per second, or equivalently about 4.9 terabytes a day. Here is a graph:
We learned this by getting rendezvous points to publish the total number of cells transferred over rendezvous circuits, which allows us to learn the approximate volume of hidden service traffic. Notice that our coverage here is not very good either, with a probability of about 5% that a hidden service circuit will use a relay that reports these statistics as a rendezvous point.
A related statistic here is "How much of the Tor network is actually hidden service usage?". There are two different ways to answer this question, depending on whether we want to understand what clients are doing or what the network is doing. The fraction of hidden-service traffic at Tor clients differs from the fraction at Tor relays because connections to hidden services use 6-hop circuits while connections to the regular Internet use 3-hop circuits. As a result, the fraction of hidden-service traffic entering or leaving Tor is about half of the fraction of hidden-service traffic inside of Tor. Our conclusion is that about 3.4% of client traffic is hidden-service traffic, and 6.1% of traffic seen at a relay is hidden-service traffic.
Conclusion and future work
In this blog post we presented some preliminary results that could be extracted from these new hidden service statistics. We hope that this data can help us better gauge the future development and maturity of the onion space as well as detect potential incidents and bugs on the network. To better present our results and methods, we wrote a short technical report that outlines the exact process we followed. We invite you to read it if you are curious about the methodology or the results.
Finally, this project is only a few months old, and there are various plans for the future. For example:
There are more interesting questions that we could examine in this area. For example: "How many people are using hidden services every day?" and "How many times does someone try to visit a hidden service that does not exist anymore?."
Unfortunately, some of these questions are not easy to answer with the current statistics reporting infrastructure, mainly because collecting them in this way could reveal information about specific hidden services but also because the results of the current system contain too much obfuscating data (each reporting relay randomizes its numbers a little bit before publishing them, so we can learn about totals but not about specific events).
For this reason, we've been analyzing various statistics aggregation protocols that could be used in place of the current system, allowing us to safely collect other kinds of statistics.
- We need to incorporate these statistics in our Metrics portal so that they are updated regularly and so that everyone can follow them.
Currently, these hidden service statistics are not collected in relays by default. Unfortunately, that gives us very small coverage of the network, which in turn makes our extrapolations very noisy. The main reason that these statistics are disabled by default is that similar statistics are also disabled (e.g. CellStatistics). Also, this allows us more time to consider privacy consequences. As we analyze more of these statistics and think more about statistics privacy, we should decide whether to turn these statistics on by default.
It's worth repeating that the current results are preliminary and should be digested with a grain of salt. We invite statistically-inclined people to review our code, methods, and results. If you are a researcher interested in digging into the measurements themselves, you can find them in the extra-info descriptors of Tor relays.
Over the next months, we will also be thinking more about these problems to figure out proper ways to analyze and safely measure private ecosystems like the onion space.
Till then, take care, and enjoy Tor!
Hi! Nick here.
I ought to post my own responses to that Andy Greenberg article, too. (Especially since most everybody else around here is at 31c3 right now, or sick with the flu, or both.)
When I saw the coverage of the hidden services study that was presented at CCC today, I was reminded of the media fallout from that old study from the 1990s that "proved" that a ridiculously high fraction of the internet was pornography...by looking at Usenet*, and by counting newsgroups and bytes. (You might remember it; it was the basis of the delightful TIME Magazine "Cyberporn" cover.)
The 1990s researcher wasn't lying outright, but he and the press *were* conflating one question: "What fraction of Usenet groups are 'alt.sex' or 'alt.binaries' (file posting) groups" with two others: "What fraction of internet traffic is porn?" and "What fraction of internet-user hours are spent on porn?"
These are quite different things.
The presentation today focused on data about hidden service types and usage. Predictably, given the results from Biryukov, Pustogarov, Thill, and Weinmann, the researcher found that hidden services related to child abuse are only a small fraction of the total number of hidden service addresses on the network. And because of the way that hidden services work, traffic does not go through hidden service directories, but instead through rendezvous points (randomly chosen Tor nodes): so no relay that knows the hidden service's address will learn the actual amount of traffic transmitted. But, as previously documented, abusive services represent a disproportionate fraction of usage... if you're measuring usage with hidden service directory requests.
Why might that be?
First, some background. Basically, a Tor client makes a hidden service directory request the first time it visits a hidden service that it has not been to in a while. (If you spend hours at one hidden service, you make about 1 hidden service directory request. But if you spend 1 second each at 100 hidden services, you make about 100 requests.) Therefore, obsessive users who visit many sites in a session account for many more of the requests that this study measures than users who visit a smaller number of sites with equal frequency.
There are other confounding factors as well. Due to bugs in older Tor implementations, a hidden service that is unreliable (or completely unavailable) will get many, many more hidden service directory requests than a reliable one. So if any abuse sites are unusually unreliable, we'd expect their users to create a disproportionately large number of hidden service directory requests.
Also, a very large number of hidden service directory requests are probably not made by humans! See bug 13287: We don't know what's up with that. Could this be caused by some kind of anti-abuse organization running an automated scanning tool?
In any case, a methodology that looks primarily at hidden service directory requests will over-rate services that are frequently accessed from a Tor client that hasn't been there recently, and under-rate services that are used via tor2web, and so on. It also depends a lot on how hidden services are configured, how frequently Tor hidden service directories go up and down, and what times of day they change introduction points in comparison to what time of day their users tend to be awake.
The greater the number of distinct hidden services a person visits, and the less reliable those sites are, the more hidden service directory requests they will trigger.
Suppose 10 people use hidden services to look at conspiracy theories, 100 people use hidden services to buy Cuban cigars, and 1000 people use it for online chat.
But suppose that the average cigar purchaser visits only one or two sites to make purchases, and the average chat user joins one or two networks, whereas the average conspiracy theorist needs to visit several dozen forums and wikis.
Suppose also that the average Cuban cigar purchaser makes about two purchases a month, the average chat user logs in once a day, and the average conspiracy theorist spends 3 hours a day crawling the hidden web.
And suppose that conspiracy theory websites come and go frequently, whereas cigar sites and chat networks are more stable.
In this analysis, even though there are far more people buying cigars, users who use it for obsessive behavior that spans multiple unreliable hidden services will be far overrepresented in the count of hidden service directory requests than users who use it for activities done less frequently and across fewer services. So any comparison of hidden service directory request counts will say more about the behavioral differences of different types of users than about their relative numbers, or the amount of traffic they generated.
In conclusion, let's spend a minute talking about freedom and philosophy. Any system that provides security on the Internet will inevitably see some use by bad people that we'd rather not help at all. After all, cars are used for getaways, and window shades conceal all kinds of criminality. The only way to make a privacy tool that nobody abuses is to make it so weak that people aren't willing to touch it, or so unusable that nobody can figure it out.
Up till now, many of the early adopters for Tor hidden services have been folks for whom the risk/effort calculations have been quite extreme, since--as I'd certainly acknowledge--the system isn't terribly usable for the average person as it stands. Roger noted earlier that hidden services amount to less than 2% of our total traffic today. Given their privacy potential, I think that's not even close to enough. We've got to work over the next year or more to develop hidden services to the point where their positive impact is felt by the average netizen, whether they're publishing a personal blog for their friends, using a novel communications protocol more secure than email, or reading a news article based on information that a journalist received through an anonymous submission system. Otherwise, they'll remain a target for every kind of speculation, and every misunderstanding about them will lead people to conclude the worst about privacy online. Come lend a hand?
(Also, no offense to Andy on this: he is a fine tech reporter and apparently a fine person. And no offense to Dr. Owen, who explained his results a lot more carefully than they have been re-explained elsewhere. Now please forgive me, I'm off to write some more software and get some sleep. Please direct all media inquiries to the email of "press at torproject dot org".)
* Usenet was sort of like Twitter, only you could write paragraphs on it. ;)
Recently it was announced that a coalition of government agencies took control of many Tor hidden services. We were as surprised as most of you. Unfortunately, we have very little information about how this was accomplished, but we do have some thoughts which we want to share.
Over the last few days, we received and read reports saying that several Tor relays were seized by government officials. We do not know why the systems were seized, nor do we know anything about the methods of investigation which were used. Specifically, there are reports that three systems of Torservers.net disappeared and there is another report by an independent relay operator. If anyone has more details, please get in contact with us. If your relay was seized, please also tell us its identity so that we can request that the directory authorities reject it from the network.
But, more to the point, the recent publications call the targeted hidden services seizures "Operation Onymous" and they say it was coordinated by Europol and other government entities. Early reports say 17 people were arrested, and 400 hidden services were seized. Later reports have clarified that it was hundreds of URLs hosted on roughly 27 web sites offering hidden services. We have not been contacted directly or indirectly by Europol nor any other agency involved.
Tor is most interested in understanding how these services were located, and if this indicates a security weakness in Tor hidden services that could be exploited by criminals or secret police repressing dissents. We are also interested in learning why the authorities seized Tor relays even though their operation was targetting hidden services. Were these two events related?
How did they locate the hidden services?
So we are left asking "How did they locate the hidden services?". We don't know. In liberal democracies, we should expect that when the time comes to prosecute some of the seventeen people who have been arrested, the police would have to explain to the judge how the suspects came to be suspects, and that as a side benefit of the operation of justice, Tor could learn if there are security flaws in hidden services or other critical internet-facing services. We know through recent leaks that the US DEA and others have constructed a system of organized and sanctioned perjury which they refer to as "parallel construction."
Unfortunately, the authorities did not specify how they managed to locate the hidden services. Here are some plausible scenarios:
The first and most obvious explanation is that the operators of these hidden services failed to use adequate operational security. For example, there are reports of one of the websites being infiltrated by undercover agents and the affidavit states various operational security errors.
Another explanation is exploitation of common web bugs like SQL injections or RFIs (remote file inclusions). Many of those websites were likely quickly-coded e-shops with a big attack surface. Exploitable bugs in web applications are a common problem.
Apparently, there are ways to link transactions and deanonymize Bitcoin clients even if they use Tor. Maybe the seized hidden services were running Bitcoin clients themselves and were victims of similar attacks.
Attacks on the Tor network
The number of takedowns and the fact that Tor relays were seized could also mean that the Tor network was attacked to reveal the location of those hidden services. We received some interesting information from an operator of a now-seized hidden service which may indicate this, as well. Over the past few years, researchers have discovered various attacks on the Tor network. We've implemented some defenses against these attacks, but these defenses do not solve all known issues and there may even be attacks unknown to us.
For example, some months ago, someone was launching non-targetted deanonymization attacks on the live Tor network. People suspect that those attacks were carried out by CERT researchers. While the bug was fixed and the fix quickly deployed in the network, it's possible that as part of their attack, they managed to deanonymize some of those hidden services.
Another possible Tor attack vector could be the Guard Discovery attack. This attack doesn't reveal the identity of the hidden service, but allows an attacker to discover the guard node of a specific hidden service. The guard node is the only node in the whole network that knows the actual IP address of the hidden service. Hence, if the attacker then manages to compromise the guard node or somehow obtain access to it, she can launch a traffic confirmation attack to learn the identity of the hidden service. We've been
discussing various solutions to the guard discovery attack for the past many months but it's not an easy problem to fix properly. Help and feedback on the proposed designs is appreciated.
*Similarly, there exists the attack where the hidden service selects the attacker's relay as its guard node. This may happen randomly or this could occur if the hidden service selects another relay as its guard and the attacker renders that node unusable, by a denial of service attack or similar. The hidden service will then be forced to select a new guard. Eventually, the hidden service will select the attacker.
Furthermore, denial of service attacks on relays or clients in the Tor network can often be leveraged into full de-anonymization attacks. These techniques go back many years, in research such as "From a Trickle to a Flood", "Denial of Service or Denial of Security?", "Why I'm not an Entropist", and even the more recent Bitcoin attacks above. In the Hidden Service protocol there are more vectors for DoS attacks, such as the set of HSDirs and the Introduction Points of a Hidden Service.
Finally, remote code execution exploits against Tor software are also always a possibility, but we have zero evidence that such exploits exist. Although the Tor source code gets continuously reviewed by our security-minded developers and community members, we would like more focused auditing by experienced bug hunters. Public-interest initiatives like Project Zero could help out a lot here. Funding to launch a bug bounty program of our own could also bring real benefit to our codebase. If you can help, please get in touch.
Advice to concerned hidden service operators
As you can see, we still don't know what happened, and it's hard to give concrete suggestions blindly.
If you are a concerned hidden service operator, we suggest you read the cited resources to get a better understanding of the security that hidden services can offer and of the limitations of the current system. When it comes to anonymity, it's clear that the tighter your threat model is, the more informed you need to be about the technologies you use.
If your hidden service lacks sufficient processor, memory, or network resources the DoS based de-anonymization attacks may be easy to leverage against your service. Be sure to review the Tor performance tuning guide to optimize your relay or client.
*Another possible suggestion we can provide is manually selecting the guard node of a hidden service. By configuring the EntryNodes option in Tor's configuration file you can select a relay in the Tor network you trust. Keep in mind, however, that a determined attacker will still be able to determine this relay is your guard and all other attacks still apply.
The task of hiding the location of low-latency web services is a very hard problem and we still don't know how to do it correctly. It seems that there are various issues that none of the current anonymous publishing designs have really solved.
In a way, it's even surprising that hidden services have survived so far. The attention they have received is minimal compared to their social value and compared to the size and determination of their adversaries.
It would be great if there were more people reviewing our designs and code. For example, we would really appreciate feedback on the upcoming hidden service revamp or help with the research on guard discovery attacks (see links above).
Also, it's important to note that Tor currently doesn't have funding for improving the security of hidden services. If you are interested in funding hidden services research and development, please get in touch with us. We hope to find time to organize a crowdfunding campaign to acquire independent and focused hidden service funding.
Thanks to Griffin, Matt, Adam, Roger, David, George, Karen, and Jake for contributions to this post.
* Added information about guard node DoS and EntryNodes option - 2014/11/09 18:16 UTC
Today Facebook unveiled its hidden service that lets users access their website more safely. Users and journalists have been asking for our response; here are some points to help you understand our thinking.
Part one: yes, visiting Facebook over Tor is not a contradiction
I didn't even realize I should include this section, until I heard from a journalist today who hoped to get a quote from me about why Tor users wouldn't ever use Facebook. Putting aside the (still very important) questions of Facebook's privacy habits, their harmful real-name policies, and whether you should or shouldn't tell them anything about you, the key point here is that anonymity isn't just about hiding from your destination.
There's no reason to let your ISP know when or whether you're visiting Facebook. There's no reason for Facebook's upstream ISP, or some agency that surveils the Internet, to learn when and whether you use Facebook. And if you do choose to tell Facebook something about you, there's still no reason to let them automatically discover what city you're in today while you do it.
Also, we should remember that there are some places in the world that can't reach Facebook. Long ago I talked to a Facebook security person who told me a fun story. When he first learned about Tor, he hated and feared it because it "clearly" intended to undermine their business model of learning everything about all their users. Then suddenly Iran blocked Facebook, a good chunk of the Persian Facebook population switched over to reaching Facebook via Tor, and he became a huge Tor fan because otherwise those users would have been cut off. Other countries like China followed a similar pattern after that. This switch in his mind between "Tor as a privacy tool to let users control their own data" to "Tor as a communications tool to give users freedom to choose what sites they visit" is a great example of the diversity of uses for Tor: whatever it is you think Tor is for, I guarantee there's a person out there who uses it for something you haven't considered.
Part two: we're happy to see broader adoption of hidden services
I think it is great for Tor that Facebook has added a .onion address. There are some compelling use cases for hidden services: see for example the ones described at using Tor hidden services for good, as well as upcoming decentralized chat tools like Ricochet where every user is a hidden service, so there's no central point to tap or lean on to retain data. But we haven't really publicized these examples much, especially compared to the publicity that the "I have a website that the man wants to shut down" examples have gotten in recent years.
Hidden services provide a variety of useful security properties. First — and the one that most people think of — because the design uses Tor circuits, it's hard to discover where the service is located in the world. But second, because the address of the service is the hash of its key, they are self-authenticating: if you type in a given .onion address, your Tor client guarantees that it really is talking to the service that knows the private key that corresponds to the address. A third nice feature is that the rendezvous process provides end-to-end encryption, even when the application-level traffic is unencrypted.
So I am excited that this move by Facebook will help to continue opening people's minds about why they might want to offer a hidden service, and help other people think of further novel uses for hidden services.
Another really nice implication here is that Facebook is committing to taking its Tor users seriously. Hundreds of thousands of people have been successfully using Facebook over Tor for years, but in today's era of services like Wikipedia choosing not to accept contributions from users who care about privacy, it is refreshing and heartening to see a large website decide that it's ok for their users to want more safety.
As an addendum to that optimism, I would be really sad if Facebook added a hidden service, had a few problems with trolls, and decided that they should prevent Tor users from using their old https://www.facebook.com/ address. So we should be vigilant in helping Facebook continue to allow Tor users to reach them through either address.
Part three: their vanity address doesn't mean the world has ended
Their hidden service name is "facebookcorewwwi.onion". For a hash of a public key, that sure doesn't look random. Many people have been wondering how they brute forced the entire name.
The short answer is that for the first half of it ("facebook"), which is only 40 bits, they generated keys over and over until they got some keys whose first 40 bits of the hash matched the string they wanted.
Then they had some keys whose name started with "facebook", and they looked at the second half of each of them to pick out the ones with pronouncable and thus memorable syllables. The "corewwwi" one looked best to them — meaning they could come up with a story about why that's a reasonable name for Facebook to use — so they went with it.
So to be clear, they would not be able to produce exactly this name again if they wanted to. They could produce other hashes that start with "facebook" and end with pronouncable syllables, but that's not brute forcing all of the hidden service name (all 80 bits).
For those who want to explore the math more, read about the "birthday attack". And for those who want to learn more (please help!) about the improvements we'd like to make for hidden services, including stronger keys and stronger names, see hidden services need some love and Tor proposal 224.
Part four: what do we think about an https cert for a .onion address?
Facebook didn't just set up a hidden service. They also got an https certificate for their hidden service, and it's signed by Digicert so your browser will accept it. This choice has produced some feisty discussions in the CA/Browser community, which decides what kinds of names can get official certificates. That discussion is still ongoing, but here are my early thoughts on it.
In favor: we, the Internet security community, have taught people that https is necessary and http is scary. So it makes sense that users want to see the string "https" in front of them.
Against: Tor's .onion handshake basically gives you all of that for free, so by encouraging people to pay Digicert we're reinforcing the CA business model when maybe we should be continuing to demonstrate an alternative.
In favor: Actually https does give you a little bit more, in the case where the service (Facebook's webserver farm) isn't in the same location as the Tor program. Remember that there's no requirement for the webserver and the Tor process to be on the same machine, and in a complicated set-up like Facebook's they probably shouldn't be. One could argue that this last mile is inside their corporate network, so who cares if it's unencrypted, but I think the simple phrase "ssl added and removed here" will kill that argument.
Against: if one site gets a cert, it will further reinforce to users that it's "needed", and then the users will start asking other sites why they don't have one. I worry about starting a trend where you need to pay Digicert money to have a hidden service or your users think it's sketchy — especially since hidden services that value their anonymity could have a hard time getting a certificate.
One alternative would be to teach Tor Browser that https .onion addresses don't deserve a scary pop-up warning. A more thorough approach in that direction is to have a way for a hidden service to generate its own signed https cert using its onion private key, and teach Tor Browser how to verify them — basically a decentralized CA for .onion addresses, since they are self-authenticating anyway. Then you don't have to go through the nonsense of pretending to see if they could read email at the domain, and generally furthering the current CA model.
We could also imagine a pet name model where the user can tell her Tor Browser that this .onion address "is" Facebook. Or the more direct approach would be to ship a bookmark list of "known" hidden services in Tor Browser — like being our own CA, using the old-fashioned /etc/hosts model. That approach would raise the political question though of which sites we should endorse in this way.
So I haven't made up my mind yet about which direction I think this discussion should go. I'm sympathetic to "we've taught the users to check for https, so let's not confuse them", but I also worry about the slippery slope where getting a cert becomes a required step to having a reputable service. Let us know if you have other compelling arguments for or against.
Part five: what remains to be done?
In terms of both design and security, hidden services still need some love. We have plans for improved designs (see Tor proposal 224) but we don't have enough funding and developers to make it happen. We've been talking to some Facebook engineers this week about hidden service reliability and scalability, and we're excited that Facebook is thinking of putting development effort into helping improve hidden services.
And finally, speaking of teaching people about the security features of .onion sites, I wonder if "hidden services" is no longer the best phrase here. Originally we called them "location-hidden services", which was quickly shortened in practice to just "hidden services". But protecting the location of the service is just one of the security features you get. Maybe we should hold a contest to come up with a new name for these protected services? Even something like "onion services" might be better if it forces people to learn what it is.
The Google Summer of Code (GSoC) was an excellent opportunity to improve on the Ahmia search engine. With Google's stipend and friendly mentoring from The Tor Project, I was able to concentrate on development of my search engine project. Thank you all!
GSoC 2014 is over, but I am sticking around to continue developing and maintaining Ahmia.
Here is the current status of ahmia after GSoC development:
Building a search engine for anonymous web sites running inside the Tor network is an interesting problem. Tor enables web servers to hide their location and Tor users can connect to these authenticated hidden services while the server and the user both stay anonymous. However, finding web content is hard without a good search engine and therefore a search engine is needed for the Tor network.
Web search engines are needed to navigate and search the web. There were no search engines for searching hidden service web content, so I decided to build a search engine specially for Tor. I registered ahmia.fi and started development on it as a side project in 2010.
This development involved programming and testing web crawlers, thinking of ways to find hidden service addresses (since the protocol does not allow enumeration), learning about the Tor community, and implementing a filtering policy. Moreover, I implemented an API that empowers other Tor services that publish content to integrate with Ahmia.
As a result, Ahmia is a working search engine that indexes, searches and catalogs content published on Tor Hidden Services. Furthermore, it is an environment to share meaningful statistics, insights and news about the Tor network itself.
Interesting Summer of Code
One of my best memories from the summer is the Tor Project's Summer 2014 Developers meeting that was hosted by Mozilla in Paris, France. I have always admired the people who are working on the Tor Project.
I also loved the coding itself. Finally I had time to improve the Ahmia search engine and its many features. I did a lot of work and liked it.
Some journalist were very interested in my work: Carola Frediani asked if I could analyze the content of hidden services. I coded a script that fetches every front page's HTML, I gathered all the keywords, headers and description texts and made a simple word cloud visualization.
It is a simple way to glance what is published on the hidden websites.
Carola found this data useful and used it in her presentation at www.sotn.it on June 11th.
Technical design of ahmia
The components of Ahmia are:
- Django front-end site
- PostgreSQL database for the site
- Custom scripts to download data about hidden services
- Django-Haystack connection to Solr database
- Apache Solr for the crawled data
- OnionBot crawler that gathers data to Solr database
The full-text search is implemented using Django-Haystack. The search is using crawled website data that is saved to Apache Solr.
OnionDir is a list of known online hidden service addresses. A separate script gathers this list and fetches information fields from the HTML (title, keywords, description etc.). Furthermore, users can freely edit these fields.
We've also started a convention where hidden service admins can add a file to their website, called description.json, to offer an official description of their site in Ahmia.
As a result, this information is shown in the OnionDir page and over 80 domains are already using this method.
We are gathering three types of popularity data:
- Tor2web nodes share their visiting statistics to Ahmia
- Number of public WWW backlinks to hidden services
- Number of clicks in the search results
The click counter tells the total number of clicks on a search result in ahmia.fi
We have decided to filter any sites related to child porn from our search results. Ahmia is removing everything related to these websites. These websites may not be actual child porn sites. They are rather sites where users can post content (forums, file and image uploads etc.) and as the result there have been, momentarily at least, some suspicious content that has not been moderated in a reasonable period of time. Ahmia.fi does not have the time to monitor these sites carefully and we are banning sites from our public index if we see any evidence of child abuse. Of course, the ban is removed if the site itself contacts us and we review the website to be OK.
In practice, Ahmia calculates the MD5 sums of the banned domains for use as a filtering policy. Moreover, we are sharing this list and Tor2web nodes can use the list to filter out pages.
At the moment, there seems to be 1228 hidden website domains online and 7 of them has been filtered because they are possibly sharing child porn content.
OnionBot is a crawler for hidden service websites based on the Scrapy framework. It crawls the Tor network and passes data to the search database. OnionBot requires the Tor software (using Tor2web mode) and Polipo. The results are saved to Apache Solr.
Apache Solr is a popular, open source enterprise search platform. Its major features include powerful full-text search, hit highlighting, faceted search, and near real-time indexing.
The schema.xml file contains all of the details about which fields your documents can contain, and how those fields should be dealt with when adding documents to the index, or when querying those fields.
Security measures for privacy
In the software
- We do not log any IP addresses, see Apache configuration
- We are gathering real-time clicks, however, this data is not shown accurately
In the host ahmia.fi
- Backend servers are run separately and they do not have any knowledge about the end-users
- All servers are hosted in countries with strong privacy laws. For example, Finland and the Netherlands
- Communication between servers is encrypted
- Only a few trustworthy people know the locations of the back-end servers and are able to access them
GSoC 2014 was fun and productive!
There is a lot more to do. However, I do not have time to do everything myself. Of course, I am coding when I have time and maintaining the search engine.
In addition, I am going to write a scientific article about the implementation.
Is there anyone who would be interested in developing Ahmia.fi?
Is anyone familiar with Solr and would know how to tweak it for full text search?
Furthermore, any kind of help would be most welcome. There are always Linux admin duties, HTML/CSS design, bug fixing, Django development, etc...
For further information, please don't hesitate to contact me by e-mail: firstname.lastname@example.org
This advisory was posted on the tor-announce mailing list.
On July 4 2014 we found a group of relays that we assume were trying to deanonymize users. They appear to have been targeting people who operate or access Tor hidden services. The attack involved modifying Tor protocol headers to do traffic confirmation attacks.
The attacking relays joined the network on January 30 2014, and we removed them from the network on July 4. While we don't know when they started doing the attack, users who operated or accessed hidden services from early February through July 4 should assume they were affected.
Unfortunately, it's still unclear what "affected" includes. We know the attack looked for users who fetched hidden service descriptors, but the attackers likely were not able to see any application-level traffic (e.g. what pages were loaded or even whether users visited the hidden service they looked up). The attack probably also tried to learn who published hidden service descriptors, which would allow the attackers to learn the location of that hidden service. In theory the attack could also be used to link users to their destinations on normal Tor circuits too, but we found no evidence that the attackers operated any exit relays, making this attack less likely. And finally, we don't know how much data the attackers kept, and due to the way the attack was deployed (more details below), their protocol header modifications might have aided other attackers in deanonymizing users too.
Relays should upgrade to a recent Tor release (0.2.4.23 or 0.2.5.6-alpha), to close the particular protocol vulnerability the attackers used — but remember that preventing traffic confirmation in general remains an open research problem. Clients that upgrade (once new Tor Browser releases are ready) will take another step towards limiting the number of entry guards that are in a position to see their traffic, thus reducing the damage from future attacks like this one. Hidden service operators should consider changing the location of their hidden service.
THE TECHNICAL DETAILS:
We believe they used a combination of two classes of attacks: a traffic confirmation attack and a Sybil attack.
A traffic confirmation attack is possible when the attacker controls or observes the relays on both ends of a Tor circuit and then compares traffic timing, volume, or other characteristics to conclude that the two relays are indeed on the same circuit. If the first relay in the circuit (called the "entry guard") knows the IP address of the user, and the last relay in the circuit knows the resource or destination she is accessing, then together they can deanonymize her. You can read more about traffic confirmation attacks, including pointers to many research papers, at this blog post from 2009:
The particular confirmation attack they used was an active attack where the relay on one end injects a signal into the Tor protocol headers, and then the relay on the other end reads the signal. These attacking relays were stable enough to get the HSDir ("suitable for hidden service directory") and Guard ("suitable for being an entry guard") consensus flags. Then they injected the signal whenever they were used as a hidden service directory, and looked for an injected signal whenever they were used as an entry guard.
The way they injected the signal was by sending sequences of "relay" vs "relay early" commands down the circuit, to encode the message they want to send. For background, Tor has two types of cells: link cells, which are intended for the adjacent relay in the circuit, and relay cells, which are passed to the other end of the circuit. In 2008 we added a new kind of relay cell, called a "relay early" cell, which is used to prevent people from building very long paths in the Tor network. (Very long paths can be used to induce congestion and aid in breaking anonymity). But the fix for infinite-length paths introduced a problem with accessing hidden services, and one of the side effects of our fix for bug 1038 was that while we limit the number of outbound (away from the client) "relay early" cells on a circuit, we don't limit the number of inbound (towards the client) relay early cells.
So in summary, when Tor clients contacted an attacking relay in its role as a Hidden Service Directory to publish or retrieve a hidden service descriptor (steps 2 and 3 on the hidden service protocol diagrams), that relay would send the hidden service name (encoded as a pattern of relay and relay-early cells) back down the circuit. Other attacking relays, when they get chosen for the first hop of a circuit, would look for inbound relay-early cells (since nobody else sends them) and would thus learn which clients requested information about a hidden service.
There are three important points about this attack:
A) The attacker encoded the name of the hidden service in the injected signal (as opposed to, say, sending a random number and keeping a local list mapping random number to hidden service name). The encoded signal is encrypted as it is sent over the TLS channel between relays. However, this signal would be easy to read and interpret by anybody who runs a relay and receives the encoded traffic. And we might also worry about a global adversary (e.g. a large intelligence agency) that records Internet traffic at the entry guards and then tries to break Tor's link encryption. The way this attack was performed weakens Tor's anonymity against these other potential attackers too — either while it was happening or after the fact if they have traffic logs. So if the attack was a research project (i.e. not intentionally malicious), it was deployed in an irresponsible way because it puts users at risk indefinitely into the future.
(This concern is in addition to the general issue that it's probably unwise from a legal perspective for researchers to attack real users by modifying their traffic on one end and wiretapping it on the other. Tools like Shadow are great for testing Tor research ideas out in the lab.)
B) This protocol header signal injection attack is actually pretty neat from a research perspective, in that it's a bit different from previous tagging attacks which targeted the application-level payload. Previous tagging attacks modified the payload at the entry guard, and then looked for a modified payload at the exit relay (which can see the decrypted payload). Those attacks don't work in the other direction (from the exit relay back towards the client), because the payload is still encrypted at the entry guard. But because this new approach modifies ("tags") the cell headers rather than the payload, every relay in the path can see the tag.
C) We should remind readers that while this particular variant of the traffic confirmation attack allows high-confidence and efficient correlation, the general class of passive (statistical) traffic confirmation attacks remains unsolved and would likely have worked just fine here. So the good news is traffic confirmation attacks aren't new or surprising, but the bad news is that they still work. See https://blog.torproject.org/blog/one-cell-enough for more discussion.
Then the second class of attack they used, in conjunction with their traffic confirmation attack, was a standard Sybil attack — they signed up around 115 fast non-exit relays, all running on 22.214.171.124/16 or 126.96.36.199/16. Together these relays summed to about 6.4% of the Guard capacity in the network. Then, in part because of our current guard rotation parameters, these relays became entry guards for a significant chunk of users over their five months of operation.
We actually noticed these relays when they joined the network, since the DocTor scanner reported them. We considered the set of new relays at the time, and made a decision that it wasn't that large a fraction of the network. It's clear there's room for improvement in terms of how to let the Tor network grow while also ensuring we maintain social connections with the operators of all large groups of relays. (In general having a widely diverse set of relay locations and relay operators, yet not allowing any bad relays in, seems like a hard problem; on the other hand our detection scripts did notice them in this case, so there's hope for a better solution here.)
In response, we've taken the following short-term steps:
1) Removed the attacking relays from the network.
2) Put out a software update for relays to prevent "relay early" cells from being used this way.
3) Put out a software update that will (once enough clients have upgraded) let us tell clients to move to using one entry guard rather than three, to reduce exposure to relays over time.
4) Clients can tell whether they've received a relay or relay-cell. For expert users, the new Tor version warns you in your logs if a relay on your path injects any relay-early cells: look for the phrase "Received an inbound RELAY_EARLY cell".
The following longer-term research areas remain:
5) Further growing the Tor network and diversity of relay operators, which will reduce the impact from an adversary of a given size.
6) Exploring better mechanisms, e.g. social connections, to limit the impact from a malicious set of relays. We've also formed a group to pay more attention to suspicious relays in the network:
7) Further reducing exposure to guards over time, perhaps by extending the guard rotation lifetime:
8) Better understanding statistical traffic correlation attacks and whether padding or other approaches can mitigate them.
9) Improving the hidden service design, including making it harder for relays serving as hidden service directory points to learn what hidden service address they're handling:
Q1) Was this the Black Hat 2014 talk that got canceled recently?
Q2) Did we find all the malicious relays?
Q3) Did the malicious relays inject the signal at any points besides the HSDir position?
Q4) What data did the attackers keep, and are they going to destroy it? How have they protected the data (if any) while storing it?
Great questions. We spent several months trying to extract information from the researchers who were going to give the Black Hat talk, and eventually we did get some hints from them about how "relay early" cells could be used for traffic confirmation attacks, which is how we started looking for the attacks in the wild. They haven't answered our emails lately, so we don't know for sure, but it seems likely that the answer to Q1 is "yes". In fact, we hope they *were* the ones doing the attacks, since otherwise it means somebody else was. We don't yet know the answers to Q2, Q3, or Q4.
New work on denial of service in Tor will be presented at NDSS '14 on Tuesday, February 25th, 2014:
The Sniper Attack: Anonymously Deanonymizing and Disabling the Tor Network
by Rob Jansen, Florian Tschorsch, Aaron Johnson, and Björn Scheuermann
To appear at the 21st Symposium on Network and Distributed System Security
We found a new vulnerability in the design of Tor's flow control algorithm that can be exploited to remotely crash Tor relays. The attack is an extremely low resource attack in which an adversary's bandwidth may be traded for a target relay's memory (RAM) at an amplification rate of one to two orders of magnitude. Ironically, the adversary can use Tor to protect it's identity while attacking Tor without significantly reducing the effectiveness of the attack.
We studied relay availability under the attack using Shadow, a discrete-event network simulator that runs the real Tor software in a safe, private testing environment, and found that we could disable each of the fastest guard and the fastest exit relay in a range of 1-18 minutes (depending on relay RAM capacity). We also found that the entire group of the top 20 exit relays, representing roughly 35% of Tor bandwidth capacity at the time of the analysis, could be disabled in a range of 29 minutes to 3 hours and 50 minutes. We also analyzed how the attack could potentially be used to deanonymize hidden services, and found that it would take between 4 and 278 hours before the attack would succeed (again depending on relay RAM capacity, as well as the bandwidth resources used to launch the attack).
Due to our devastating findings, we also designed three defenses that mitigate our attacks, one of which provably renders the attack ineffective. Defenses have been implemented and deployed into the Tor software to ensure that the Tor network is no longer vulnerable as of Tor version 0.2.4.18-rc and later. Some of that work can be found in Trac tickets #9063, #9072, #9093, and #10169.
In the remainder of this post I will detail the attacks and defenses we analyzed, noting again that this information is presented more completely (and more elegantly) in our paper.
The Tor Network Infrastructure
The Tor network is a distributed system made up of thousands of computers running the Tor software that contribute their bandwidth, memory, and computational resources for the greater good. These machines are called Tor relays, because their main task is to forward or relay network traffic to another entity after performing some cryptographic operations. When a Tor user wants to download some data using Tor, the user's Tor client software will choose three relays from those available (an entry, middle, and exit), form a path or circuit between these relays, and then instruct the third relay (the exit) to fetch the data and send it back through the circuit. The data will get transferred from its source to the exit, from the exit to the middle, and from the middle to the entry before finally making its way to the client.
The client may request the exit to fetch large amounts of data, and so Tor uses a window-based flow control scheme in order to limit the amount of data each relay needs to buffer in memory at once. When a circuit is created, the exit will initialize its circuit package counter to 1000 cells, indicating that it is willing to send 1000 cells into the circuit. The exit decrements the package counter by one for every data cell it sends into the circuit (to the middle relay), and stops sending data when the package counter reaches 0. The client at the other end of the circuit keeps a delivery counter, and initializes it to 0 upon circuit creation. The client increments the delivery counter by 1 for every data cell it receives on that circuit. When the client's delivery counter reaches 100, it sends a special Tor control cell, called a SENDME cell, to the exit to signal that it received 100 cells. Upon receiving the SENDME, the exit adds 100 to its package counter and continues sending data into the circuit.
This flow control scheme limits the amount of outstanding data that may be in flight at any time (between the exit and the client) to 1000 cells, or about 500 KiB, per circuit. The same mechanism is used when data is flowing in the opposite direction (up from the client, through the entry and middle, and to the exit).
The Sniper Attack
The new Denial of Service (DoS) attack, which we call "The Sniper Attack", exploits the flow control algorithm to remotely crash a victim Tor relay by depleting its memory resources. The paper presents three attacks that rely on the following two techniques:
- the attacker stops reading from the TCP connection containing the attack circuit, which causes the TCP window on the victim's outgoing connection to close and the victim to buffer up to 1000 cells; and
- the attacker causes cells to be continuously sent to the victim (exceeding the 1000 cell limit and consuming the victim's memory resources) either by ignoring the package window at packaging end of the circuit, or by continuously sending SENDMEs from the delivery end to the packaging end even though no cells have been read by the delivery end.
Basic Version 1 (attacking an entry relay)
In basic version 1, the adversary controls the client and the exit relay, and chooses a victim for the entry relay position. The adversary builds a circuit through the victim to her own exit, and then the exit continuously generates and sends arbitrary data through the circuit toward the client while ignoring the package window limit. The client stops reading from the TCP connection to the entry relay, and the entry relay buffers all data being sent by the exit relay until it is killed by its OS out-of-memory killer.
Basic Version 2 (attacking an exit relay)
In basic version 2, the adversary controls the client and an Internet destination server (e.g. website), and chooses a victim for the exit relay position. The adversary builds a circuit through the victim exit relay, and then the client continuously generates and sends arbitrary data through the circuit toward the exit relay while ignoring the package window limit. The destination server stops reading from the TCP connection to the exit relay, and the exit relay buffers all data being sent by the client until it is killed by its OS out-of-memory killer.
Both of the basic versions of the attack above require the adversary to generate and send data, consuming roughly the same amount of upstream bandwidth as the victim's available memory. The efficient version reduces this cost by one to two orders of magnitude.
In the efficient version, the adversary controls only a client. She creates a circuit, choosing the victim for the entry position, and then instructs the exit relay to download a large file from some external Internet server. The client stops reading on the TCP connection to the entry relay, causing it to buffer 1000 cells.
At this point, the adversary may "trick" the exit relay into sending more cells by sending it a SENDME cell, even though the client has not actually received any cells from the entry. As long as this SENDME does not increase the exit relay's package counter to greater than 1000 cells, the exit relay will continue to package data from the server and send it into the circuit toward the victim. If the SENDME does cause the exit relay's package window to exceed the 1000 cell limit, it will stop responding on that circuit. However, the entry and middle node will hold the circuit open until the client issues another command, meaning its resources will not be freed.
The bandwidth cost of the attack after circuit creation is simply the bandwidth cost of occasionally sending a SENDME to the exit. The memory consumption speed depends on the bandwidth and congestion of non-victim circuit relays. We describe how to parallelize the attack using multiple circuits and multiple paths with diverse relays in order to draw upon Tor's inherent resources. We found that with roughly 50 KiB/s of upstream bandwidth, an attacker could consume the victim's memory at roughly 1 MiB/s. This is highly dependent on the victim's bandwidth capabilities: relays that use token buckets to restrict bandwidth usage will of course bound the attack's consumption rate.
Rather than connecting directly to the victim, the adversary may instead launch the attack through a separate Tor circuit using a second client instance and the "Socks4Proxy" or "Socks5Proxy" option. In this case, she may benefit from the anonymity that Tor itself provides in order to evade detection. We found that there is not a significant increase in bandwidth usage when anonymizing the attack in this way.
A simple but naive defense against the Sniper Attack is to have the guard node watch its queue length, and if it ever fills to over 1000 cells, kill the circuit. This defense does not prevent the adversary from parallelizing the attack by using multiple circuits (and then consuming 1000 cells on each), which we have shown to be extremely effective.
Another defense, called "authenticated SENDMEs", tries to protect against receiving a SENDME from a node that didn't actually receive 100 cells. In this approach, a 1 byte nonce is placed in every 100th cell by the packaging end, and that nonce must be included by the delivery end in the SENDME (otherwise the packaging end rejects the SENDME as inauthentic). As above, this does not protect against the parallel attack. It also doesn't defend against either of the basic attacks where the adversary controls the packaging end and ignores the SENDMEs anyway.
The best defense, as we suggested to the Tor developers, is to implement a custom, adaptive out-of-memory circuit killer in application space (i.e. inside Tor). The circuit killer is only activated when memory becomes scarce, and then it chooses the circuit with the oldest front-most cell in its circuit queue. This will prevent the Sniper Attack by killing off all of the attack circuits.
With this new defense in place, the next game is for the adversary to try to cause Tor to kill an honest circuit. In order for an adversary to cause an honest circuit to get killed, it must ensure that the front-most cell on its malicious circuit queue is at least slightly "younger" than the oldest cell on any honest queue. We show that the Sniper Attack is impractical with this defense: due to fairness mechanisms in Tor, the adversary must spend an extraordinary amount of bandwidth keeping its cells young — bandwidth that would likely be better served in a more traditional brute-force DoS attack.
Tor has implemented a version of the out-of-memory killer for circuits, and is currently working on expanding this to channel and connection buffers as well.
Hidden Service Attack and Countermeasures
The paper also shows how the Sniper Attack can be used to deanonymize hidden services:
- run a malicious entry guard relay;
- run the attack from Oakland 2013 to learn the current guard relay of the target hidden service;
- run the Sniper Attack on the guard from step 2, knocking it offline and causing the hidden service to choose a new guard;
- repeat, until the hidden service chooses the relay from step 1 as its new entry guard.
The technique to verify that the hidden service is using a malicious guard in step 4 is the same technique used in step 2.
In the paper, we compute the expected time to succeed in this attack while running malicious relays of various capacities. It takes longer to succeed against relays that have more RAM, since it relies on the Sniper Attack to consume enough RAM to kill the relay (which itself depends on the bandwidth capacity of the victim relay). For the malicious relay bandwidth capacities and honest relay RAM amounts used in their estimate, we found that deanonymization would involve between 18 and 132 Sniper Attacks and take between ~4 and ~278 hours.
This attack becomes much more difficult if the relay is rebooted soon after it crashes, and the attack is ineffective when Tor relays are properly defending against the Sniper Attack (see the "Defenses" section above).
Strategies to defend hidden services in particular go beyond those suggested here to include entry guard rate-limiting, where you stop building circuits if you notice that your new guards keep going down (failing closed), and middle guards, guard nodes for your guard nodes. Both of these strategies attempt to make it harder to coerce the hidden service into building new circuits or exposing itself to new relays, since that is precisely what is needed for deanonymization.
The main defense implemented in Tor will start killing circuits when memory gets low. Currently, Tor uses a configuration option (MaxMemInCellQueues) that allows a relay operator to configure when the circuit-killer should be activated. There is likely not one single value that makes sense here: if it is too high, then relays with lower memory will not be protected; if it is too low, then there may be more false positives resulting in honest circuits being killed. Can Tor determine this setting in an OS-independent way that allows relays to automatically find the right value for MaxMemInCellQueues?
The defenses against the Sniper Attack prevent the adversary from crashing the victim relay, but the adversary may still consume a relay's bandwidth (and memory resources, to a critical level) at relatively low cost. This means that even though the Sniper Attack can no longer kill a relay, it can still consume a large amount of its bandwidth at a relatively low cost (similar to more traditional bandwidth amplification attacks). More analysis of general bandwidth consumption attacks and defenses remains a useful research problem.
Finally, hidden services also need some love. More work is needed to redesign them in a way that does not allow a client to cause the hidden service to choose new relays on demand.