Research problem: better guard rotation parameters
Tor's entry guard design protects users in a variety of ways.
First, they protect against the "predecessor attack": if you choose new relays for each circuit, eventually an attacker who runs a few relays will be your first and last hop. With entry guards, the risk of end-to-end correlation for any given circuit is the same, but the cumulative risk for all your circuits over time is capped.
Second, they help to protect against the "denial of service as denial of anonymity" attack, where an attacker who runs quite a few relays fails any circuit that he's a part of and that he can't win against, forcing Alice (the user) to generate more circuits and thus increasing the overall chance that the attacker wins. Entry guards greatly reduce the risk, since Alice will never choose outside of a few nodes for her first hop.
Third, entry guards raise the startup cost to an adversary who runs relays in order to trace users. Without entry guards, the attacker can sign up some relays and immediately start having chances to observe Alice's circuits. With them, new adversarial relays won't have the Guard flag so won't be chosen as the first hop of any circuit; and even once they earn the Guard flag, users who have already chosen guards won't switch away from their current guards for quite a while.
But how long exactly? The first research question here examines vulnerability due to natural churn of entry guards. Consider an adversary who runs one entry guard with advertised capacity C. Alice maintains an ordered list of guards (chosen at random, weighted by advertised speed, from the whole list of possible guards). For each circuit, she chooses her first hop (uniformly at random) from the first G guards from her list that are currently running (appending a new guard to the list as needed, but going back to earlier guards if they reappear). How long until Alice adds the adversary's node to her guard list? You can use the uptime data from the directory archives, either to build a model for guard churn or just to use the churn data directly. Assuming G=3, how do the results vary by C? What's the variance?
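As a starting point, the guard-list dynamics can be sketched with a toy Monte Carlo model. Everything here is an assumption for illustration: guards go up and down independently each day with a fixed probability (the real answer should use churn from the directory archives), capacities are abstract weights, and `days_until_adversary_listed` is a hypothetical helper, not Tor code.

```python
import random

def days_until_adversary_listed(n_honest=500, honest_cap=1.0, adv_cap=5.0,
                                uptime_p=0.95, G=3, max_days=3650, seed=0):
    """Toy churn model: each day, every guard is independently up with
    probability uptime_p.  Alice keeps an ordered guard list; whenever
    fewer than G of her listed guards are running, she appends a new guard
    chosen at random weighted by advertised capacity.  Returns the day the
    adversary's guard (the last index) first enters her list, or None if
    it never does within max_days."""
    rng = random.Random(seed)
    caps = [honest_cap] * n_honest + [adv_cap]   # last entry is the adversary
    adversary = n_honest
    guard_list = []
    for day in range(max_days):
        up = [rng.random() < uptime_p for _ in caps]
        running = sum(1 for g in guard_list if up[g])
        while running < G:
            remaining = [i for i in range(len(caps)) if i not in guard_list]
            if not remaining:
                break
            new = rng.choices(remaining, [caps[i] for i in remaining])[0]
            guard_list.append(new)
            if new == adversary:
                return day
            if up[new]:
                running += 1
    return None
```

Running this over many seeds gives an empirical distribution of the time-to-compromise, from which the variance asked for above can be estimated; varying `adv_cap` answers the "how do the results vary by C" part.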
Research question two: consider intentional churn due to load balancing too. Alice actually discards each guard between 30 and 60 days (uniformly chosen) after she first picks it. This intentional turnover prevents long-running guards from accumulating more and more users and getting overloaded. Another way of looking at it is that it shifts load to new guards so we can make better use of them. How much does this additional churn factor impact your answer from step one above? Or asked a different way, what fraction of Alice's vulnerability to the adversary's entry guard comes from natural churn, and what fraction from the proactive expiration? How does the answer change for different expiry intervals (e.g. between 10 and 30 days, or between 60 and 90)?
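The proactive expiration can be modeled by attaching a uniformly drawn lifetime to each guard when Alice first picks it and dropping guards once that lifetime runs out. A minimal sketch (the `(guard_id, expiry_day)` pairing is just one convenient representation, not how Tor stores it):

```python
import random

def add_guard(guard_list, guard_id, day, rng, lo=30, hi=60):
    """Append a newly picked guard with an expiry drawn uniformly from
    [lo, hi] days after the pick, matching the rotation described above."""
    guard_list.append((guard_id, day + rng.uniform(lo, hi)))

def expire_guards(guard_list, day):
    """Return the guard list with expired guards discarded."""
    return [(g, exp) for (g, exp) in guard_list if day < exp]
```

Re-running a churn simulation with `expire_guards` applied each day, versus without it, separates the natural-churn fraction of Alice's vulnerability from the proactive-expiration fraction; changing `lo` and `hi` covers the alternative expiry intervals.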
Research question three: how do these answers change as we vary G? Right now we choose from among the first three guards, to reduce the variance of expected user performance -- if we always picked the first guard on the list, and some users picked a low-capacity or highly-congested relay, none of that user's circuits would perform well. That said, if choosing G=1 is a huge win for security, we should work on other ways to reduce variance.
Research question four: how would these answers change if we make the cutoffs for getting the Guard flag more liberal, and/or change how we choose what nodes become guards? After all, Tor's anonymity is based on the diversity of entry and exit points, and while it may be tough to get around exit relay scarcity, my theory is that our artificial entry point scarcity (because our requirements are overly strict) is needlessly hurting the anonymity Tor can offer.
Our current algorithm for guard selection has three requirements:
1) The relay needs to have first appeared longer ago than 12.5% of the relays, or 8 days ago, whichever is shorter.
2) The relay needs to advertise at least the median bandwidth in the network, or 250KB/s, whichever is smaller.
3) The relay needs to have at least the median weighted-fractional-uptime of relays in the network, or 98% WFU, whichever is smaller. (For WFU, the clock starts ticking when we first hear about the relay; we track the percentage of that time the relay has been up, discounting values by 95% every 12 hours.)
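The WFU computation can be sketched as an exponentially discounted average over 12-hour intervals. One interpretation assumption here: I read "discounting values by 95% every 12 hours" as multiplying the accumulated counters by 0.95 each interval, so recent behavior dominates.

```python
def weighted_fractional_uptime(up_history, decay=0.95):
    """Compute WFU over a sequence of 12-hour intervals (True = relay up),
    oldest first.  Both the 'up' time and the total observed time are
    discounted by `decay` per interval, so an old outage fades while a
    recent one weighs heavily."""
    weighted_up = 0.0
    weighted_total = 0.0
    for interval_up in up_history:
        weighted_up = weighted_up * decay + (1.0 if interval_up else 0.0)
        weighted_total = weighted_total * decay + 1.0
    return weighted_up / weighted_total if weighted_total else 0.0
```

Note that a relay that was down early and has been up since scores above its raw uptime fraction, which is exactly the effect at work in the relay A vs. relay B comparison below.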
Today's guard cutoffs in practice are "was first sighted at least 8 days ago, advertises 100KB/s of bandwidth, and has 98% WFU."
Consider two relays, A and B. Relay A first appeared 30 days ago, disappeared for a week, and has been consistently up since then. It has a WFU (after decay, give or take a fencepost) of 786460 seconds / 824195 = 95.4%, so it's not a guard. Relay B appeared two weeks ago and has been up since then. Its WFU is 658517 / 658517 = 100%, so it gets the Guard flag -- even though it has less uptime, and even less weighted uptime. Based on the answers to the first three research questions, which relay would serve Alice best as a guard?
The big-picture tradeoff to explore is: what algorithm should we use to assign Guard flags such that a) we assign the flag to as many relays as possible, yet b) we minimize the chance that Alice will use the adversary's node as a guard?
> All three requirements for entry guards are easy for an organized adversary to fulfill. What prevents an adversary from throwing 100 virtual servers into the pool, each with above-median bandwidth and full uptime, and waiting one week?
Nothing prevents it. We once imagined we could run a Tor network only using relays run by people we knew. That approach doesn't scale, especially when anonymity is a function of network diversity. Instead, we need to grow a larger network: the more relays we have, the less impact a new attacking relay can have.
That said, with high probability, none of these 100 guards will be used by Alice after a week. That point is the main focus of this blog post.
> Is there a criterion to avoid selecting all entry guards from the same IP block?
Yes. No more than one guard in the same /16 network.
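For illustration, the /16 comparison for IPv4 addresses amounts to a check like the following (a sketch using Python's ipaddress module; Tor itself implements this in C as part of path selection):

```python
import ipaddress

def same_slash16(a, b):
    """True if two IPv4 addresses fall in the same /16 network -- the
    granularity at which Alice refuses to pick a second guard."""
    net = ipaddress.ip_network(f"{a}/16", strict=False)
    return ipaddress.ip_address(b) in net
```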
> The critical question is: how long does Alice stay with the adversary's entry guard once she has had the misfortune to select it?
For many weeks. Bad news for Alice in this case.
> How much time does the adversary need to achieve a sure correlation?
Open research question. My guess is "not much time".
> What about virtualizing the entry guards? Each circuit contains two entry guards as the first hop in the path. On the way back the middle hop randomly selects one of the two pre-selected entry guards of that circuit to send the packet back.
Maybe? I worry that two increases the attack surface over one, without much clear gain. See the discussion under https://blog.torproject.org/blog/one-cell-enough as well.
First, excuse me; my English is not good.
This program is a nice piece of work, BUT it is really not secure for people in countries where the government forbids using this kind of program. The reason is this: it is very easy for the Internet providers to download this program, run it, and record the IP addresses it uses. If they do this 100 times, they may collect roughly 100 IP addresses of Tor servers. Then they can put those IP addresses into their systems to check which clients are connecting, or have connected before, to those addresses, and afterwards they can arrest people very easily!
Please, if you are really thinking about Internet freedom and want to help people access a free and safe Internet, just release the complete Tor software package (client and server) for public use. It would be vastly safer, especially for writers, journalists, and others, because they could buy a VPS (virtual private server) or dedicated server in another country and connect to a specific IP address that nobody else knows anything about. It is not important to route over 10, 20, or more servers.
People in groups of 5-10 can buy one VPS for just 20 US$ and pay 2-5 $ per month.
Look: this program, with publicly known servers, can never be secure, because it is very easy to find out who is using it.
I have written comments like this on other projects before, and what I understood was this: the people working on those projects were just thinking about money, and they didn't care about the people. Everywhere, people claim they are helping, but in fact they are thinking about money. I think the only reason for not releasing the server-side program is money. I hope the directors of this project release the server-side program for freedom.
If you don't want to release the current version of your server side, you could release an older version or a lite version.
The only thing we need is one program (server/client) that can encrypt/decrypt the data, with flexibility in the choice of ports. Nothing more.
I hope I see an answer.
I am not clear whether you are talking about Tor or not. All of Tor's code is free, and the exact same software that runs as a client can be configured as a server with two clicks or a few lines in a configuration file. You may also notice the Tor software is free, as in free of cost and free as in freedom. I'm confused about why you think the server side isn't the same as the client, and why you think we charge for any part of the software.
However, assuming you are talking about Tor: yes, if you do not use bridges, a sophisticated government can record every IP address a user connects to. If they cross-reference that list of IP addresses with the public list of Tor relays, they will see that a user is possibly using Tor. This is one reason we developed bridges, https://torproject.org/docs/bridges.html.en. They are non-public Tor relays that allow entry into the global Tor network.
Perhaps you should review how Tor works: https://torproject.org/about/overview.html.en#thesolution. You can already get Tor, install it on a VPS, configure it as a non-published bridge, and give that address out to your friends. Lots of people do this already.