Trip report, UCSD

by arma | August 28, 2010

From Sunday (8/22) through Wednesday (8/25), I visited the Tor research group at UCSD as part of my ongoing plans to help academic research groups better understand Tor and its research problems. Damon McCoy holds a postdoc fellowship there covering last year and this coming year, and he's brought Kevin Bauer in from UColorado from now until December. They have two systems profs with congestion control backgrounds (Stefan Savage and Geoff Voelker) who are interested in helping them work on Tor and performance.

Kevin is planning to spend the next year on Tor performance work, as the last chapter of his thesis. He's also applied to Ian Goldberg's postdoc position at Waterloo. He seems like a smart and dedicated guy; I'd be excited if Ian picks him.

I spent most of my time walking Damon and Kevin through Tor's current congestion control levels -- explaining what Tor does, as well as what I think is actually resulting from each of these components. Kevin has lots of notes, and if all goes well that will seed the core of a "Why else is Tor slow" whitepaper over the coming months, as a sequel to the original.

Kevin wants to work on figuring out how to better tune all the knobs Tor has, but he also wants to work on a new design for an IPsec-based Tor protocol (we ruled out DTLS as not being in broad enough use; that leaves IPsec). I want both, so I'd be fine with either one. The UCSD systems group has a simulation engine called ModelNet that is apparently really good at simulating actual networks. Kevin is going to try to get a separate Tor network going with ModelNet as its glue, and then he'll be in a better position to do more controlled experiments.

At the same time I think he'd be in better shape thinking about Tor performance if he messes around with a live Tor relay enough to figure out where its bottlenecks are. I gave him a list of 10 or so potential problems, and each of them is probably a big problem in some contexts (e.g. under certain amounts of load) but maybe not in others. The current intuition is that Tor has grown a lot of knobs that may be orthogonal to whether we are using TCP or something else as transport, and analyzing the performance of a naive design that uses something else as transport but doesn't consider the knobs will likely result in a bad comparison. To say it differently, some of the knobs we have now (e.g. circuit priorities) would still be useful even if we change our transport, but others might no longer be needed, or might need to be different. Many fine open research questions.

I also got a chance to explain enough of Tor to Stefan that he has a good understanding of Tor's overall design. He's still digesting what advice he might have for us, but his initial impression is that it's not clear that end-to-end TCP will perform well for our situation, nor is it clear that thousands of parallel TCP sessions between each pair of Tor relays, running over UDP over some datagram link encryption (Joel Reardon's design), will perform well either.

While there, I did a CSE symposium talk that went very well:
http://freehaven.net/~arma/slides-ucsd10.pdf
It was scheduled for an hour, but everybody stayed for 90 minutes, and the crowd was really excited. I invited Chris Davis to the talk, plus the lunch afterward, plus some of the brainstorming after that. Hopefully a better-informed Chris will come in handy in some way in the future. :)

I also talked to Mihir Bellare, a well-known crypto prof, for a few hours. He had a postdoc and a grad student who were excited to learn more about our research problems. Maybe they will help Damon and Kevin redesign a packet-based encryption scheme (where you send an IV and checksum in every packet, to tolerate lost packets and out-of-order packets), like what Freedom designed long ago but never shared very well with the rest of the world.
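To make the per-packet idea concrete, here is a minimal sketch of what "an IV and checksum in every packet" could look like. It is not Freedom's design or anything Tor does: the function names, the packet layout, the sequence number field, and the toy HMAC-based keystream are all assumptions made for illustration, chosen so the example runs with only the Python standard library. A real design would use a vetted cipher and would also need replay protection.

```python
import hashlib
import hmac
import os
import struct

IV_LEN = 16    # assumed size of the per-packet IV
TAG_LEN = 16   # assumed size of the truncated per-packet checksum (MAC)
SEQ_LEN = 8    # assumed sequence number field, so the receiver can reorder


def _keystream(key: bytes, iv: bytes, length: int) -> bytes:
    """Toy keystream: HMAC-SHA256(key, iv || block counter). Illustration only."""
    out = b""
    counter = 0
    while len(out) < length:
        block = hmac.new(key, iv + struct.pack(">I", counter), hashlib.sha256)
        out += block.digest()
        counter += 1
    return out[:length]


def seal(enc_key: bytes, mac_key: bytes, seq: int, payload: bytes) -> bytes:
    """Build one self-contained packet: seq || IV || ciphertext || checksum."""
    iv = os.urandom(IV_LEN)  # fresh IV per packet, so no state carries over
    stream = _keystream(enc_key, iv, len(payload))
    body = bytes(a ^ b for a, b in zip(payload, stream))
    header = struct.pack(">Q", seq) + iv
    tag = hmac.new(mac_key, header + body, hashlib.sha256).digest()[:TAG_LEN]
    return header + body + tag


def open_packet(enc_key: bytes, mac_key: bytes, packet: bytes):
    """Verify and decrypt one packet without reference to any other packet."""
    if len(packet) < SEQ_LEN + IV_LEN + TAG_LEN:
        raise ValueError("packet too short")
    header = packet[:SEQ_LEN + IV_LEN]
    body = packet[SEQ_LEN + IV_LEN:-TAG_LEN]
    tag = packet[-TAG_LEN:]
    expected = hmac.new(mac_key, header + body, hashlib.sha256).digest()[:TAG_LEN]
    if not hmac.compare_digest(tag, expected):
        raise ValueError("checksum mismatch")
    seq = struct.unpack(">Q", header[:SEQ_LEN])[0]
    iv = header[SEQ_LEN:]
    stream = _keystream(enc_key, iv, len(body))
    return seq, bytes(a ^ b for a, b in zip(body, stream))
```

Because each packet carries its own IV and checksum, the receiver can decrypt packet 2 before packet 0 arrives and is unaffected if packet 1 never shows up. The cost is the per-packet overhead (40 bytes in this sketch), which is one of the trade-offs a real design would have to weigh.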

I also had dinner with KC Claffy, one of the people from CAIDA, a separate org affiliated with UCSD that focuses on Internet data measurement and the privacy/security/policy/analysis issues that go along with that. She was really interested to learn more about our WECSR workshop paper. One of the deliverables they've promised their funders soon is a comparison of various GeoIP databases. I told her about Karsten's preliminary work there, and I should probably follow up.

I met with Harsha Madhyastha, a soon-to-be-first-year-prof at UC Riverside, who wrote "iPlane" as his thesis. iPlane is a database / set of scripts that let you (among other things) predict latency between two points on the Internet. I gave him a big pile of research questions around choosing more efficient paths through the Tor network vs the anonymity implications of even-less-uniform path selection; and the AS-level or country-level path selection questions; and Nick Hopper's latency attack paper and the questions we still have around it. Perhaps one of his grad students will pick up one of the topics.

Overall, it was a good use of a couple of days. I'll plan to follow up in person sometime in the next 6-12 months, either with a trip to UCSD or a trip to Colorado (where Kevin is returning after his brief stint as research staff).

I should get my act together and answer Nick Hopper's invitation to come spend a similar couple of days at UMN, to get his research group more up to speed on what needs doing.

Comments

Please note that the comment area below has been archived.

The Tor network is faster currently than it has been in a long time, yes. I think that's due in large part to load -- we're missing the 100k+ users from China that we had not long ago. In any case, there are still some design flaws in Tor (or rather, some features we haven't understood or put in yet) that we need to better understand for when the load returns.

Your conspiracy theory about timing attacks makes no sense. I would encourage you to write a research paper showing that timing attacks become significantly easier as latency goes down. Until I see one of those, and it convinces me that timing attacks are actually hard at *any* reasonable latency, I think it's safest to continue to assume that end-to-end correlation attacks work great against Tor no matter the latency.

August 29, 2010

I expect to join the mailing list, but since congestion is an up-topic, let me ask this: Is it better to turn Tor off when you will be using direct web connections for a while?

August 30, 2010

In my 08/29/10 comment above, I should have more specifically asked whether it is better (for the Tor network) to stop Tor by way of the Vidalia Control Panel, having already disabled Tor with Torbutton, for a not-very-brief direct connection session.

It shouldn't matter much.

Tor stops fetching most directory information, and stops building circuits preemptively, if you don't use it for an hour. Specifically, it still fetches the network consensus snapshot (to know what the relays are) every 2-4 hours, but it stops fetching the server descriptors. The consensus is 200K give or take, whereas the server descriptors are a couple of megabytes over the course of each day.
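As a rough back-of-the-envelope check on those numbers, the sketch below works out the daily directory traffic for an idle client. The constants come from the figures above, but reading "200K" as 200 kilobytes, taking "a couple of megabytes" as 2 MB, and assuming every consensus fetch downloads the full document are all assumptions of this example, and the function names are made up for illustration.

```python
CONSENSUS_KB = 200                # "200K give or take", assumed to mean kilobytes
FETCH_INTERVAL_HOURS = (2, 4)     # consensus fetched every 2-4 hours
DESCRIPTORS_MB_PER_DAY = 2.0      # "a couple of megabytes", assumed to be 2 MB


def consensus_mb_per_day(interval_hours, consensus_kb=CONSENSUS_KB):
    """Daily consensus traffic if one full consensus is fetched per interval."""
    fetches_per_day = 24 / interval_hours
    return fetches_per_day * consensus_kb / 1000


def idle_range_mb_per_day():
    """(low, high) daily traffic for an idle client that fetches only the consensus."""
    shortest, longest = FETCH_INTERVAL_HOURS
    return consensus_mb_per_day(longest), consensus_mb_per_day(shortest)


def active_range_mb_per_day():
    """(low, high) daily traffic when server descriptors are fetched as well."""
    low, high = idle_range_mb_per_day()
    return low + DESCRIPTORS_MB_PER_DAY, high + DESCRIPTORS_MB_PER_DAY


if __name__ == "__main__":
    print("idle:   %.1f to %.1f MB/day" % idle_range_mb_per_day())
    print("active: %.1f to %.1f MB/day" % active_range_mb_per_day())
```

Under these assumptions the idle consensus traffic comes to roughly 1.2 to 2.4 MB a day, and adding the descriptors brings it to roughly 3.2 to 4.4 MB a day. Real numbers will differ with compression and with how large the consensus and descriptors actually are.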

We've talked every so often of having Tor stop fetching *any* directory information if you don't use it for a day or something. This strategy would be especially important if Tor ends up in a default package set for some version of Ubuntu. We haven't done it yet though.

August 31, 2010

In reply to arma

I have a great idea: why not check the available bandwidth, and if the connection has more than 3 Mbps available, is on a LAN, and has an external IP in a jurisdiction that is not problematic, then have it automatically become a non-exit relay? Maybe cap the default throughput at 20kps but slowly increase it based on the available bandwidth. That'll keep most people from noticing and greatly reduce Tor user bandwidth issues. The purpose here is not to be malicious or deceptive; it is rather to get the participation of non-technical users. Then make it into a feature of the next version of Ubuntu or Firefox :). It could even be a simple box, shown the first time you start the browser, that says "Firefox now features Tor built-in: defending against threats to personal freedoms and privacy. Select yes to donate 10% of your bandwidth to the project. When you need to speak anonymously use the green button in the lower right and your communications can't be traced back to you." The remaining issue would be exit nodes. With all that traffic and the default settings, it would also become much harder to determine whether a user is sophisticated or not, or is using Tor at all. The Tor button would be off by default in the browser and only the relay would be on, to prevent users from unknowingly leaking passwords on http:// sites through Tor. If a user pressed the button, it would warn them that the feature makes the connection anonymous but also makes it possible for others to eavesdrop on the communications, and that any passwords entered into non-encrypted (http://) sites would be visible to malicious parties.

September 04, 2010

If you have an IPsec-based protocol, does that mean Tor clients will be able to have an IPsec tunnel from the exit node to a server that supports IPsec, like you can have an HTTPS tunnel now?

I don't think it would work now, because IPsec operates at a lower level than Tor does. Is that correct?