Investigating HTTP proxy performance with Tor

A while ago there was a thread on OR-TALK that devolved into

"why does Tor still ship ancient privoxy?"

and

"why are you shipping polipo with the Tor Browser Bundle instead of current privoxy?"

For those interested, the thread is here, http://archives.seul.org/or/talk/Jul-2009/msg00063.html.

Scott had a good argument for why we should update the bundles to the latest privoxy, and I agree, we should. But then I started thinking about why we need a proxy at all. Almost all browsers can talk socks5 directly. Isn't that faster than a middleman proxy?

This got me thinking about why polipo is in the TBB, but not the other packages. The TBB "feels faster" when using Tor than using the installed Tor, Vidalia, and Privoxy. However, I couldn't find any actual testing of performance of polipo vs. privoxy vs. socks5 direct.

So I did it myself, in a loose manner. I wanted to quantify "feels faster".

The raw data from all the testing is:

  • Tamper Data as xml,
  • proxy config files,
  • and results in a spreadsheet.

This is all contained in http://freehaven.net/~phobos/polipo-v-privoxy.tar.gz (.asc). There is a README as well. And yes, the ruby script is a quick and dirty hack.

I tested a few scenarios:

1) native polipo and privoxy without using Tor.
2) polipo and privoxy forwarding to Tor localhost:9050.
3) firefox socks5 direct to Tor via localhost:9050.
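
For anyone who wants to try something like scenarios 1 and 2 without a browser, here is a minimal Python sketch of a timing harness. It is not how the numbers in this post were collected (those came from Firefox and Tamper Data), and it only fetches the HTML, not the images a browser would also request. The ports are assumptions based on the usual defaults (8123 for polipo, 8118 for privoxy), and `make_opener`, `fetch_with`, and `time_fetch` are names made up for illustration. Scenario 3 is left out because Python's standard library has no SOCKS client.

```python
import time
import urllib.request

# Assumed listen addresses (the usual defaults); adjust to your own configs.
PROXIES = {
    "polipo": "http://127.0.0.1:8123",
    "privoxy": "http://127.0.0.1:8118",
}

def make_opener(proxy_url=None):
    """Build an opener that uses proxy_url for HTTP, or no proxy if None."""
    # An empty dict tells ProxyHandler to ignore any system-wide proxy settings.
    mapping = {"http": proxy_url} if proxy_url else {}
    return urllib.request.build_opener(urllib.request.ProxyHandler(mapping))

def fetch_with(opener):
    """Return a fetch(url) function that downloads the full body via opener."""
    def fetch(url):
        with opener.open(url, timeout=120) as response:
            return response.read()
    return fetch

def time_fetch(fetch, url, runs=5):
    """Call fetch(url) `runs` times; return the elapsed seconds of each call."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fetch(url)
        samples.append(time.perf_counter() - start)
    return samples

# Example use (needs the proxies to be running, so it is left commented out):
#   fetch = fetch_with(make_opener(PROXIES["polipo"]))
#   print(time_fetch(fetch, "http://www.torproject.org/index.html.en", runs=3))
```

Passing the fetch function in as an argument keeps the timing loop separate from the network code, so the same loop could be reused with a SOCKS-capable client for scenario 3.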

The summary of results:

  • Native polipo is 54.5% faster on average than native privoxy. This could be due to polipo's caching, its HTTP 1.1 pipelining, and its ability to serve bits as fast as they come in from the network. Privoxy needs to load the whole page, scan it, and then send it to the client. Even with filtering disabled, privoxy still works the same way.
  • Polipo caching shines with Tor usage. Polipo is still about 40% faster than privoxy. Common images are cached and served from the memory cache in single-digit milliseconds. Privoxy needs to wait for Tor to wholly deliver the bits. Caching is faster, this we know already. However, from a user perspective, it's just faster to load pages.
  • socks5 in Firefox 3.5.2 did better than I expected. It was about twice as slow as polipo, but still twice as fast as privoxy. I chalk this up to Tor circuit variability more than anything else.
  • I tried testing a click to a second page to see how much polipo caching helps people reading different pages on the same site. It helps, but not as much as I expected. Polipo ranged from slower than privoxy to 40-100% faster. Too much variability to make a real determination.
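
Since "X% faster" can be computed in more than one way, here is a small Python sketch of one common reading: the reduction in mean load time relative to the slower proxy. This is an assumption about the arithmetic, not necessarily the formula used in the spreadsheet, and the sample timings are made up for illustration.

```python
from statistics import mean

def percent_faster(baseline_times, candidate_times):
    """Percent reduction in mean load time of candidate relative to baseline."""
    baseline = mean(baseline_times)
    candidate = mean(candidate_times)
    return (baseline - candidate) / baseline * 100.0

# Made-up load times in seconds, for illustration only.
privoxy_times = [4.0, 6.0]  # mean 5.0
polipo_times = [2.0, 3.0]   # mean 2.5
print(percent_faster(privoxy_times, polipo_times))  # 50.0
```

A negative result under this definition would mean the candidate was slower than the baseline.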

Caveats: Testing under Tor is highly variable. I used the same circuits for both the polipo and privoxy tests to minimize variability. However, I can't control node load and congestion.
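
With this much variability, a single average can hide a lot. One way to report it, sketched in Python below, is to give the median together with the standard deviation of the samples, so that one congested circuit does not dominate the result. The function name `summarize` and the timings are made up for illustration; they are not from the test data.

```python
from statistics import median, stdev

def summarize(samples):
    """Return (median, sample standard deviation) of load times in seconds."""
    # stdev needs at least two samples; report zero spread for a single one.
    spread = stdev(samples) if len(samples) > 1 else 0.0
    return median(samples), spread

# Made-up load times: four ordinary fetches and one over a congested circuit.
tor_times = [3.0, 4.0, 5.0, 4.0, 14.0]
mid, spread = summarize(tor_times)
print("median %.1fs, stdev %.1fs" % (mid, spread))
```

Here the median stays at 4.0 seconds even though the one slow fetch pulls the mean up to 6.0.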

Out of 23 GET requests for Torproject.org/index.html.en, 17 are for the country flags. Perhaps we should load these last at the bottom of the page, or do something else to speed up the torproject page load.

As I was doing this, I kept thinking of other ways to do it better:

  • Time requests and bits between Tor, the http proxy, and the browser. How long does each request take to get from the browser, to the proxy, to Tor, and back across each layer? How much latency does each piece of software add to the request and delivery?
  • Automate testing and let it run on a normal Tor client over weeks. This will average out Tor network variability and show "typical" user experience.
  • Pick a sampling of the top 100 websites by visits worldwide and measure their performance with the three methods, fully instrumented as in #1.
  • Do user experience measurements. Pay/ask/bribe people to sit in front of a computer, video record their browsing and feedback, and ask for a rating of each configuration (socks5, polipo, privoxy, and a placebo).
  • Re-run #2 and run gcov to watch the code paths used in each piece of software, and figure out what can be optimized for performance.
  • Test various "private browsing modes" through Tor to see which browser is faster: firefox, safari, chromium, torfox, or torora.
  • How can we better tune polipo caching dynamically based on system RAM? Does having 1GB of cache provide significant benefits over the default?

I'm sure there are lots of things wrong with my measurements, minimal analysis, and results.

Constructive criticism is welcome.

Anonymous

August 20, 2009

"Out of 23 get requests for the Torproject.org/index.html.en, 17 are for the country flags. Perhaps we should load these last at the bottom of the page, or do something else to speed up the torproject page load."

That would be very easy to optimise. Use a single image with all the flags in it, then use an image map to set the links on it.
Alternatively CSS sprites could be used (single file and links/flags displayed using an offset).
The image map is a nicer solution in my opinion.
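
To make the "offset" idea concrete, here is a small Python sketch that generates the CSS rules for a sprite sheet. Everything specific in it is an assumption for illustration: the 16x11 pixel flag size, the `flags.png` file name, the class names, and the layout of the flags as one horizontal strip.

```python
def sprite_css(codes, width=16, height=11, sheet="flags.png"):
    """Emit one CSS rule per flag, assuming a single horizontal strip image."""
    rules = [
        ".flag { display: inline-block; width: %dpx; height: %dpx; "
        "background-image: url(%s); }" % (width, height, sheet)
    ]
    for index, code in enumerate(codes):
        # Shift the strip left so that flag number `index` shows through.
        offset = -index * width
        rules.append(".flag-%s { background-position: %dpx 0; }" % (code, offset))
    return "\n".join(rules)

print(sprite_css(["de", "fr", "it"]))
```

Either way, the 17 flag requests collapse into one image request; the sprite version just moves the per-flag geometry into CSS instead of image map coordinates.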

Anonymous

August 20, 2009

Firefox is buggy when using a socks proxy directly.
It has a hard-coded 10 second timeout, and it blocks while waiting, so if you have one tab waiting to connect to socks, you can't use the other tabs.

Polipo sucks, please stop using it. Watch what it's doing with a packet sniffer.
Completely non-standard stuff there that breaks a lot of sites, converting GET requests into HEAD etc., which shows blank pages!
It's annoying to be on a forum and click refresh only to get a blank page over and over, because polipo is sending HEAD requests to the server instead of GET, which of course results in a blank page, as there is no response body to a HEAD request.
WTH is polipo thinking, modifying a request? It's supposed to be pass-through only.

Have you reported this GET to HEAD request transformation to the polipo list as a bug?

If you know of an open source, freely licensed caching proxy that works on windows, linuxes, unixes, and macs, then point it out. I'll look into it.

And yes, Firefox is buggy with SOCKS. In fact, there's a 3-year-old bug about the hard-coded timer, waiting for someone at Mozilla to address it. At one point, we even offered to pay someone at Mozilla to fix it. The bug is https://bugzilla.mozilla.org/show_bug.cgi?id=280661

Anonymous

August 22, 2009

In reply to by phobos

Nope I didn't report it.
I was under the impression that Polipo was not in active development.
Is that not true?

Is there any point in caching?
I can see the advantage if multiple PCs share the same cache, but browsers already have their own caching anyway.

I prefer to make connections via SSL anyway, and proxies can't cache those.

Anonymous

August 21, 2009

I thought the main reason to use Privoxy was not to leak DNS by using Socks 4a. Is Socks 5 no longer leaking DNS?

Yes socks 5 still leaks DNS.
Socks 4a is the only socks that supports connecting to a host name, socks 4/5 only support IP, so you have to resolve the DNS to IP which leaks.

Most apps don't support socks 4a, but HTTP proxy is very common.

False. Socks 5 works fine.
Easy way to test: Have the vidalia Tor Network page open, and then load a page in Firefox. The connection will appear in the Tor Network page, and you can see it named the same way it's passed to Tor. If the browser passes an IP, you'll see it there, but if it passes the hostname, you'll also see it there. I always see hostnames there (except for when I'm purposely connecting to a specific IP).

BTW, the CAPTCHA here sucks; it's taken me a lot of tries. You should use something like reCAPTCHA, which works nicely for me on my site.

Firefox and socks5 with Torbutton enabled work correctly. Not all socks5 compatible applications will do name resolution via the socks proxy.
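
To show the difference on the wire, here is a minimal Python sketch of the two forms of a SOCKS5 CONNECT request, following the field layout in RFC 1928. The function names are made up for illustration, and the sketch only builds the request bytes; it skips the method negotiation that comes first and does not open any connection. With address type 0x03 the hostname goes to the proxy (Tor) unresolved. With address type 0x01 the application has already looked up the IP itself, which is where the DNS leak happens.

```python
import socket
import struct

def connect_request_by_name(hostname, port):
    """SOCKS5 CONNECT with ATYP 0x03: the proxy (Tor) resolves the name."""
    name = hostname.encode("ascii")  # assumes a plain ASCII hostname
    if not 0 < len(name) < 256:
        raise ValueError("hostname must be 1 to 255 bytes")
    # VER=5, CMD=1 (CONNECT), RSV=0, ATYP=3, length byte, name, port
    return b"\x05\x01\x00\x03" + bytes([len(name)]) + name + struct.pack(">H", port)

def connect_request_by_ipv4(address, port):
    """SOCKS5 CONNECT with ATYP 0x01: the client already resolved the name."""
    # VER=5, CMD=1 (CONNECT), RSV=0, ATYP=1, 4 address bytes, port
    return b"\x05\x01\x00\x01" + socket.inet_aton(address) + struct.pack(">H", port)

print(connect_request_by_name("example.com", 80).hex())
print(connect_request_by_ipv4("192.0.2.1", 80).hex())
```

Which of the two an application sends is up to the application, which is why some socks5 clients leak DNS and others do not.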

Using recaptcha means trusting a 3rd party to serve up the captcha, and not log requests for the captcha.

If Polipo works better than the other two, why change that? Just work with it, optimize it if you can, leave it at that. Unless there is some really good reason to switch, it's probably best to leave it alone.

I use the Tor Browser Bundle, and it's always been pretty good for me. I've never had serious issues or lag with it.

What will be the effect of removing a filtering proxy on the privacy of the user with respect to tracking cookies and locally stored data by flash objects?

If the user is using torbutton defaults, then the cookies and such are wiped on toggle or restart of the browser. Torbutton disables plugins, like flash, so one shouldn't have flash cookies from browsing with Tor.

I have tested Firefox 3.5.2 with socks5, and it works perfectly if we use only the browser. But if we try to use Java or some other applications, DNS can sometimes leak.

With privoxy, so far only Flash leaks DNS and bypasses the proxy. With the latest Java, configured not to save data on the system and to use privoxy, it doesn't seem to leak DNS anymore. Flash remains very dangerous and bypasses the proxy anyway.

Well, my conclusions are:

For me, it is still possible to use Java if the new NoScript is installed along with CS Lite (cookie prevention), so if someone needs Java for a specific site, it can be used safely.

Polipo is good but not stable. It has annoying issues when uploading big files, and its development is inactive. But it is indeed a promising candidate. Some people used to chain squid and privoxy to speed up browsing. If there were a mini squid that was easy to set up, I would dump polipo.

But yeah, polipo has another advantage. If polipo is integrated into the Tor bundle (I never use this pack, so I don't know), you could create a wizard to help set up a hidden service using polipo as a mini HTTP server.

Off topic:
Will there be a torIM plugin for Pidgin? I guess chatting through a hidden service would be safer than chatting through 3rd party servers.

Privoxy was used to shield against possible attacks which could compromise anonymity.

It seems that users aren't protected any more.

There is no hint that polipo can do the same. There has even been some discussion that the current Firefox 3.5 isn't as "solid" against such threats as FF 3.0 was (which will be discontinued). I think that investigating "performance" is not enough. Tor was not meant to enhance "performance" but anonymity.

Privoxy, or any other shielding solution (even one built into Tor), should be part of the software package.

There are always other reverse proxies, like
http://www.delegate.org/delegate/ (windows/linux)

Since Polipo was substituted for Privoxy, tor has become much less useful.
