The MD5 certificate collision attack, and what it means for Tor

by nickm | December 30, 2008

Today, a team of security researchers and cryptographers gave a talk at the 25th Chaos Communication Congress (25C3) about a nifty attack against X.509 certificates generated using the MD5 digest algorithm. We figured that people would ask us how this attack affects Tor, so I'm writing an answer in advance.

The short version: This attack doesn't affect Tor.

The medium version: This attack doesn't affect Tor, since Tor doesn't ever use MD5 certificates, and since Tor doesn't care what certificate authorities say. On the other hand, this attack probably does affect your browser. Check your browser vendor for updates over the next few days and weeks, and make sure you install them.

The long version: To understand the attack, first you've got to understand certificates. When your browser makes a connection to a "secure" website, it uses a protocol called SSL (or sometimes TLS) to see who it's talking to and encrypt the connection with them. In SSL, parties are identified using X.509 certificates, which are issued to them by certificate authorities, or "CA"s. Your browser comes with a big list of certificate authorities. When your browser sees a certificate that was signed by a certificate authority it recognizes, it knows it's talking to the right website.

Certificates, like nearly anything of interest, are too big to sign as-is, so the CA uses a cryptographic "digest" algorithm to derive a short "hash" of the certificate that it can sign. The digest algorithm is supposed to be "collision resistant", so that nobody can find two different certificates that produce the same hash. Such a collision would be bad, since somebody who could produce two such certificates could get a CA to sign one of them, and then use that signature on the other one. Since the hash values would be the same, nobody would be able to tell that the CA had not really signed the second certificate.
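To make the signature-transfer problem concrete, here is a minimal Python sketch. Everything in it is a stand-in chosen for illustration: `weak_digest` truncates MD5 to 16 bits so that a collision can be found by brute force in a moment, `ca_sign` uses an HMAC in place of a real CA's public-key signature, and the "certificates" are just made-up byte strings. It is not the researchers' method (which finds collisions in full MD5); it only shows why a signature computed over a hash covers every input with that hash.

```python
import hashlib
import hmac


def weak_digest(data: bytes) -> bytes:
    """Deliberately weak stand-in digest: the first 2 bytes (16 bits) of MD5."""
    return hashlib.md5(data).digest()[:2]


def ca_sign(ca_key: bytes, cert: bytes) -> bytes:
    """Toy 'CA signature': computed over the digest only, never the full cert.

    An HMAC stands in for a real public-key signature (an assumption made
    to keep the sketch within the standard library).
    """
    return hmac.new(ca_key, weak_digest(cert), hashlib.sha256).digest()


def ca_verify(ca_key: bytes, cert: bytes, signature: bytes) -> bool:
    """Check a toy signature by recomputing it from the cert's digest."""
    return hmac.compare_digest(ca_sign(ca_key, cert), signature)


def find_collision(prefix_a: bytes, prefix_b: bytes):
    """Pad two different prefixes until their weak digests collide."""
    seen = {}
    # Build a table of digests for many paddings of the first prefix.
    for i in range(1 << 12):
        cert = prefix_a + str(i).encode()
        seen.setdefault(weak_digest(cert), cert)
    # Try paddings of the second prefix until one lands in the table.
    for j in range(1 << 20):
        cert = prefix_b + str(j).encode()
        match = seen.get(weak_digest(cert))
        if match is not None:
            return match, cert
    raise RuntimeError("no collision found (very unlikely with 16 bits)")


# Hypothetical certificate contents, for illustration only.
innocuous, rogue = find_collision(b"CN=harmless.example;CA=false;pad=",
                                  b"CN=rogue.example;CA=true;pad=")
ca_key = b"toy CA secret"
signature = ca_sign(ca_key, innocuous)   # the CA only ever sees this one
print(ca_verify(ca_key, rogue, signature))  # the rogue cert verifies too
```

The CA in the sketch never sees the second certificate, yet its signature checks out for it, because the only thing that was ever signed is the shared hash value.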

With me so far? Good. Let's talk about MD5. There is an old broken hash algorithm called MD5. How old and broken? Cryptographers have considered it weak since 1996 or so, and there have been known real collisions in it since 2004. In 2007, researchers published a method for generating MD5 collisions that could be used against X.509 certificates. In other words, the writing has been on the wall since at least 1996 (arguably 1993), and the writing has been getting bigger year after year. You'd have to be pretty oblivious to still use MD5 for signing certificates in 2008!

Unfortunately, some brave CAs still use MD5 for signing certificates in 2008. And so we come to the attack.

Using a method derived from the 2007 paper, and a cluster of 200 Playstation 3s, the researchers generated two certificates that would produce an MD5 collision: one innocuous one, and one CA certificate. They got a CA to sign the first one, and then transferred this signature to the second. Since the second certificate was for a CA, they now had a certificate that let them generate their own certificates, and make any phony claims they wanted about the identity of any website. If your browser saw one of these phony certificates, it would believe it, since it was ultimately signed by a CA it recognized.

For more information on the attack, with a lot of complicated tricky bits I didn't mention, see the authors' writeup.

The good news is that Tor itself is not affected. Tor doesn't use MD5 for anything[*]. Tor doesn't use commercial CAs. Tor doesn't sign certificates for others, and everything in Tor that is signed is signed using SHA-1, not MD5.[**]

The bad news is that your browser probably does have some of the affected CAs listed. As how-to guides for securing your browser become available, we'll post links to them here.

Finally: Happy new year! And best wishes to any programmers who get stuck working all night on New Year's Eve to remove MD5 from their system.

Footnotes:

[*] The fine print: Tor uses the TLS protocol, which uses MD5 in a couple of places. But TLS uses it in tandem with SHA-1, so an attacker would need to break SHA-1 and MD5 at the same time to harm TLS's security. Read RFC 2246 for the ugly ugly details.
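One of the places where TLS 1.0 pairs the two hashes is its pseudorandom function (PRF), defined in section 5 of RFC 2246. The Python sketch below follows that definition as I read it: the secret is split into two halves, one half drives an HMAC-MD5 expansion and the other an HMAC-SHA-1 expansion, and the two outputs are XORed together. The function names are my own, and this is meant as an illustration of the "in tandem" idea, not as code to use in a real TLS stack.

```python
import hashlib
import hmac


def p_hash(hash_name: str, secret: bytes, seed: bytes, length: int) -> bytes:
    """P_hash data expansion from RFC 2246, section 5."""
    out = b""
    a = seed  # A(0) = seed
    while len(out) < length:
        a = hmac.new(secret, a, hash_name).digest()            # A(i)
        out += hmac.new(secret, a + seed, hash_name).digest()  # HMAC(A(i) + seed)
    return out[:length]


def tls10_prf(secret: bytes, label: bytes, seed: bytes, length: int) -> bytes:
    """TLS 1.0 PRF: P_MD5(S1, label + seed) XOR P_SHA-1(S2, label + seed)."""
    half = (len(secret) + 1) // 2       # halves share a byte if length is odd
    s1 = secret[:half]                  # first half feeds the MD5 expansion
    s2 = secret[len(secret) - half:]    # second half feeds the SHA-1 expansion
    md5_part = p_hash("md5", s1, label + seed, length)
    sha1_part = p_hash("sha1", s2, label + seed, length)
    return bytes(x ^ y for x, y in zip(md5_part, sha1_part))


# Example call with made-up inputs.
key_material = tls10_prf(b"example secret", b"master secret", b"example seed", 48)
print(key_material.hex())
```

Because the final output mixes both expansions, a weakness in MD5 alone does not directly give an attacker control over the result.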

[**] Yes, I know that SHA-1 is showing its age too. Unfortunately, the SHA-2 algorithms aren't that much better, and nothing else has seen the same amount of analysis. Once the NIST hash function competition has picked a SHA-3 candidate, we'll switch to that. In the meantime, I'll be launching some design work on or-dev to make it easier for Tor to switch to a better hash algorithm, once we've got one, or in case we need to jump off SHA-1 in a hurry.


Comments

Please note that the comment area below has been archived.

SHA-3 will take a while

SHA-3 is scheduled for 2012. Until then there will be even better attacks on SHA-1/SHA-2. Oh, and PS3s will be cheaper ;-)

So what about trying a switch to another algorithm?
Tiger and Whirlpool look good.
Or Skein, if you want to use a possible SHA-3 Function.

Devils we know, and devils we don't

I've heard good things about Whirlpool, but I'm no cryptographer. If I understand correctly, the SHA-1 family (and the SHA-2) have gotten more analysis than any other not-totally-broken-in-practice digest functions. By the time the SHA-3 competition is done, the SHA-3 candidates will also be heavily analyzed by the best cryptographers in the field. I don't think that Tiger or Whirlpool has seen quite enough analysis to make me comfortable.

Still, you're right that it would be really bad if Tor is still using SHA-1 when a practical chosen-prefix attack against it is found. I'm hoping we can get the tools ready to migrate to SHA-256 in the meantime, since (a) the SHA-2 functions seem likely to last a while longer than SHA-1, and (b) doing one migration will make the Tor software more hash-agnostic, so that we can move to SHA-3 quickly once it's chosen. Alternatively, if SHA-256 is broken before SHA-3 is out (unlikely, it seems), we could then think about switching to whatever SHA-3 candidate(s) seem best.

Unfortunately, this isn't trivial. We need to maintain backward compatibility, since people would get mad if we made every Tor user and server upgrade their software all at once.

If you want to help, we could use some design proposals here. I've checked in a document to Tor svn at /tor/trunk/doc/spec/proposals/ideas/xxx-what-uses-sha1.txt . It lists everywhere that Tor uses SHA-1 today. If anybody wants to help think about how to design the migration safely, that will help lots whenever we wind up switching to SHA-256, Skein, MD6, or whatever.
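To give a feel for what "hash-agnostic" could mean in practice, here is a small Python sketch. It is not a Tor design and not taken from any proposal: the names `negotiate`, `tagged_digest`, and `verify_tagged`, the `name:hexdigest` format, and the preference list are all hypothetical. The idea is only that digests carry the name of the algorithm that produced them, and that two parties pick the strongest algorithm they both support, so old and new software can coexist during a migration.

```python
import hashlib
import hmac

# Hypothetical preference order, strongest first.
PREFERENCE = ["sha256", "sha1"]


def negotiate(ours, theirs):
    """Return the first algorithm in our preference order the peer also supports."""
    for name in ours:
        if name in theirs:
            return name
    return None  # no common algorithm


def tagged_digest(name: str, data: bytes) -> str:
    """Label a digest with its algorithm, e.g. 'sha256:ab12...'."""
    return f"{name}:{hashlib.new(name, data).hexdigest()}"


def verify_tagged(tagged: str, data: bytes, accepted) -> bool:
    """Check a tagged digest, refusing algorithms we no longer accept."""
    name, _, hexdigest = tagged.partition(":")
    if name not in accepted:
        return False
    expected = hashlib.new(name, data).hexdigest()
    return hmac.compare_digest(expected, hexdigest)


# An upgraded peer talking to an old SHA-1-only peer falls back to SHA-1;
# two upgraded peers use SHA-256.
print(negotiate(PREFERENCE, ["sha1"]))
print(negotiate(PREFERENCE, ["sha1", "sha256"]))
```

A real design would also have to stop an attacker from tampering with the negotiation to force the weaker choice, which is part of why the migration isn't trivial.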

check your certificate

By the way, your certificate on https://blog.torproject.org uses md5, guess it's time to get a new one...

yes

it's coming as soon as people return from their travels.

when they will be coming

When will they be coming back? I'm waiting for Tor Browser Bundle updates.

Interesting. I knew site

Interesting. I knew sites like http://www.netmd5crack.com could use huge dictionaries to find hashes, but this scares me! Is it just certs, or is MD5 in general at risk?

one typical method of

one typical method of installing/utilizing TWO versions of any software is to have the "STABLE" version (last great one) - and "LATEST VERSION" - typically indicated with a version 1.00, 1.01 (patch) 1.02 (patch) - and all 1 series work together - no major change - then when a new version - major change - i.e. change of certificates etc. - then a 2.00, 2.01, 2.02 etc.

To integrate this with the TOR network, the program would need a dual set of libraries under the main program, such that a "LEADING PACKET" would contain an extra "BIT" or "INDICATOR" that says "S" (stable) or "N" (new) - and then the main program would know which library function to call for a "certificate" or other major changes in operations.

By doing this, the STABLE/LAST STRONG VERSION becomes the "default" whenever anyone has failed to load the MAIN PROGRAM that has this indicator/switch function installed - and even then, it always "defaults" to the older stable version unless a successful handshake takes place (or some other indicator of success that doesn't slow traffic with bi-directional communication) - at which point LIBRARY-N (new) is used.

It would take "a bit" of work to develop MAIN PROGRAM and LIBRARY-N - but generally not as much as you might think as long as you keep the INDICATOR SUBROUTINE in the MAIN program tight and try not to scatter variables everywhere in MAIN. Once you have MAIN talking to STABLE, then test MAIN with NEW and then put 5 - 10 - 50 PCs together and let them chatter away and see if it crashes - then (wise idea) add a module that REVERTS TO OLD VERSION (i.e. backs up from 2.0 to 1.99) in the event of a "panic wipe out" that can take place if everyone switches and a new fatal bug rears its ugly head.

Otherwise, if bugs in the new MAIN / STABLE / NEW libraries are minimal, they become "patches" (2.01, 2.02, 2.03) etc. until you have developed a MAIN that works with both versions - a STABLE/OLD library and a NEW library. Then leave the stable one alone - stop changing it - and work on polishing NEW - and if it ever becomes a problem, back down one level (i.e. from 2.04 back to 2.03) and continue polishing 2.04 or 2.05 until MAIN likes the way it tastes.

Once you've done that a few times, MAIN becomes tighter and very robust, you get a feel for testing the NEW library, and STABLE is really tight - even if it has some "older certificate" feature or other "problem" that could be improved.

Then the big challenge takes place (hold yer breath) when 3.0 is released. At that time, 2.XX (last good version) becomes STABLE (old) library, and 3.01 becomes NEW. This gives everyone months (or a year is really sweet) to gradually upgrade Firefox buttons etc.

It is critical to retain backward compatibility through each version - i.e. (ahem - don't puke) - does it work on Windows 95 or some other legacy system that a prior STABLE version worked well on. I run Windows 2000 SP4 specifically because I'm an old main frame guy who needs reliability and nothing MS has put out since then is as robust (after adding things like a firewall etc.). The day I have to switch off this platform I'll "back down" to the rudimentary Linux/Unix world rather than "move ahead" with the MS spyware infected OS that (ahem) is generally known to be "spying" on behalf of MS: watch your firewall traffic some time and you'll see MS is the biggest spyware system on the net (don't faint either) creating all kinds of outbound traffic you never authorized (which my firewall slaps down in a blink).

HOPEFULLY when TOR gets to around Version 5.XX it'll have a firewall built in as well, though that utility might best be kept separate, and eventually (God willing), a TOR browser. When THAT happens, life will be grand indeed and the only thing missing would be a TOR email server, then a TOR CO-LO Hosting Service (rack mounts and a premium service for all that well protected traffic) and then when we CLONE the CO-LO in various nations, with each site having its own TOR RELAY, this "TOR IS SLOW" problem will be a thing of the past.

THAT is the vision of "TOR" that I see (or pray for) - and if/when I've got $100K - $500K, I'd be delighted to erect the first TOR - BASED CO-LO and Virtual Server service here in the Bay area where I live. I think the world deserves it and I am rather sure it'd be a grand slam money maker compared to other CO-LO sites.

...my two cents (er three)

and p.s. - I have a credit card web site I developed that relies on users to "come with TOR SHIELD DOWN" (we record their IP) and then TURN TOR ON (we record their IP) before they log in. In this manner we are assured that they have TOR working BEFORE they do any business - and by tracking their IP vs their physical known location - we can watch as their "TOR location" changes and adjust accordingly - and know in a second if they "DROP" their TOR shield - at which point we drop their session.

I believe more sites should utilize this type of "verification" and in the future - when the TOR NETWORK is a bit larger and more robust, I am guessing they will.