Counter Galois Onion: Improved encryption for Tor circuit traffic
It's always a good day when we can talk about cryptography. Especially when we are sunsetting one of the oldest and most important encryption algorithms in Tor and replacing it with a research-backed new design, called Counter Galois Onion.
This overhaul will defend users against a broader class of online attackers (described below), and form the basis for more encryption work in the future.
Which cryptography are we talking about here?
The algorithm in question is Tor's relay encryption. While Tor uses the standard TLS protocol for communication between relays, and between clients and relays, it needs a specialized algorithm for encrypting user data as it traverses multiple relays in a circuit.1
That's the relay encryption algorithm. The client shares a symmetric key with each relay on its circuit, and encrypts an outgoing message, or "relay cell" with each one of those keys. Each relay can remove a single layer of encryption, until the client's cell reaches the exit relay.
Of course, we need to make sure that the data isn't modified on the way from the client. For that, we include a cryptographic digest in the cell. The digest covers not only the cell itself, but also all previous cells sent through the circuit (to prevent re-ordering) and another secret shared key (to make the digest unpredictable).
So with the digest added, clients behave as follows: they calculate and set the digest, then they use a stream cipher (AES-128-CTR in Tor's case) multiple times to encrypt the cell for each relay.
If we simplify a lot, then a relay cell looks a little like this:
| Field | Width |
|---|---|
| Zero | 2 bytes |
| Digest | 4 bytes |
| Other stuff | ... |
The "zero" field is there so that nodes other than the exit can avoid checking the digest: If a relay gets a cell with a value other than zero, then it can tell that it isn't the recipient of that cell, so it doesn't need to check the digest: it just decrypts the cell and forwards it to the next relay.
When we designed this algorithm, we didn't give it a name. Now that we're replacing it, it helps to have some way to identify it, so we're calling it "tor1".
The input is a message M. A counter CTR is expanded via a psueodrandom function (PRF, instantiated with AES-128-CTR), to produce a stream of bytes. These bytes are xored with M to produce a ciphertext C.
The message M is mixed with the hash state HS, and a portion of the digest is appended to the message. The whole thing is then xored with the PRF (AES-128-CTR) to produce our ciphertext C.
Wait, that design looks funny!
Yeah, you wouldn't build it that way nowadays, would you? That's why we're replacing it.
In today's world of high-quality online courses and excellent introductory material, it's easy to forget the general state of accessible cryptography resources when Tor was getting started. AES was brand new, and authenticated encryption as a separate field of study had just started to emerge.
Existing designs weren't suitable for Tor's needs. The original onion routing paper didn't specify a means for authentication. Designs like mixmaster and mixminion were optimized for larger message sizes, and required a separate full digest for every possible layer of encryption. (For example, Mixmaster supported up to 20 remailers, so had to reserve space for 20 digests (and other stuff) in every message.)
Some of the "weird things" in the current tor1 design are outdated, but not awful; others are things that are more valuable to resolve.
So, what are the problems with the tor1 design?
There are a few. First the big one:
Problem 1: Tagging attacks
Tagging attacks enable an active adversary to trace traffic by modifying it in one place on the network, and observing predicatable changes in another. Even when tagging attacks don't succeed immediately, their side effects can give the attacker more and more opportunities to retry.
This is the most important attack we're solving with CGO. Even without the other problems below, this one would be worth fixing on its own.
The main version of this attack arises because tor1's use of AES-CTR encryption with no hop-by-hop authentication means that the relay encryption is malleable. Since counter mode derives its ciphertext C by XORing a secret key stream S with the plaintext P (C = S ⊕ P), an attacker who can XOR their own pattern M in to the ciphertext will produce C' = (S ⊕ P) ⊕ M = S ⊕ (P ⊕ M) — that is, a valid encryption of (P ⊕ M).
An attacker can use this attack to ensure that they control both ends of the circuit. They XOR a pattern onto a cell at one end, and then see if any garbled cells at the other end become clear when whey remove that same pattern. Any circuits with an honest endpoint will fail (and not be deanonymized), but the client will retry them until they eventually choose a malicious endpoint.
If the attacker chooses a known-plaintext portion of the relay cell for their marker (such as the header or slack zero space), then they can use their marker to communicate an identifier across the circuit, by retrieving it at the end:
M = (P ⊕ M) ⊕ P.
M can then be used to transmit an IP address or unique identifier for the user.
In comparison to probabilistic traffic correlation, this attack provides definite results immediately, with a strength multiplier: it also allows the attacker to ensure that all the traffic they successfully carry is fully deanonymized, before the circuit is used for any application traffic at all.
The downside for the attacker is that the resulting failure rate of circuits can be detected by the client. Currently, Tor clients emit log notices and warnings when circuit failure rates are excessively high. Unfortunately, as vigilant users have noticed, when the DDoS attacks on Tor become severe, these detectors give false alarms.
This class of attacks (where an adversary is able to abuse the Tor Protocol to transmit information between relays before application activity) is known as Internal Covert Channel attacks. Tor is in the process of updating its threat model to cover these attack vectors explicitly, along with two other categories of attack vectors.
Problem 2: Forward secrecy begins when a circuit closes
This attack and the one after it are much less severe than the tagging attack above; we mention them for the sake of completeness.
In many modern online protocols, including messaging apps like Signal, the keys used to decrypt a message are destroyed as soon as the message is decrypted, so that nobody can steal them and use them later on. But Tor's old encryption algorithm (tor1) doesn't provide this property: the same AES keys are used for the entire life of the circuit. That means that if a key was stolen while the circuit was still alive, all previous traffic on the circuit could be decrypted.
When a circuit's lifetime is on the order of minutes, that's not so bad, but sometimes circuits stay around for days. (What's more, longer-lived circuits may be better for anonymity, especially when the user is maintaining a persistent identity, so it's a good idea to make them stronger.)
Although this attack is minor in comparison to the tagging issue, we may as well address it while we are updating our encryption.
Problem 3: A 4-byte authenticator? Seriously?
Yeah, that's not great. The use of a mere 4-byte digest means that there's a one-in-4-billion chance to forge a cell undetected.
That isn't a very good attack in practice: if the attacker doesn't get lucky with their guess, then their invalid message causes the circuit to fail, and they can't try again unless the client builds another circuit through them. The same pathbias mechanisms that help resist tagging attacks also help here, but it would be better not to need them.
(Also, it's using SHA-1, which is showing its age, to say the least.2)
So, how did we think about replacing this in the past?
We've wanted to replace this algorithm a few times, but we've been hung up on issues of design and efficiency.
We definitely don't want to follow remailer designs by adding one authenticator per layer of encryption: that way lies big overhead. If we tried something like that with onion services, we'd be devoting something like 15% of our bandwidth to authenticator fields.
One promising design element has been wide-block ciphers: these are ciphers (or modes of using ciphers) that encrypt an entire message as if it were a single opaque block: any change in the ciphertext garbles the entire message as if it were a single block in a regular block cipher.
(Technically, this needs to be a "strong pseudorandom permutation" (SPRP) if it's going to resist tagging attacks.)
You can make a wide-block cipher into an authenticated cipher by reserving some portion of the plaintext for a known value -- say, 16 bytes of zeros.
But most wide-block cipher designs are comparatively expensive. Nearly all of the strong ones (BEARESS, LIONESS, biIGE, HHFHFH) require two full encryptions and two hashes over the data. Newer modes like HCTR2 require only one encryption pass, but still need two hashes. (For comparison: tor1 requires 3 encryptions and one hash for a cell on a 3-hop circuit, whereas one of these designs requires on the order of 6 encryptions and 6 hashes.)
We're willing to pay some CPU cost for improved cryptography (and we should expect to pay some cost, since authentication doesn't come for free) but we need to keep the cost to a minimum.
Now, multiple passes are necessary for any wide-block design: it's provable that there's no way to make sure that changing any bit will potentially garble every other bit unless there are at least two passes. But we'd like to make these passes as cheap as possible!
There has also been excellent work on other wide-block designs built from scratch, rather than from an underlying cipher (notably AEZ).
What are we going with?
For years now, cryptographers have been looking for good solutions here.
Jean Paul Degabriele, Alessandro Melloni, Jean-Pierre Münch, and Martijn Stam have a design that they're calling Counter Galois Onion (CGO). It's based on a kind of construction called a Rugged Pseudorandom Permutation (RPRP): essentially, it's a design for a wide-block cipher that resists malleability in one direction (for the encrypt operation, but not the decrypt operation). If we deploy this so that clients always decrypt and relays always encrypt, then we have a tagging resistant3 cipher at less cost than a full SPRP!
Using a RPRP that they call UIV+ (see the paper), the authors achieve all of our goals (tagging resistance, immediate forward secrecy, longer authentication tags, limited bandwidth overhead, relatively efficient operation, and modernized cryptography).
(Shortly before this blog post went up, they revised their paper and released a new security proof.)
We've written a specification which matches their paper and their reference implementation.
How does it work?
CGO makes it so that if anybody tampers with any part of your encrypted data, the entire message, and all future messages, become unrecoverable. Here's how!
(If you don't like reading cipher diagrams, you may want to skip this section. And if you really like reading them, you should check out the specification and the paper!)
The figures below present the UIV+ building block, and show how it is used to build CGO encryption.
The input X is split into two parts: A short X_L, and a longer X_R. X_R, and a "tweak" value H, are themselves passed as tweaks to a tweakable block cipher E_T (instantiated with LRW2), which is then used used to encrypt X_L. The output of this encryption seeds a PRF, which is xored into X_R to encrypt it.
CGO treats every message as a 16-byte tag T, and a 493-byte ciphertext C. These are passed as X_L and X_R to the UIV+ encryption algorithm above. The tweak value (H in UIV+) is here called T': each cell's "T" value, after encryption, is taken as the T' for the next cell.
When _originating_ the message, CGO initializes its tag as a nonce value N. The value of N, _and the encryption keys_, are all transformed using an "Update" algorithm as the message is encrypted. The new N, and the new encryption keys, will be used to encrypt the next cell.
Okay, but how does all of this cryptography solve the problems we began with?
First (and most importantly) tagging attacks are prevented by two factors:
- When encrypting4, the message is transformed in a wide-block construction, so that any change to the input renders the entire output unrecoverable.
- The chaining of T' and N values means that a message's encryption depends on all previous messages, so if a single message is garbled, all subsequent messages will be unrecoverable.
Second, forward secrecy is achieved with the Update construction in figure 5. Every time a new cell is originated or received, the keys used to originate or receive it are transformed unrecoverably, so that the encryptor/decryptor no longer holds the keys necessary to decrypt earlier cells.
Third, the truncated digest is now replaced with a nice long 16-byte authenticator, like sensible people use.
Aside: What if we're wrong?
CGO is a fairly new design, and it's reasonable to ask whether there could be weaknesses in it that would make it worse than it's designed to be. I'd answer: There might be! Attacks only get better with time, and although the cryptographers behind CGO are skilled and well regarded, even the best cryptographers can make mistakes. There is a security proof, but it's fairly recent, and and it hasn't yet gotten intensive scrutiny. With time, as CGO gets attention from more cryptographers, we'll (hopefully) gain more confidence in its strength. (And if we do decide that we need to replace it, the work we've done to add it to Arti and the C Tor implementation will make it much easier to do a later migration to a different system later on.)
But what we are pretty sure about is that there aren't likely to be any weaknesses in CGO that would make it worse than tor1.
Where does our implementation stand?
It's underway!
We've implemented the cryptography for Arti, the Rust Tor implementation. We've also implemented it in C, since it won't do us any good unless relays support it too, and the Arti relay project is still a work in progress.
In order to build this implementation, we've had to refactor a lot of code to revise its existing assumptions: for example, we've had to revise all the places where we assumed anything about the layout of a relay cell, or where we assumed that there was only one way to do relay encryption. These changes will help us with any other changes to relay cell formatting and encryption in the future.
Our next steps are:
Enable CGO by default in Arti. (It's currently marked as experimental because of some of its dependencies.)
Implement CGO negotiation for onion services. (This feature is likely to be be Arti-only, due to its complexity.)
Tune the performance for modern CPUs. (The CGO authors got impressively good results for their optimized implementation, but some of the tricks they used will be pretty hard to deliver in the C Tor implementation without big refactoring. Fortunately, there are some low-hanging fruit in optimizing what we have today.)
Thanks for your help!
Thanks to all the cryptographers, programmers, researchers, and cypherpunks who have contributed to this work over the years. Thanks also to all the researchers who have advanced this work, and the state of modern cryptography; we all stand on the work they have built.
And thanks to Mike Perry for helping me write this and get the parts of the threat model right.
And finally, thanks to everybody who has donated to Tor over the years!
A lot of this work is done through a grant from the
Bureau of Democracy, Human Rights, and Labor,
but many critical parts of this work
(including years of past groundwork,
and all the parts related to onion services)
have been paid for out of our unrestricted funds,
which rely on donors like you.
Thanks for believing in our mission,
and thanks for helping to make Tor better! <3
Why not just use multiple layers of nested TLS? First, because each layer of TLS adds bandwidth overhead, and that overhead adds up fast. Second, because TLS implementations aren't aiming for anonymity, they tend to do things like generate records of different sizes, making it hard to cause traffic to appear uniform. Third, because the TLS protocol exists in many different flavors that client implementations are often distinguishable from one another: we would need to take special care to keep every client on the same TLS implementation, with locked-down flags and options. That would make upgrading and portability highly difficult.↩
Yeah, the tor1 algorithm uses SHA-1. It was 2002, and OpenSSL didn't have SHA-256 support yet. This isn't as bad as it looks, though: we don't actually need collision resistence here, and the attacker doesn't actually get to see the digest under relevant circumstances, so the regular weaknesses of SHA-1 don't apply in full force.
Nonetheless: we will be quite glad to say goodbye to SHA-1 once tor1 is finally replaced.↩
Technically, the client can tag their own traffic, but they wouldn't gain anything by doing so.↩
Remember the RPRP property: encryption resists malleability, but decryption is malleable.↩
Comments
We encourage respectful, on-topic comments. Comments that violate our Code of Conduct will be deleted. Off-topic comments may be deleted at the discretion of the moderators. Please do not comment as a way to receive support or to report bugs on a post unrelated to a release. If you are looking for support, please see our FAQ, user support forum or ways to get in touch with us.