New low cost traffic analysis attacks and mitigations
Recently, Tobias Pulls and Rasmus Dahlberg published a paper entitled Website Fingerprinting with Website Oracles.
"Website fingerprinting" is a category of attack where an adversary observes a user's encrypted data traffic, and uses traffic timing and quantity to guess what website that user is visiting. In this attack, the adversary has a database of web pages, and regularly downloads all of them in order to record their traffic timing and quantity characteristics, for comparison against encrypted traffic, to find potential target matches.
Practical website traffic fingerprinting attacks against the live Tor network have been limited by the sheer quantity and variety of all kinds (and combinations) of traffic that the Tor network carries. The paper reviews some of these practical difficulties in sections 2.4 and 7.3.
However, if specific types of traffic can be isolated, such as through onion service circuit setup fingerprinting, the attack seems more practical. This is why we recently deployed cover traffic to obscure client side onion service circuit setup.
To address the problem of practicality against the entire Internet, this paper uses various kinds of public Internet infrastructure as side channels to narrow the set of websites and website visit times that an adversary has to consider. This allows the attacker to add confidence to their classifier's guesses, and rule out false positives, for low cost. The paper calls these side channels "Website Oracles".
All of these oracles matter to varying degrees for non-Tor Internet users too, particularly in instances of centralized plaintext services. Because both DNS and OCSP are in plaintext, and because it is common practice for DNS to be centralized to public resolvers, and because OCSP queries are already centralized to the browser CAs, DNS and OCSP are good collection points to get website visit activity for large numbers of Internet users, not just Tor users.
Real Time Bidding ad networks are also a vector that Mozilla and EFF should be concerned about for non-Tor users, as they leak even more information about non-Tor users to ad network customers. Advertisers need not even pay anything or serve any ads to get information about all users who visit all sites that use the RTB ad network. On these bidding networks, visitor information is freely handed out to help ad buyers decide which users/visits they want to serve ads to. Nothing prevents advertisers from retaining this information for their own purposes, which also enables them to mount attacks, such as the one Tobias and Rasmus studied.
In terms of mitigating the use of these vectors in attacks against Tor, here's our recommendations for various groups in our community:
- Users: Do multiple things at once with your Tor client
- Exit relay Operators: Run a local resolver; stay up to date with Tor releases
- Mozilla/EFF/AdBlocker makers: Investigate Real Time Bidding ad networks
- Website Operators: Use v3 Onion Services
- Researchers: Study Cover Traffic Defenses
Because Tor uses encrypted TLS connections to carry multiple circuits, an adversary that externally observes Tor client traffic to a Tor Guard node will have a significantly harder time performing classification if that Tor client is doing multiple things at the same time. This was studied in section 6.3 of this paper by Tao Wang and Ian Goldberg. A similar argument can be made for mixing your client traffic with your own Tor Relay or Tor Bridge that you run, but that is very tricky to do correctly for it to actually help.
Exit relay operators should follow our recommendations for DNS. Specificially: avoid public DNS resolvers like 184.108.40.206 and 220.127.116.11 as they can be easily monitored and have unknown/unverifiable log retention policies. This also means don't use public centralized DNS-Over-HTTPS resolvers, either (sadly). Additionally, we will be working on improvements to the DNS cache in Tor via ticket 32678. When those improvements are implemented, DNS caching on your local resolver should be disabled, in favor of Tor's DNS cache.
The ability of customers of Real Time Bidding ad networks to get so much information about website visit activity of regular users without even paying to run ads should be a concern of all Internet users, not just Tor users. Some Real Time Bidding networks perform some data minimization and blinding, but it is not clear which ones do this, and to what degree. Any that perform insufficient data minimization should be shamed and added to bad actor block lists. For us, anything that informs all bidders that a visit is from Tor *before* they win the bid (e.g., by giving out distinct browser fingerprints that can be tied to Tor Browser or IP addresses that can be associated with exit relays) is leaking too much information.
The Tor Project would participate in an adblocker campaign that specifically targets bad actors such as cryptominers, fingerprinters, and Real Time Bidding ad networks that perform little or no data minimization to bidders. We will not deploy general purpose ad blocking, though. Even for obvious ad networks that set visible cookies, coverage is 80% at best and often much lower. We need to specifically target widely-used Real Time Bidding ad networks for this to be effective.
If you run a sensitive website, hosting it as a v3 onion service is your best option. v2 onion services have their own Website Oracle that was mitigated by the v3 design. If you must also maintain a clear web presence, staple OCSP, avoid Real Time Bidding ad networks, and avoid using large-scale CDNs with log retention policies that you do not directly control. For all services and third party content elements on your site, you should ensure there is no IP address retention, and no high-resolution timing information retention (log timestamps should be truncated at the minute, hour, or day; which level depends on your visitor frequency).
We welcome and encourage research into cover traffic defenses for the general problem of Website Traffic Fingerprinting. We encourage researchers to review the circuit padding framework documentation and use it to develop novel defenses that can be easily deployed in Tor.