New low cost traffic analysis attacks and mitigations

Recently, Tobias Pulls and Rasmus Dahlberg published a paper entitled Website Fingerprinting with Website Oracles.

"Website fingerprinting" is a category of attack where an adversary observes a user's encrypted data traffic, and uses traffic timing and quantity to guess what website that user is visiting. In this attack, the adversary has a database of web pages, and regularly downloads all of them in order to record their traffic timing and quantity characteristics, for comparison against encrypted traffic, to find potential target matches.
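To make the matching step concrete, here is a minimal sketch of the comparison an attacker might run. The site names, feature choice (outgoing packets, incoming packets, duration), and numbers are all illustrative, not real measurements or the specific classifier from the paper.

```python
from math import dist

# Hypothetical traffic "fingerprints" recorded by crawling each candidate
# site: (outgoing packets, incoming packets, total seconds).
fingerprints = {
    "site-a.example": (120, 1800, 3.2),
    "site-b.example": (95, 600, 1.1),
    "site-c.example": (300, 5200, 7.9),
}

def guess_site(observed, database):
    """Return the candidate whose recorded fingerprint is closest
    (by Euclidean distance) to the observed encrypted trace."""
    return min(database, key=lambda site: dist(database[site], observed))

# An observed encrypted trace that most resembles site-b's profile:
print(guess_site((100, 640, 1.3), fingerprints))  # site-b.example
```

Real classifiers use far richer features and models, but the principle is the same: compare an observed trace against a database of pre-recorded ones.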

Practical website traffic fingerprinting attacks against the live Tor network have been limited by the sheer quantity and variety of all kinds (and combinations) of traffic that the Tor network carries. The paper reviews some of these practical difficulties in sections 2.4 and 7.3.

However, if specific types of traffic can be isolated, such as through onion service circuit setup fingerprinting, the attack seems more practical. This is why we recently deployed cover traffic to obscure client side onion service circuit setup.

To address the problem of practicality against the entire Internet, this paper uses various kinds of public Internet infrastructure as side channels to narrow the set of websites and website visit times that an adversary has to consider. This allows the attacker to add confidence to their classifier's guesses, and rule out false positives, for low cost. The paper calls these side channels "Website Oracles".
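The pruning step can be sketched in a few lines. The oracle source and site names here are hypothetical; the point is only that intersecting classifier guesses with an oracle's answer shrinks the candidate set.

```python
# The classifier produces ranked guesses for an observed trace.
classifier_guesses = ["site-a.example", "site-b.example", "site-c.example"]

# Hypothetical oracle answer: e.g., DNS resolver logs showed lookups for
# only these names during the time window of the observed visit.
oracle_visited = {"site-b.example", "site-d.example"}

# Keep only guesses the oracle confirms; ranking order is preserved.
confirmed = [g for g in classifier_guesses if g in oracle_visited]
print(confirmed)  # a much smaller, higher-confidence candidate set
```

Any guess the oracle cannot confirm is a likely false positive, which is exactly what makes these side channels valuable to the attacker.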

As this table illustrates, several of these Website Oracles are low-cost/low-effort and have high coverage. We're particularly concerned with DNS, Real Time Bidding, and OCSP.

All of these oracles matter to varying degrees for non-Tor Internet users too, particularly in instances of centralized plaintext services. Because both DNS and OCSP are in plaintext, and because it is common practice for DNS to be centralized to public resolvers, and because OCSP queries are already centralized to the browser CAs, DNS and OCSP are good collection points to get website visit activity for large numbers of Internet users, not just Tor users.

Real Time Bidding ad networks are also a vector that Mozilla and EFF should be concerned about for non-Tor users, as they leak even more information about non-Tor users to ad network customers. Advertisers need not even pay anything or serve any ads to get information about all users who visit all sites that use the RTB ad network. On these bidding networks, visitor information is freely handed out to help ad buyers decide which users/visits they want to serve ads to. Nothing prevents advertisers from retaining this information for their own purposes, which also enables them to mount attacks, such as the one Tobias and Rasmus studied.

In terms of mitigating the use of these vectors in attacks against Tor, here are our recommendations for various groups in our community:

  • Users: Do multiple things at once with your Tor client
  • Because Tor uses encrypted TLS connections to carry multiple circuits, an adversary that externally observes Tor client traffic to a Tor Guard node will have a significantly harder time performing classification if that Tor client is doing multiple things at the same time. This was studied in section 6.3 of this paper by Tao Wang and Ian Goldberg. A similar argument can be made for mixing your client traffic with your own Tor Relay or Tor Bridge that you run, but that is very tricky to do correctly for it to actually help.

  • Exit relay Operators: Run a local resolver; stay up to date with Tor releases
  • Exit relay operators should follow our recommendations for DNS. Specifically: avoid public DNS resolvers like 1.1.1.1 and 8.8.8.8, as they can be easily monitored and have unknown/unverifiable log retention policies. This also means don't use public centralized DNS-Over-HTTPS resolvers, either (sadly). Additionally, we will be working on improvements to the DNS cache in Tor via ticket 32678. When those improvements are implemented, DNS caching on your local resolver should be disabled, in favor of Tor's DNS cache.
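To illustrate why caching helps against a resolver-side oracle, here is a toy sketch of a shared DNS cache with a clamped TTL. The clamp bounds and helper names are invented for this example and are not the actual design from ticket 32678; the point is only that repeat lookups answered locally never generate a fresh upstream query for the oracle to observe.

```python
import time

# Illustrative clamp bounds (seconds), chosen for this example only.
MIN_TTL, MAX_TTL = 300, 3600

cache = {}  # name -> (address, expiry timestamp)

def resolve(name, upstream, now=None):
    """Answer from the shared cache when possible; only cache misses
    reach the upstream resolver, where they could be logged."""
    now = time.time() if now is None else now
    entry = cache.get(name)
    if entry and entry[1] > now:
        return entry[0]            # served locally: nothing for the oracle
    address, ttl = upstream(name)  # a cache miss reaches the resolver
    ttl = max(MIN_TTL, min(ttl, MAX_TTL))
    cache[name] = (address, now + ttl)
    return address
```

With a shared cache like this in front of many clients, the resolver sees at most one query per name per TTL window, rather than one per visit.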

  • Mozilla/EFF/AdBlocker makers: Investigate Real Time Bidding ad networks
  • The ability of customers of Real Time Bidding ad networks to get so much information about website visit activity of regular users without even paying to run ads should be a concern of all Internet users, not just Tor users. Some Real Time Bidding networks perform some data minimization and blinding, but it is not clear which ones do this, and to what degree. Any that perform insufficient data minimization should be shamed and added to bad actor block lists. For us, anything that informs all bidders that a visit is from Tor *before* they win the bid (e.g., by giving out distinct browser fingerprints that can be tied to Tor Browser or IP addresses that can be associated with exit relays) is leaking too much information.

    The Tor Project would participate in an adblocker campaign that specifically targets bad actors such as cryptominers, fingerprinters, and Real Time Bidding ad networks that perform little or no data minimization to bidders. We will not deploy general purpose ad blocking, though. Even for obvious ad networks that set visible cookies, coverage is 80% at best and often much lower. We need to specifically target widely-used Real Time Bidding ad networks for this to be effective.

  • Website Operators: Use v3 Onion Services
  • If you run a sensitive website, hosting it as a v3 onion service is your best option. v2 onion services have their own Website Oracle that was mitigated by the v3 design. If you must also maintain a clear web presence, staple OCSP, avoid Real Time Bidding ad networks, and avoid using large-scale CDNs with log retention policies that you do not directly control. For all services and third party content elements on your site, you should ensure there is no IP address retention, and no high-resolution timing information retention (log timestamps should be truncated at the minute, hour, or day; which level depends on your visitor frequency).
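The timestamp-truncation advice above can be sketched as a small helper. The function name and levels are illustrative; pick the level that matches your visitor frequency.

```python
from datetime import datetime

def truncate_timestamp(ts: datetime, level: str) -> datetime:
    """Truncate a log timestamp to the minute, hour, or day, discarding
    the high-resolution timing information an attacker could correlate."""
    if level == "minute":
        return ts.replace(second=0, microsecond=0)
    if level == "hour":
        return ts.replace(minute=0, second=0, microsecond=0)
    if level == "day":
        return ts.replace(hour=0, minute=0, second=0, microsecond=0)
    raise ValueError(f"unknown truncation level: {level}")

visit = datetime(2019, 12, 20, 14, 37, 52, 123456)
print(truncate_timestamp(visit, "hour"))  # 2019-12-20 14:00:00
```

Lower-traffic sites should truncate more aggressively, since with few visitors even an hour-level timestamp can single out a visit.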

  • Researchers: Study Cover Traffic Defenses
  • We welcome and encourage research into cover traffic defenses for the general problem of Website Traffic Fingerprinting. We encourage researchers to review the circuit padding framework documentation and use it to develop novel defenses that can be easily deployed in Tor.

Ferri

December 20, 2019


To Developers,

All Tor users should have a bot that opens random sites; the bot would make it harder for a government to tell whether a request was sent by a human or by a bot.

I guess it would be fairly easy to add some such capability to the start_tor_browser script. The question is whether it would actually help keep users safer, or would just clog the Tor network without actually helping most users.

Another idea that would be great with the bot is to run it at random times in the background, and have it act interactive on every website. Give it a bandwidth setting so it looks like you're watching a video or downloading a file. Have the sessions close at random times, or whenever there's lag.

Ferri

December 20, 2019


Is Bitmessage secure over Tor with OnionTrafficOnly?
How can one create good noise traffic in Tor with OnionTrafficOnly?

Ferri

December 21, 2019


"avoid using large-scale CDNs with log retention policies that you do not directly control"

Tor Browser should consider including Decentraleyes. I also just want to note that adblocker coverage (e.g., with uBlock Origin) is better than it was in 2012.

Ferri

December 21, 2019


This post is devastatingly sophisticated. I love it. More like this, please. But not a plurality; the blog has to be approachable for general audiences.

I too loved the post and share everyone's gratitude for Mike Perry's work for TP, especially his role in developing Tor Browser, the Best-Thing-Yet for ordinary citizens. I'd also love to see more posts covering technical issues (the state of crypto in Tor vis-à-vis quantum cryptanalysis, for example), but agree that we need a mix of posts readable by prospective new users and prospective new Tor node operators, which ideally will inspire people considering trying Tor to follow through by adopting Tor for daily use.

Ferri

December 22, 2019


Lots of work has gone into padding established connections, but what about an observer forcibly interrupting traffic to record the timing or errors of disconnections? It occurred to me as I thought about how an external observer might be able to locate a Tor user who logs into ProtonMail, Tutanota, or a chat service and sends messages, for instance. I've lived in places where a momentary cut to the internet regularly preceded the sound of a police siren or helicopter.
