The Trouble with CloudFlare
Wednesday, CloudFlare blogged that 94% of the requests it sees from Tor are "malicious." We find that unlikely, and we've asked CloudFlare to provide justification to back up this claim. We suspect this figure is based on a flawed methodology by which CloudFlare labels all traffic from an IP address that has ever sent spam as "malicious." Tor IP addresses are conduits for millions of people who are then blocked from reaching websites under CloudFlare's system.
We're interested in hearing CloudFlare's explanation of how they arrived at the 94% figure and why they choose to block so much legitimate Tor traffic. While we wait to hear from CloudFlare, here's what we know:
1) CloudFlare uses an IP reputation system to assign scores to IP addresses that generate malicious traffic. In their blog post, they mentioned obtaining data from Project Honey Pot, in addition to their own systems. Project Honey Pot has an IP reputation system that causes IP addresses to be labeled as "malicious" if they ever send spam to a select set of diagnostic machines that are not normally in use. CloudFlare has not described the nature of the IP reputation systems they use in any detail.
2) External research has found that CloudFlare blocks at least 80% of Tor IP addresses, and this number has been steadily increasing over time.
3) That same study found that it typically took 30 days for an event to happen that caused a Tor IP address to acquire a bad reputation and become blocked, but once that happened, innocent users continued to be punished for it for the duration of the study.
4) That study also showed a disturbing increase over time in how many IP addresses CloudFlare blocked without removal. CloudFlare's approach to blocking abusive traffic is incurring a large number of false positives in the form of impeding normal traffic, thereby damaging the experience of many innocent Tor and non-Tor Internet users, as well as impacting the revenue streams of CloudFlare's own customers by causing frustrated or blocked users to go elsewhere.
5) A report by CloudFlare competitor Akamai found that the percentage of legitimate e-commerce traffic originating from Tor IP addresses is nearly identical to that originating from the Internet at large. (Specifically, Akamai found that the "conversion rate" of Tor IP addresses clicking on ads and performing commercial activity was "virtually equal" to that of non-Tor IP addresses).
CloudFlare disagrees with our use of the word "block" when describing its treatment of Tor traffic, but that's exactly what their system ultimately does in many cases. Users are either blocked outright with CAPTCHA server failure messages, or prevented from reaching websites with a long (and sometimes endless) loop of CAPTCHAs, many of which require the user to understand English in order to solve correctly. For users in developing nations who pay for Internet service by the minute, the problem is even worse as the CAPTCHAs load slowly and users may have to solve dozens each day with no guarantee of reaching a particular site. Rather than waste their limited Internet time, such users will either navigate away, or choose not to use Tor and put themselves at risk.
Also see our new fact sheet about CloudFlare and Tor: https://people.torproject.org/~lunar/20160331-CloudFlare_Fact_Sheet.pdf
let's just call them blockflare :3
> 5) A report by CloudFlare competitor Akamai found that the
> percentage of legitimate e-commerce traffic originating from
> Tor IP addresses is nearly identical to that originating from
> the Internet at large. (Specifically, Akamai found that the
> "conversion rate" of Tor IP addresses clicking on ads and
> performing commercial activity was "virtually equal" to that
> of non-Tor IP addresses).
A specious claim? Let's see...
Cherry picks supporting claims? Check
Quotes source deceptively? Check
Draws on points of limited relevance to make a case? Check
Relies on reputation of source for validity? Check
That report states unequivocally "Tor exit nodes were far more likely to contain malicious requests"
(I interpret this as meaning "[Traffic from] Tor exit nodes [was] far more likely to contain malicious requests" or equivalently "Tor exit nodes were far more likely to [send] malicious requests")
From the report...
Tor IPs: 1.26% of malicious traffic, 0.04% of legit traffic
Other IPs: 98.74% of malicious traffic, 99.96% of legit traffic
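To make the comparison concrete, here is a minimal sketch that takes the four shares quoted above at face value and turns them into an odds ratio. The function name `odds_ratio` is made up for illustration, and the calculation assumes the quoted shares describe the same body of traffic. The overall proportion of malicious traffic is not needed, because it cancels out of the ratio.

```python
def odds_ratio(tor_share_malicious, tor_share_legit):
    """Odds that a request is malicious if it came from Tor,
    divided by the same odds for non-Tor traffic.

    Inputs are Tor's share of malicious traffic and Tor's share of
    legitimate traffic, both as fractions between 0 and 1.
    """
    # Malicious-to-legit odds for Tor, up to a base-rate factor.
    tor_odds = tor_share_malicious / tor_share_legit
    # Same odds for everything else, up to the same factor.
    other_odds = (1 - tor_share_malicious) / (1 - tor_share_legit)
    # The shared base-rate factor cancels in the division.
    return tor_odds / other_odds


# Figures as quoted from the report: 1.26% and 0.04%.
ratio = odds_ratio(0.0126, 0.0004)
print(f"Odds ratio, Tor vs non-Tor: {ratio:.1f}")
```

On these figures the ratio comes out at roughly 32, which is one way of reading "far more likely". It says nothing about what fraction of Tor traffic is malicious in absolute terms; that would need the overall volumes, which are not quoted here.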
What was similar between Tor and non-Tor traffic, according to the report, was the distribution of attack types among the malicious traffic observed. This similarity is relative, not absolute, and does not contradict the statement "Tor exit nodes were far more likely to contain malicious requests".
The positive-sounding "conversion rate" is cherry-picked, but what does this mean? Conversions on the internet are typically low (<5%). Speculating now: Perhaps legit Tor users are actually *more* likely to convert than non-Tor legit users. If (speculating, remember) legit Tor users are twice as likely to convert, it would require half the Tor traffic to be malicious for these numbers to add up.
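The arithmetic behind that speculation can be sketched as follows. This is only an illustration under loud assumptions that are not in the report: malicious requests never convert, non-Tor traffic is treated as effectively all legitimate, and the multiplier of 2 is the hypothetical figure from the paragraph above. The function name is invented for the example.

```python
def required_legit_fraction(conversion_multiplier):
    """Fraction of Tor traffic that must be legitimate for the overall
    Tor conversion rate to equal the non-Tor conversion rate.

    Assumptions (hypothetical): malicious requests never convert, and
    legit Tor users convert `conversion_multiplier` times as often as
    non-Tor users. If the non-Tor rate is c, the overall Tor rate is
    legit_fraction * conversion_multiplier * c, and setting that equal
    to c gives legit_fraction = 1 / conversion_multiplier.
    """
    if conversion_multiplier < 1:
        raise ValueError("multiplier below 1 cannot produce equal rates")
    return 1.0 / conversion_multiplier


legit = required_legit_fraction(2.0)   # speculative multiplier from the text
malicious = 1.0 - legit
print(f"legit: {legit:.0%}, malicious: {malicious:.0%}")
```

With a multiplier of 2 the split is half legitimate and half malicious, matching the figure above. A multiplier of 1 would mean no malicious Tor traffic at all, so an equal conversion rate on its own does not settle the question either way.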
But who was actually talking about the conversion rate? No one. We were talking about whether bad actors as well as good use Tor, and whether there is increased risk to content providers from Tor traffic.
Is it valid to quote a report that states "Tor exit nodes were far more likely to contain malicious requests" in support of the claim that Tor nodes are not more likely to send malicious requests? No (for all values of No).
I concede the point to those who've made it that labelling traffic legitimate or malicious has some devilish details - I hope that _that_ discussion can be considered outside the scope of my simple point: the claim to which I was responding was made without adequate attention to truth.
The moral of the story:
Sometimes the first step in dealing with a problem is admitting that you have a problem.