The Trouble with CloudFlare

Wednesday, CloudFlare blogged that 94% of the requests it sees from Tor are "malicious." We find that unlikely, and we've asked CloudFlare to provide justification to back up this claim. We suspect this figure is based on a flawed methodology by which CloudFlare labels all traffic from an IP address that has ever sent spam as "malicious." Tor IP addresses are conduits for millions of people who are then blocked from reaching websites under CloudFlare's system.

We're interested in hearing CloudFlare's explanation of how they arrived at the 94% figure and why they choose to block so much legitimate Tor traffic. While we wait to hear from CloudFlare, here's what we know:

1) CloudFlare uses an IP reputation system to assign scores to IP addresses that generate malicious traffic. In their blog post, they mentioned obtaining data from Project Honey Pot, in addition to their own systems. Project Honey Pot has an IP reputation system that causes IP addresses to be labeled as "malicious" if they ever send spam to a select set of diagnostic machines that are not normally in use. CloudFlare has not described the nature of the IP reputation systems they use in any detail.

2) External research has found that CloudFlare blocks at least 80% of Tor IP addresses, and this number has been steadily increasing over time.

3) That same study found that it typically took 30 days for an event to happen that caused a Tor IP address to acquire a bad reputation and become blocked, but once it happens, innocent users continued to be punished for it for the duration of the study.

4) That study also showed a disturbing increase over time in how many IP addresses CloudFlare blocked without removal. CloudFlare's approach to blocking abusive traffic is producing a large number of false positives that impede normal traffic, thereby damaging the experience of many innocent Tor and non-Tor Internet users, as well as hurting the revenue streams of CloudFlare's own customers by causing frustrated or blocked users to go elsewhere.

5) A report by CloudFlare competitor Akamai found that the percentage of legitimate e-commerce traffic originating from Tor IP addresses is nearly identical to that originating from the Internet at large. (Specifically, Akamai found that the "conversion rate" of Tor IP addresses clicking on ads and performing commercial activity was "virtually equal" to that of non-Tor IP addresses).

CloudFlare disagrees with our use of the word "block" when describing its treatment of Tor traffic, but that's exactly what their system ultimately does in many cases. Users are either blocked outright with CAPTCHA server failure messages, or prevented from reaching websites with a long (and sometimes endless) loop of CAPTCHAs, many of which require the user to understand English in order to solve correctly. For users in developing nations who pay for Internet service by the minute, the problem is even worse as the CAPTCHAs load slowly and users may have to solve dozens each day with no guarantee of reaching a particular site. Rather than waste their limited Internet time, such users will either navigate away, or choose not to use Tor and put themselves at risk.

Also see our new fact sheet about CloudFlare and Tor: https://people.torproject.org/~lunar/20160331-CloudFlare_Fact_Sheet.pdf

Anonymous

April 01, 2016

CAPTCHAs are also a barrier to blind and visually impaired users accessing the free internet.

Anonymous

April 01, 2016

The real trouble with CloudFlare and friends is of course that they are Man-in-the-Middle-as-a-service. That people find such an invasion on the integrity of the Internet acceptable is beyond my comprehension.

I agree. Outside the bitcoin community nobody seems to care much, and even there some exchanges use CloudFlare.
How can it even be legal for such services to give away their private key to some third party?

I'm pretty sure CF only acts as a proxy. That is, they don't know the site's private key and they only forward encrypted traffic. Although the comment below about "Flexible SSL" is worrisome.

You are wrong. CF always terminates your SSL, so they are a perfect man in the middle. It merely has the option to re-encrypt it when being forwarded to your actual site.

But 99% of the traffic on the Internet is run through "man in the middle services", including 99.99% of servers not hosted at the website owner's IP. So services like CF being beyond your comprehension is understandable.

What do you mean by "man in the middle services"? I have a hard time believing CF and Akamai are /that/ popular. Look at sites like torproject.org, Gnu.org, Wikipedia.org, etc. Sure, this probably has to do with the audience of these sites, but 99.99% seems a little steep. Very many mom and pop websites use CF because they are small and they need it for protection, but if we are looking just at the most popular sites out there, I suspect that percentage would be substantially lower.

When you say 99% of Internet traffic, are you talking about traffic byte for byte, packet for packet, number of TCP connections, HTTP requests? This is an important distinction since sites like youtube and netflix are probably the biggest byte for byte, but they are already built with big pipes for high load and don't really need the likes of CF. But if you mean 99% of HTTP requests, that's slightly more believable since that's what CF is designed for and most commonly used with.

Bottom line: do you have a link to the source of these statistics?

Yes, I do agree with that; they claim they don't snoop TLS, but they offer a very dangerous service called "Flexible SSL" which terminates a TLS connection at the CloudFlare node, but then passes on the data from the node to the hidden server cleartext. Perhaps the CA/B forum should investigate whether or not that is a legitimate service and instruct their member CAs as to whether or not to continue issuing certificates blindly to their services.

That feature was used by some people to make their static GitHub blogs use HTTPS on custom domains. In the end, it covers part of the distance and, although not perfect, maybe doesn't deserve such distaste. Now that Let's Encrypt makes automatic HTTPS possible, custom-domain HTTPS can be offered as part of any hosting service without full server privileges, making Flexible SSL obsolete.

OTOH, a properly set up TLS session cannot be inspected without some exploits, such as NSA/CIA's one on D-H key exchange, which is not economically viable for a CDN to execute.

People have been trained (conditioned) to trust any higher authority in the form of an organization rather than trust each other. As long as this conditioning prevails expect things to drastically turn to worse. If we organize horizontally and from below without guardians and protectors and learn to trust our organization against those from above we may then begin to see the light.

Cloudflare is a business and counts on the majority as customers/individuals.

I have had bad experiences with certain websites that CloudFlare is supposed to protect, such as getting scammed out of my money. And when I've tried to track down who the host is for such a website, I find that it is none other than CloudFlare themselves, who just claim to be a security layer for the real host of the scam site. But I believe CloudFlare themselves are the real crooks, and are indeed the host. Websites such as bitcoincloudservices.com continue to remain up without ever getting taken down.

Yes - just like all Internet Service Providers and Tor itself. Every single carrier of internet traffic is a "man in the middle as a service".

Anonymous

April 01, 2016

"Users are either blocked outright with CAPTCHA server failure messages, or prevented from reaching websites with a long (and sometimes endless) loop of CAPTCHAs.."

Lately VPN traffic is also subjected to similar CAPTCHA harassment. I doubt website owners understand the extent of legitimate traffic they lose and/or frustrate by using CloudFlare's services.

I agree. I've started giving up because the CAPTCHAs are just pissing me off. I'm brain-damaged, so I don't have a CS degree, but even my dumb self knows that a sensible algorithm would include logic like: IF [IP matches TOR blacklist] AND [traffic pattern matches known attack pattern] THEN [offer aggressive CAPTCHA] + IF [pattern is repeated] THEN [add IP to blocklist].
Something like that. It's not rocket science, and Cloudflare are REALLY weak to not implement a more intelligent setup. Or they're actively trying to harvest data on behalf of the usual Big Brother powers that be, using their market share to change culture.
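The rule described in this comment can be written out as a minimal Python sketch. Everything in it is hypothetical and for illustration only: the function name, the `max_repeats` parameter, and the idea that some upstream detector has already decided whether a request "looks like an attack". It is not a description of how CloudFlare or any other service actually works.

```python
def tor_rule(ip, looks_like_attack, tor_exit_ips, repeat_counts, max_repeats=3):
    """Return 'allow', 'captcha', or 'block' for a single request.

    Hypothetical sketch: only Tor exits whose traffic matches a known
    attack pattern are challenged, and only repeat offenders are blocked.
    Non-Tor traffic is outside the scope of this rule and passes through.
    """
    if ip not in tor_exit_ips or not looks_like_attack:
        return "allow"
    # Count how often this exit has matched an attack pattern so far.
    repeat_counts[ip] = repeat_counts.get(ip, 0) + 1
    if repeat_counts[ip] >= max_repeats:
        return "block"
    return "captcha"
```

The point of the sketch is that an ordinary request from a Tor exit never reaches the CAPTCHA branch; only the combination of "Tor exit" and "attack-like request" does.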

The case against CAPTCHA has to go legal. My time is valuable. CAPTCHA has to stop. It's censorship. It also keeps people from the truth. I know CloudFlare knows I am no robot. One answer is enough. It's discriminatory.

Anonymous

April 01, 2016

I generally like cloudflare - they serve a useful purpose, but damn - they are really hostile towards Tor users. I generally try to avoid sites which use cloudflare because of this, luckily not all websites are using the service.

Well done Akamai though.

The fact that you haven't heard of them suggests that they are doing their job (i.e. content distribution without interfering with the user experience of ordinary internet users) properly.

To the best of my knowledge, Facebook has become so huge they decided to operate their own CDN.
As an added bonus, their self-hosted approach ensures they can run their site from a .onion domain.

Akamai has been around since 1998, and they don't put themselves in the news for tunnelling dodgy sites (they have an acceptable use policy), in blogs for bad HTTPS (secure sites have dedicated IPs by the way), and now on the Tor blog for blocking Tor.
With about 18 years of experience, they likely have seen and deflected just about every attack out there, and have been around through the full evolution of Tor as well as other proxy services, likely putting in significant engineering effort to maintain compatibility with these proxies while not compromising protection against threats.

Akamai was co-founded by Daniel "Danny" Mark Lewin, an American-Israeli mathematician and entrepreneur. He died in the 9/11 attacks (suspicious). At the very top level of Akamai is the "Akamai Web Intelligence" that does on their network what NSA does on every other network.

Anonymous

April 01, 2016

The problem is not in cloudflare but in website owners. Most of website owners do not welcome tor users because if tor user hacked his site the site owner wouldn't be able to prosecute hacker. If a tor user posted illegal content noticed by authorities, they will go for the website owner. If the website owner is unable to help authorities to identify the poster he is liable instead of the poster.
(If tor exit node owner is unable to help the authorities to identify the tor user they want he is liable instead of the poster.)

Because it is law enforcement, if the crime is detected, someone must be prosecuted. If noone is prosecuted, it will destroy the atmosphere governments are creating in order to control population, which means the cop must be fined or fired and a more professional cop must be hired instead.

> The problem is not in cloudflare but in website owners. Most of website owners do not welcome tor users because if tor user hacked his site the site owner wouldn't be able to prosecute hacker.

Are you talking about a "private prosecution" (legal in some countries), or did you mean, the website owner asks police agencies to investigate, or asks government prosecutors to bring criminal charges?

> If a tor user posted illegal content noticed by authorities, they will go for the website owner. If the website owner is unable to help authorities to identify the poster he is liable instead of the poster.

In US law (which is important internationally since it tends to set the standard for international investigations), traditionally web site operators were immunized from that hazard, but this protection is under continuing threat.

> (If tor exit node owner is unable to help the authorities to identify the tor user they want he is liable instead of the poster.)

Again, my understanding is that so far this is generally not quite true for US/EU operators of Tor nodes, but I'd be happy to hear comments from TP.

> Because it is law enforcement, if the crime is detected, someone must be prosecuted. If noone is prosecuted, it will destroy the atmosphere governments are creating in order to control population, which means the cop must be fined or fired and a more professional cop must be hired instead.

I have never heard of cops being fined simply for failing to make an arrest. Quite the opposite: in the US, cops routinely get away with murder (literally--- that is what the BLM movement is all about.)

"Quite the opposite: in the US, cops routinely get away with murder..."

Or act like real criminals by seizing your private property, amounting to billions, without any real charge.

In Germany there is no liability for operators of tor exit nodes but sometimes there are searches and confiscations (sometimes for months) without compensation. It's quite deterring for people running an exit-node from home (I know of a case where every computer in a household was confiscated and returned only after months and without compensation). Also the police is allowed to use evidence of completely unrelated "crimes" (from copyright to owning cannabis) found in such searches.

You never used CloudFlare, but I use it... you (the site owner) cannot disable that CAPTCHA for Tor users! There is no option for that.

Even if you set the firewall protection to what they call "essentially off", it still demands CAPTCHAs from Tor users. I know, I tested it.

Many people like me use CloudFlare to protect the real hosting provider from attacks on the web site (CloudFlare is the only one attackers can find in the whois and DNS information). If someone attacks the hosting company and the company thinks it is because of my web site, it will immediately kick me out, unless I have millions of dollars or euros, in which case it may make a big $$$ €€€ exception as long as I spend like there is no tomorrow. It also helps a little in protecting against attacks on the hosting company's control management aimed at getting to me (since attackers don't know who the host is, they can't attack it).

Anonymous

April 01, 2016

Thanks for responding.

I hope CloudFlare customers know the damage done to them. I know I shudder at the sight of medium.com links as I recall the frustration caused by CloudFlare. It takes me 0 minutes to read their posts now.

Anonymous

April 01, 2016

Even if they reached their 94% by counting unique GET or POST requests, it is still a flawed statistic. Someone running a security scan on a host might generate 50k requests in a few hours, and comparing those requests to normal requests would be ridiculous. But that is what I believe CloudFlare is doing to come up with their numbers.

Dropping the bad reputation for Tor nodes quicker after any such bad activity has stopped does not appear to be happening either. The bad rep is too sticky.
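To put rough numbers on the per-request objection, here is a small Python sketch. The 50,000-request scan comes from the comment above; the 1,000 ordinary users and 3 requests each are invented purely for illustration, and the function name is hypothetical. Nothing here reflects CloudFlare's real data or method.

```python
def malicious_share(scanner_requests, users, requests_per_user):
    """Compare 'share of requests' with 'share of clients' for one scanner.

    Hypothetical model: a single scanner plus `users` ordinary people,
    all arriving through the same set of exit IP addresses.
    """
    total_requests = scanner_requests + users * requests_per_user
    by_request = scanner_requests / total_requests  # what a request counter sees
    by_client = 1 / (users + 1)                     # one bad actor among all clients
    return by_request, by_client


by_request, by_client = malicious_share(50_000, 1_000, 3)
print(f"malicious by request: {by_request:.1%}")
print(f"malicious by client:  {by_client:.2%}")
```

Under these made-up inputs a single scanner accounts for roughly 94% of requests while being about one client in a thousand, which is why the unit of counting matters so much.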

Anonymous

April 01, 2016

It might be that CloudFlare's "malicious" claim is exaggerated, but I can see how, if you count millions of requests from bots against those from humans, it would come close.

Anyway, since I like both your initiatives it is sad to see this battle starting.

Please try to be constructive in finding solutions because a life depending website that is down because of a ddos is equally bad as one that is down because of captcha madness.

How about owners of such sites start serving multiple instances with and without protection?

Ken

>a life depending website that is down because of a ddos
If you think you can effectively use Tor for DDoS, you are very, I'm gonna be polite, naive.

The Tor network is not a DoS threat for any website.

Anonymous

April 01, 2016

I've run into the multiple-CAPTCHA problem, which appeared recently on localbitcoins: you have to get through a few CAPTCHAs to access the site, then another few to access the login page. On average it takes about 10 minutes just to log in, after having to start over and over again with a fresh identity due to CAPTCHA server errors, and if you walk away from the computer for more than 5 minutes it makes you do another set (i think this last one is Tor's fault, exit ip's are supposed to be fixed per site/session but i see them still constantly change).

>i think this last one is Tor's fault, exit ip's are supposed to be fixed per site/session but i see them still constantly change

No. This is by design. Circuits are switched after some time (currently it's 10 minutes by default).

Read the documentation.

Anonymous

April 01, 2016

Once an IP address has emitted abusive traffic, how is Cloudflare supposed to know that the address has stopped emitting abusive traffic? It's not like you can police your network and disconnect the abuser because they're anonymous, so the assumption must be that the abuser is still present. Faced with that assumption, I don't really see Cloudflare's actions as being wrong. It's simply a case of you wanting to protect your network at the expense of their network and them wanting to protect their network at the expense of your network, both aims being fundamentally incompatible.

> It's simply a case of you wanting to protect your network at the expense of their network and them wanting to protect their network at the expense of your network, both aims being fundamentally incompatible.

This is a false dilemma. We've been talking to other DDoS and website protection services in the market, and none of them blanket block Tor in perpetuity. Many of CloudFlare's competitors have sophisticated WAFs (Web Application Firewalls) or IDSs (Intrusion Detection Systems), as well as conventional spam filters that process incoming traffic to filter out malicious traffic in realtime, only while it is ongoing. Even when broad-scale scans and DDoSs require blanket bans, those companies' systems lift the ban as soon as the attack traffic subsides. They do this specifically to avoid collateral damage from infections, botnets, and IP spoofing attacks, as well as to avoid blocking users behind large-scale shared IP networks, VPNs, and Tor.

The real problem with CloudFlare in one sentence is the perma-bans and the collateral damage this causes. See also http://paulgraham.com/spamhausblacklist.html for information on how the long-term blacklist approach played out with email in the past.
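The difference between a perma-ban and a ban that lifts once the attack subsides can be pictured with a reputation score that decays over time. The Python sketch below is only an assumption about how such behaviour could be modelled: exponential decay, the class name, the half-life, and the threshold are all hypothetical and are not taken from any vendor's system.

```python
class DecayingReputation:
    """Hypothetical per-IP abuse score that halves every `half_life_s` seconds."""

    def __init__(self, half_life_s=600.0, block_threshold=5.0):
        self.half_life_s = half_life_s
        self.block_threshold = block_threshold
        self._state = {}  # ip -> (score, time of last update)

    def _current(self, ip, now):
        # Decay the stored score according to the time elapsed since last update.
        score, last = self._state.get(ip, (0.0, now))
        return score * 0.5 ** ((now - last) / self.half_life_s)

    def record_abuse(self, ip, now, weight=1.0):
        # Add a new abuse event on top of whatever score is still left.
        self._state[ip] = (self._current(ip, now) + weight, now)

    def is_blocked(self, ip, now):
        return self._current(ip, now) >= self.block_threshold
```

With a scheme like this, an exit IP that relays a burst of abuse is blocked while the burst is ongoing and drifts back below the threshold on its own afterwards, so later users of the same address are not punished indefinitely.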

We've been asking CloudFlare competitors to come forward about how they handle Tor traffic, but one of the problems is that no one wants to discuss their "secret sauce" and risk competitors catching up.

The part of a WAF secret sauce that deals appropriately with Tor is straightforward to talk about: label Tor requests to origin in the same way you label German or Chinese or NIPRnet requests. Be more sophisticated in applying rules---for example, if you have a WAF attack detector that labels each request with a score from 0 to 1, you might want to say "block on 0.8, warn on 0.5; but if it's from China or from Tor, block on 0.5". Those are still deterministic fast rules, so cheap enough for Bot mitigation.
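As a rough Python sketch of the threshold rule just described: the 0.8 and 0.5 cut-offs are the example values from the comment, while the function name, the label strings, and treating "block on 0.8" as "score at or above 0.8" are assumptions made for illustration.

```python
def waf_action(score, source_labels, strict_labels=frozenset({"tor", "cn"})):
    """Map an attack score in [0, 1] to 'block', 'warn', or 'allow'.

    Hypothetical rule: requests labelled with a 'strict' source get the
    lower blocking threshold, everything else gets the default one.
    """
    block_at = 0.5 if strict_labels & set(source_labels) else 0.8
    if score >= block_at:
        return "block"
    if score >= 0.5:
        return "warn"
    return "allow"


print(waf_action(0.6, {"de"}))   # default thresholds apply
print(waf_action(0.6, {"tor"}))  # stricter threshold applies
print(waf_action(0.3, {"tor"}))  # low score passes even from Tor
```

The rule stays a constant-time lookup per request, and a low-scoring request from Tor is still served, which is the distinction between labelling Tor traffic and blanket-blocking it.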

I'm not a big fan of WAFs as a product category---but if you are going to have one, it's a funny threat model that leads to blocking requests whose responses will be highly cachable. A GET forward to the origin, sure---but if you're serving from cache and setting long TTLs for the browser cache, or even just marking it Public---what's the point of blocking that? I hear "deterring vulnerability scanning," and I don't get it.

I sort of understand it for e-commerce scraper bot handling, but that's not meaningfully correlated with Tor---and anyway, you want to handle that at layer 8 or higher by serving interesting prices.

Anonymous

April 01, 2016

Tor user for almost a decade here. I've been using tor exclusively for a majority of that time. I have no reason to give my physical location to each server I contact. For me it looks like this:

before CloudFlare (a few years ago): almost every website works on tor

after CloudFlare: almost no website works on tor

From what I've seen, the entire debate so far is bikeshed, including the CloudFlare blogpost, which is the pinnacle of bikeshed.

Correct me if I'm wrong but the reason people use CloudFlare is because it's either bundled in their web hosting package, or because they want CDN/anti-DDOS. None of the above require a captcha gate. Anti-DDOS already existed before and such services simply eat up as much bandwidth as possible. CloudFlare *still* has to do this. The captcha gate changes no aspect of that.

The problem here seems to be that CloudFlare bundles in some sort of IDS/IPS system. As they admit, the captcha is not part of the anti-DDOS. Instead, the captcha is purportedly there for a bunch of reasons, but in reality all it can do is mitigate bot activity. An attacker doing SQL injection on a website will *not* be stopped by a captcha gate or even the flat out blocking of any IP detected as malicious. I thought the industry already figured this out in the 90's or early 2000's. Then again, HN and the California software developer crowd love to reinvent things.

Their claim is:

> A large percentage of the comment spam, vulnerability scanning, ad click fraud, content scraping, and login scanning comes via the Tor network.

In other words, the captcha gate does nothing other than reduce the number of bot requests. Scraping, scanning, and spam are still possible, but the ones that CloudFlare can detect are blocked, and thus they have something to sell to their clients. The idea of stopping bots from crawling your page and harvesting emails is laughable. Sure, since CloudFlare controls most of the web, in total it may even halve the amount of spam I get, but I'm *still* getting spam. Someone will paste my email on some page that's accessible to a bot. Bots routinely harvest emails from malware. For me it makes no difference.

However, CloudFlare is selling a magical security device. The client thinks it's making their website more secure, when in reality at most it's simply reducing spam to unrelated people. Don't treat me like a 5 year old and tell me it's stopping my content from being scraped. There are two separate concepts here:

1. A bot from a well known blacklisted IP scraping millions of pages from different websites. It will just hit the captcha gate and its effectiveness will be reduced. If such a bot was harvesting email addresses, then yes, some unrelated people will not be spammed as much.
2. Someone scraping your site to get your content. He's going to bypass CloudFlare no matter what. He can just buy an IP address for a few dollars and scrape from there. If CloudFlare does any sort of human activity verification (e.g., monitoring page load rate, measuring mouse movement, verifying the browser), it can be bypassed through trial and error, or simply by distributing the scrape across IPs. Such is what you've signed up for when you published your content to the public internet. If anyone tells you they have a solution for this, they are lying.

Basically, CloudFlare sells some popular services, and as a Value Add, there is this dubious feature which ruins tor, and it's on by default. The only reason people use this is because either they're sold on the idea of a magic security enhancing device, or because it's just on by default and they aren't aware of it and the consequences. It's very clear that CloudFlare only cares about its own interests. Since a big set of their customers are HN users, they have to answer to their dilettante concerns about tor. That's the only reason their blog post exists.

And it's only going to get worse. Since client behavior analyzing gates like CloudFlare and recaptcha are trending, pretty soon they will be writing browser authenticity checks which rely on *exact timings* and other browser-specific behavior to authenticate you to view a website. It will no longer be possible to create an open source browser without getting it adopted by major players. You'll just have to emulate Firefox or Chrome.

> Tor user for almost a decade here. I've been using tor exclusively for a majority of that time.

Likewise.

> I have no reason to give my physical location to each server I contact.

I put it like this: I feel I have good reason to avoid giving up geolocation and other abusable information.

> For me it looks like this:
> before CloudFlare (a few years ago): almost every website works on tor
> after CloudFlare: almost no website works on tor

Not quite as bad for me, but I also simply stopped visiting sites which require CloudFlare captchas.

> Someone scraping your site to get your content. He's going to bypass CloudFlare no matter what. He can just buy an IP address for a few dollars and scrape from there.

Just wanted to point out that US DOD (Dept of Defense) and LEO (law enforcement organization) agencies also scrape content (that's what "social media monitoring" is all about). USIC even breaks into social media servers to grab private information of users, particularly on-forum chats and messages. And LEOs hire private companies to do likewise. Years ago Nielsen company was notorious for aggressive scraping of private messages from web forums which appeared to the forum operators to resemble hacking (in that Nielsen appeared to exploit zero day flaws to grab huge amounts of nonpublic information). More recently, Nielsen seems to have engaged in "internet use surveys" without disclosing that they have been hired by USG agencies (USMS? USSS? FBI?) to target rather specific populations with an "innocuous" survey.

BTW, CloudFlare adversely reduces privacy even if you just want to browse a single website. On sites without CloudFlare, you can view each document with a unique identity (no cookies, cache, js, etc. browse each page with a random exit node). With CloudFlare, you'd have to solve a captcha for each document. The only way around this without bypassing the captcha somehow is to save all the documents on the website under one identity, and browse the few you care about offline. Ironically, I've had no trouble doing this for sets of 100-1000 documents. Meanwhile, my most commonly used website, Wikipedia, doesn't have the CloudFlare gate so far, so I don't have such a problem with them.

Totally agree. Cloudflare is a purveyor of fine snake-oil. No wonder they don't really respond to criticism. Pretending to listen to take the edge off criticism, stalling for time, sidetracking (just have a look at the bugtracker ticket!), hiding behind smokescreens (clouds?), while claiming to be on the good side, yes. But no effort to really address the damage they are inflicting on the Tor project. Of course not. The king of MitM pseudo-security services is naked.

That last point you raise is a very dangerous development. Forget about browsers. Forget about the web. Behavioral profiling will increasingly be seen as something normal. Sensors everywhere, and if you don't conform to model, WHAM: malicious. Locked out. No more house keys. No more tickets. No more passwords. Google will know it's them. A sheeple's dream, a misfit's nightmare.