Content delivery network Cloudflare has revealed that it recently fixed a serious software bug in its infrastructure that may have led to the exposure of cookies, passwords and user authentication tokens.
Data had been leaking since September last year, and some of it may still be cached around the Web. Cloudflare also provides Internet security and distributed domain name server services, and hosts about six million websites.
The bug was found by Google researcher Tavis Ormandy who wrote, in part, "It looked like that if an HTML page hosted behind Cloudflare had a specific combination of unbalanced tags, the proxy would intersperse pages of uninitialised memory into the output...
"My working theory was that this was related to their 'ScrapeShield' feature which parses and obfuscates html - but because reverse proxies are shared between customers, it would affect *all* Cloudflare customers."
Ormandy notified Cloudflare of the issue on 18 February. By 21 February, the company had fixed it and re-enabled automatic HTTPS rewrites, server-side excludes and email obfuscation. But cached data could still be around on the Web.
The data leak was caused by a bug in an HTML parser chain that is used to change Web pages as they progress through Cloudflare's edge servers.
Cloudflare chief technology officer John Graham-Cumming said the changes made to Web pages as they passed through the edge servers included inserting the Google Analytics tag, rewriting http:// to https://, excluding portions of a page from bad bots, obfuscating email addresses, and enabling Accelerated Mobile Pages.
He said the company had started using a new parser, cf-html, about a year ago, as it worked well with HTML5 and was faster and easier to maintain than the old one, which was written using Ragel.
Both parsers are NGINX modules that are compiled into Cloudflare's NGINX builds. NGINX is a Web server that can also be used as a reverse proxy, load balancer and Web cache.
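Modules of this kind run as a chain: each one takes a buffer of HTML, modifies it if needed, and hands it to the next. The Python sketch below illustrates the idea only. The function names and the two toy transformations are hypothetical stand-ins for features such as HTTPS rewriting and email obfuscation, not Cloudflare's or NGINX's actual code.

```python
import re
from typing import Callable, Iterable, List

# A filter takes one buffer of HTML bytes and returns the modified buffer.
Filter = Callable[[bytes], bytes]


def rewrite_https(buf: bytes) -> bytes:
    # Hypothetical stand-in for automatic HTTPS rewrites.
    return buf.replace(b"http://", b"https://")


def obfuscate_email(buf: bytes) -> bytes:
    # Hypothetical stand-in for email obfuscation: "a@b" becomes "a [at] b".
    return re.sub(rb"([\w.+-]+)@([\w.-]+)", rb"\1 [at] \2", buf)


def run_chain(buffers: Iterable[bytes], filters: List[Filter]) -> List[bytes]:
    """Pass every buffer through each filter in order."""
    out = []
    for buf in buffers:
        for apply_filter in filters:
            buf = apply_filter(buf)
        out.append(buf)
    return out


pages = [b'<a href="http://example.com">', b"mail bob@example.com"]
result = run_chain(pages, [rewrite_https, obfuscate_email])
```

This toy version treats each buffer independently. A real streaming parser has to carry state from one buffer to the next, because a tag or attribute can be split across two buffers.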
"These NGINX filter modules parse buffers (blocks of memory) containing HTML responses, make modifications as necessary, and pass the buffers onto the next filter," Graham-Cumming said, adding that the bug was not in Ragel itself, but rather in Cloudflare's use of Ragel.
"It turned out that the underlying bug that caused the memory leak had been present in our Ragel-based parser for many years but no memory was leaked because of the way the internal NGINX buffers were used," he said.
"Introducing cf-html subtly changed the buffering which enabled the leakage even though there were no problems in cf-html itself."
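Cloudflare's own incident report attributed the over-read to an end-of-buffer test that used equality, so a pointer that stepped past the end was never caught. The Python simulation below is a sketch of that class of bug under stated assumptions, not the Ragel-generated code. The "memory" is a single byte string, the leaked header is invented, and the rule that the parser skips one byte after "=" is a hypothetical way of making the pointer jump over the end marker when a tag is left unfinished.

```python
# Simulated process memory: a response buffer followed by unrelated data.
RESPONSE = b"<p>hi</p><img src="              # ends in an unfinished attribute
SECRET = b"Authorization: Bearer abc123"      # invented neighbouring data
MEMORY = RESPONSE + SECRET
END = len(RESPONSE)  # index one past the last valid byte of the response


def scan(memory: bytes, end: int, strict: bool) -> bytes:
    """Copy bytes from the start of memory until the end-of-buffer test fires.

    strict=True  uses `p >= end` (the safe test).
    strict=False uses `p == end` (the buggy test).
    """
    out = bytearray()
    p = 0
    while True:
        at_end = (p >= end) if strict else (p == end)
        if at_end:
            break
        if p >= len(memory):
            # Guard for the simulation only; real memory has no such stop.
            break
        byte = memory[p]
        out.append(byte)
        p += 1
        if byte == ord("="):
            # Hypothetical parser rule: skip the quote assumed to follow "=".
            # When "=" is the last byte, this moves p from end to end + 1.
            p += 1
    return bytes(out)


safe = scan(MEMORY, END, strict=True)
leaked = scan(MEMORY, END, strict=False)
```

With the strict test, `safe` contains only the response. With the equality test, `leaked` contains the response followed by bytes from the neighbouring data, which mirrors how unbalanced tags at the end of a buffer could pull unrelated memory into a page.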
After Ormandy's write-up was published, some commenters asked him why many people had had to re-authenticate their Google accounts on their devices at around the same time.
One asked: "Tavis, did any of this somehow cause the Google Account 'Action Required' alerts today? My Google G Suite audit logs show no security changes or activity at all, which is why I'm left speculating on the cause. The alert and account verification seems to have only happened on my mobile devices (not sure I understand the logic on why my desktop session wasn't terminated).
"This might all be a coincidence, but I figured I should confirm regardless. (Can't think of any reason why Cloudflare would have access to sensitive Google tokens, unless there's some impact in Google AMP ads via Firebolt I'm missing.)"
Ormandy denied there was any connection, but after the third such query, he suddenly said: "I couldn't tell you, this is not the correct place to ask. This is not a discussion forum, I'm closing additional comments."