Did Google violate its security obligations by retaining just two weeks of logging data?
Here’s the background in case you missed it: a vulnerability in the Google+ API made it possible for third-party developers to access private Google+ user data between 2015 and March 2018. Google found no evidence that the bug was ever exploited, but it had only two weeks of log data available because of its Google+ logging policy. (WSJ here, NYTimes here.)
The Pros (Security) and Cons (Legal Liability) of Logging
Every engineer and product person wants to log everything and keep those logs forever because . . . you never know. They’ll tell you that they might one day run analytics and figure out some new product feature. Or there might be a tricky twice-a-year bug that will require the use of the logging data. Or there might be a data loss bug that the logging data can ameliorate. Etc.
But logging all that data comes at a cost—a legal-liability, breach-notification, and bad-publicity cost. Keeping logs increases the likelihood that a data access vulnerability will morph into a known data breach because, with good logs, you'll be able to tell whether hackers exploited the vulnerability.
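To make that concrete, here is a minimal sketch of the kind of forensic question good logs let you answer. Everything in it is assumed for illustration: the endpoint path, the log schema, and the function name are invented, not drawn from Google’s systems.

```python
import csv

# Hypothetical sketch: the endpoint path and the CSV log schema
# (timestamp, client_id, endpoint) are invented for illustration.
VULNERABLE_ENDPOINT = "/plus/v1/people/me"  # assumed vulnerable API path

def find_suspect_requests(log_lines):
    """Return (timestamp, client_id) for every logged request that hit
    the vulnerable endpoint. Answering "was this exploited?" this way
    is only possible for the window your retained logs still cover."""
    hits = []
    for timestamp, client_id, endpoint in csv.reader(log_lines):
        if endpoint == VULNERABLE_ENDPOINT:
            hits.append((timestamp, client_id))
    return hits
```

With two weeks of logs, this scan can only ever implicate (or clear) the last two weeks of traffic; the preceding three years are simply unknowable.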
That’s why Google’s lawyers told the Google+ team to retain only a trailing two weeks’ worth of logs. (To be fair, I don’t know that it was Google’s lawyers, but it seems highly likely. Google sucks up all the data in the world, and it’s only logging two weeks’ worth of Google+ data!?) And that policy paid off big time here:
Because the company kept a limited set of activity logs, it was unable to determine which users were affected and what types of data may potentially have been improperly collected . . . .
Without knowledge of any affected users, Google was able to tell a favorable story:
We found no evidence that any developer was aware of this bug, or abusing the API, and we found no evidence that any Profile data was misused.
And Google's comms team had ample ammunition to push for a correction to the original story that characterized the vulnerability as a “breach,” which the WSJ quickly issued:
Corrections & Amplifications
Google, a unit of Alphabet Inc., exposed the private data of some users of its Google+ social network to outside developers, but the company said it found no evidence that developers misused data. The phrase “data breach” in a headline on an earlier version of this article could be interpreted as suggesting that data were misused. (Oct. 9, 2018)
Google's two-week policy paid off in other ways too. Contrary to the WSJ story, Google is going to avoid all the legal troubles that go along with a breach, like breach reporting and class-action lawsuits. Here’s what the WSJ got wrong:
Europe’s General Data Protection Regulation, which went into effect in May of this year, requires companies to notify regulators of breaches within 72 hours, under threat of a maximum fine of 2% of world-wide revenue. The information potentially leaked via Google’s API would constitute personal information under GDPR, but because the problem was discovered in March, it wouldn’t have been covered under the European regulation, Mr. Saikali said.
Google could also face class-action lawsuits over its decision not to disclose the incident, Mr. Saikali said. “The story here that the plaintiffs will tell is that Google knew something here and hid it. That by itself is enough to make the lawyers salivate,” he said.
The GDPR part is wrong because there wasn’t a known breach. See GDPR Art. 33(1) (the controller must notify the supervisory authority “without undue delay and, where feasible, not later than 72 hours after having become aware of” a personal data breach). The class-action part is wrong for the same reason—if standing is tough to get when there is a known breach, see Spokeo, good luck trying to get it here.
By now you should be getting the sense that the Wall Street Journal botched this story. Unfortunately for Google and everyone who reads the news, the “breach” story got replicated across the tech press because most tech news websites (and major newspapers like the NYTimes) do a cut-and-paste job on major stories like this.
The real story is whether, by only retaining two weeks of trailing logging data, Google breached its security obligations.
The one problem with these lawyers’ plan could be Google’s obligation to implement “appropriate technical and organisational measures to ensure a level of security appropriate to the risk . . . .” GDPR Art. 32; see also Google Privacy Policy (“All Google products are built with strong security features that continuously protect your information.”) (emphasis added).
If I were a sophisticated regulator (multiple regulators are investigating, btw), I would challenge Google on whether its two-week-logging-data policy was “appropriate” and “strong.” In its blog post, Google said the rationale for this policy was its desire to build “Google+ with privacy in mind . . . .” But there’s reason to question that given all the data Google retains about its users. I’m not convinced that keeping more than a trailing two weeks of Google+ API logging data would have affected privacy. (Practice point: any privacy impact assessment should assess the marginal impact of a design decision, not the absolute one).
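For concreteness, a trailing-window retention policy like the one at issue can be as simple as a scheduled purge job. This sketch is purely hypothetical; the directory layout, file naming, and use of modification times are my assumptions, not anything Google has disclosed about its implementation.

```python
import time
from pathlib import Path

RETENTION_DAYS = 14  # trailing two-week window, per the policy discussed above

def purge_old_logs(log_dir, now=None, retention_days=RETENTION_DAYS):
    """Delete *.log files last modified before the trailing retention
    window and return the names removed. Run daily, a job like this
    enforces a "keep only the last two weeks" policy."""
    now = time.time() if now is None else now
    cutoff = now - retention_days * 86400  # window start, in epoch seconds
    removed = []
    for path in Path(log_dir).glob("*.log"):
        if path.stat().st_mtime < cutoff:
            path.unlink()
            removed.append(path.name)
    return sorted(removed)
```

Note how little the window size has to do with privacy as such: widening `retention_days` for one internal API log is a marginal design decision of exactly the kind a privacy impact assessment should weigh on its own.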
One final point for the Stratechery readers out there. Ben wrote that:
[H]olding onto logs is very risky, both in terms of safety (i.e. they could be exposed) and also liability (i.e. they could prove that data was breached). To that end, given the abject failure of Google+, at least in the consumer space, it makes perfect sense that Google wouldn’t have been logging much of anything.
For a quick-and-dirty analysis, Ben might be correct. But, as discussed above, the devil is in the details. It may be that the safety risks of retaining logs are negligible. It may also be that the security risks of a particular surface area, such as an externally available API, are large enough that the product-security benefits of retaining logs far outweigh the privacy cost.
Actually, never mind. If you want to minimize your legal risk, then purge your logs.
On balance, the benefits of not knowing whether you’ve been breached far outweigh the nascent risk that a regulator is going to ding you for not keeping sufficient logs, especially if you cloak your log-retention policy under a veil of user privacy protection. The only regulator I would trust right now to be sophisticated enough to ding a company for not retaining sufficient logs is the UK’s ICO, and they have their hands full with other things.
Additional Thoughts and Questions
- Does this mean companies need to disclose major vulnerabilities even though they've never been exploited? I don't think so unless you are on the list of hated/feared tech companies.
- How was the original WSJ article so bad? You’d think that, at a time when data breaches are so commonplace that we have websites dedicated to checking whether you’ve been owned (you have) and to documenting major data breaches, journalists at major newspapers would be able to report on them intelligently. But no.