Who's Asking? Defining “Personal Data” Under the GDPR.

tl;dr Because the GDPR only applies to personal data, you need to know whether something is or isn't. The GDPR doesn’t make this easy. The crux of the problem is this: whether data is "personal data" depends entirely on who’s asking—that is, whether the person in question has sufficient information to link the data to a real live person.

Applying this rule:

In the context of a data subject access request, everything you know about a person is personal data.
In the context of a data deletion request, if you delete enough information such that you can’t tie the data you're keeping to a particular person, then you don’t need to delete the data you're keeping.
In the context of a data breach, whether you need to report the breach depends on whether the public can tie the breached data to a particular person.
Similarly, in the context of a data processing addendum (DPA), whether you need a DPA depends on whether the processor can tie the information to a particular person.

“Is it personal data?” is the threshold question.

All 100+ pages of the GDPR apply only to “personal data.” As a result, if you can disqualify information as personal data, your life gets a lot simpler (for example, you don’t have to sign a DPA or report it if it gets breached). It’s like the time I didn't have a date for the prom: that solved all the complications of renting a tux, getting a limo, or attending . . . pretty great.

The definition of Personal Data

There are two parts to the GDPR’s definition of personal data (you can skip over this because I am going to simplify it in a second):

“[A]n identifiable natural person is one who can be identified, directly or indirectly, in particular by reference to an identifier such as a name, an identification number, location data, an online identifier or to one or more factors specific to the physical, physiological, genetic, mental, economic, cultural or social identity of that natural person”; and
“‘personal data’ means any information relating to an identified or identifiable natural person (‘data subject’)[.]”

Article 4(1).

This definition is readily translatable into a two-step “is it personal data” test:

Does the actor have enough information to identify a person?
If yes, is this data about that person?

If the answer to these two questions is “yes,” then we have personal data.
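
For the programmers in the audience, here is a minimal sketch of the two-step test in Python. It is an illustration only, not legal logic you should ship: the `Actor` class, its `known_identities` field, and the `is_personal_data` function are all names I made up, and the sketch assumes "enough information to identify a person" can be boiled down to "the actor holds a lookup from an identifier to a name."

```python
from dataclasses import dataclass, field


@dataclass
class Actor:
    """Hypothetical model of whoever is 'asking' about the data."""
    name: str
    # Assumption: identifiability is reduced to holding a lookup
    # from an identifier (e.g. a UserID) to a real person's name.
    known_identities: dict = field(default_factory=dict)


def is_personal_data(actor, record_key, about_a_person=True):
    # Step 1: does this actor have enough information to identify a person?
    can_identify = record_key in actor.known_identities
    # Step 2: if yes, is this data about that person?
    return can_identify and about_a_person


acme = Actor("Acme Co.", {1: "Robin Moore", 2: "Roger Rabbit"})
stranger = Actor("random member of the public")

print(is_personal_data(acme, 1))      # True
print(is_personal_data(stranger, 1))  # False
```

Same record, different actor, different answer. That is the whole point of the test.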

You'll note that the identity of the actor makes all the difference: If the actor is, for example, a company gathering information about its users, then everything it knows about its users is personal data. On the other hand, if data is leaked outside of the company and the “actor” is a random person, then it may not be.

To make this crystal clear, we’ll explore various scenarios in which the “is it personal data” question arises.

Context Matters (Part 1): Should I include data in a personal data report?

Suppose Acme Co. has a Favorite Color database table as follows:

UserID	Favorite Color
1	Blue
2	Pink
etc.	etc.

And assume that we have a User table as follows:

UserID	Name
1	Robin Moore
2	Roger Rabbit
etc.	etc.

If a user asks for her personal data report, does the company need to provide the contents of Favorite Color? The answer is yes and emerges from a straightforward application of the two-part test:

Can we identify the person? Yes, because we can cross-reference the UserID to identify a person.
Is this data about a person? Yep.
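
If it helps to see the cross-reference spelled out, here is a rough sketch that builds a personal data report from the two tables above. The dictionaries and the `personal_data_report` function are hypothetical stand-ins for whatever database Acme Co. actually uses.

```python
# Hypothetical in-memory stand-ins for Acme Co.'s two tables.
users = {1: "Robin Moore", 2: "Roger Rabbit"}
favorite_colors = {1: "Blue", 2: "Pink"}


def personal_data_report(user_id):
    """Collect everything linked to user_id across both tables."""
    report = {}
    if user_id in users:
        report["Name"] = users[user_id]
    if user_id in favorite_colors:
        # The shared UserID is what ties this row to an identifiable person.
        report["Favorite Color"] = favorite_colors[user_id]
    return report


print(personal_data_report(1))
# {'Name': 'Robin Moore', 'Favorite Color': 'Blue'}
```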

Context Matters (Part 2): Do I need to delete `Favorite Color` if I’ve deleted the `User` table?

No, you don’t. The GDPR only applies to personal data, so the answer again comes out of our application of the two-part test:

Can we identify the person? No, because without the User table, we can’t associate someone’s Favorite Color with an identifiable person.
Is this data about a person? Yes, but we can’t identify the person, so it doesn’t matter.
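
Here is the same idea as a small sketch. It assumes the User table is the only place that maps a UserID to a name (no backups, logs, or other tables that could rebuild the link), which is a big assumption in a real system. The `identifiable_rows` helper is a made-up name.

```python
# Hypothetical in-memory stand-ins for Acme Co.'s two tables.
users = {1: "Robin Moore", 2: "Roger Rabbit"}
favorite_colors = {1: "Blue", 2: "Pink"}


def identifiable_rows(user_table, color_table):
    """Return the Favorite Color rows that can still be tied to a name."""
    return {
        uid: (user_table[uid], color)
        for uid, color in color_table.items()
        if uid in user_table
    }


print(identifiable_rows(users, favorite_colors))
# {1: ('Robin Moore', 'Blue'), 2: ('Roger Rabbit', 'Pink')}

users.clear()  # honor the deletion request by emptying the User table

print(identifiable_rows(users, favorite_colors))  # {}
print(favorite_colors)  # {1: 'Blue', 2: 'Pink'}
```

The colors are still there, but nothing in what remains ties them to a person.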

Context Matters (Part 3): Do I need to report a breach if `Favorite Color` gets leaked publicly?

This is a more interesting question.

Can we identify the person? Maybe . . .

If you go back and carefully read the GDPR’s identifiability test, you’ll notice that it doesn’t specify who is doing the identifying. That is, the GDPR states that “an identifiable natural person is one who can be identified . . . .” but leaves open the question of the person doing the identifying.

This matters:

If third parties (like the public) are doing the identifying, then Favorite Color is not personal data, it’s anonymous data. That’s because there’s no way for the public to go from “Blue” to “Robin Moore,” and so the “data subject is not or no longer identifiable” (Recital 26).
However, if literally everyone on earth is the identifier, then Favorite Color is personal data because someone at Acme Co. could cross-reference the UserID against the User table and identify me.

Fortunately, the Court of Justice of the European Union in Patrick Breyer v. Germany (2016) answered this question and the answer is . . . third parties. As one law firm eloquently described in a case update (emphasis added):

[A] piece of information will not be personal data in the hands of a party that has no legal means of obtaining sufficient additional data to make such a link.

Is this data about a person? Yes, for the same reasons in the Personal Data Report analysis above.

As an aside, there’s even more legal cover, courtesy of the Article 29 Working Party guidance. That guidance states that notification is not required if “a breach is unlikely to result in a risk to the rights and freedoms of individuals.” Therefore, in this instance, no reporting would be required.

Context Matters (Part 4): Do I need a data processing addendum (DPA) if I ask a third-party to process `Favorite Color`?

Suppose you’re going to send just Favorite Color to a vendor to translate it into a physical address. Is the Favorite Color data personal data under the GDPR, such that it requires a data processing addendum?

The analysis is identical to the one in Part 3 but, given how DPAs have become ordinary course, it’s painless to put a DPA in place.

Another twist on the “identifiable person” question—how much information do we need about someone before they are identifiable?

The short answer is “very little.” Famous studies show that an “anonymous” database of Netflix ratings, place + gender + date of birth, someone’s Internet search history, and someone’s static IP address are all sufficient to identify a person.

Recital 26 provides the applicable GDPR test: “To determine whether a natural person is identifiable, account should be taken of all the means reasonably likely to be used, such as singling out, either by the controller or by another person to identify the natural person directly or indirectly.”

To continue with our example, suppose our Favorite Color table looked like:

UserID	Favorite Color	Date
1	Blue	June 7, 2019
1	Pink	June 9, 2019
etc.	etc.	etc.

Is it reasonably likely that someone could take the above example and identify some of the people in it? Per the examples above, with enough data of this type, it may be possible to link this data to a particular person. As a result, it could, by itself, count as personal data.
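
One rough way to get a feel for “singling out” is to count how many records share the same combination of attributes. The sketch below does that for place + gender + date of birth. The records are invented for illustration, and a unique combination is only a warning sign that re-identification may be possible, not a legal conclusion.

```python
from collections import Counter

# Hypothetical "anonymous" records: (place, gender, date of birth).
records = [
    ("Springfield", "F", "1980-01-02"),
    ("Springfield", "F", "1980-01-02"),
    ("Springfield", "M", "1975-06-07"),
    ("Shelbyville", "F", "1990-03-04"),
]


def singled_out(rows):
    """Return the attribute combinations that appear exactly once."""
    counts = Counter(rows)
    return [row for row, n in counts.items() if n == 1]


unique_rows = singled_out(records)
print(len(unique_rows))  # 2
```

Two of the four records are one of a kind, so anyone who already knows those three facts about a person can pick out that person's row.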

Bonus: What about the California Consumer Privacy Act’s definition of personal data? Is it much different?

The CCPA’s definition of personal data (see below) is broad but, when combined with its definition of de-identified data, should lead to similar results as the GDPR analysis. That said, the devil is in the details so it would require a separate analysis.