Personally Identifiable Information (PII) is commonly defined as “any data that could potentially identify a specific individual”, and can be either sensitive or non-sensitive. Sensitive PII is information which, when disclosed to an unauthorized entity, could result in harm to the data subject. Disclosure of non-sensitive PII, on the other hand, will result in little to no harm to the data subject.
What Qualifies as PII?
PII includes names, addresses, email addresses, birthdates, medical records, credit card numbers, financial statements, passport numbers, social security numbers, driver’s license numbers, and vehicle plate numbers. It also includes biometric data, such as handwriting, fingerprints, and photographs of the data subject.
While it may not be possible to identify someone by their date of birth alone, such a value can be combined with other types of information to create a “pseudo” or “quasi” identifier. For example, according to one widely cited study, “87% of the U.S. Population are uniquely identified by {date of birth, gender, ZIP}”. While not a mandatory requirement, the pseudonymization of such information is highly recommended by the GDPR.
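To illustrate how a quasi-identifier works, the following is a minimal sketch that measures what share of records is unique for a given combination of fields. The records, the field names and the unique_fraction helper are all hypothetical and were made up for this example; they are not taken from the research quoted above.

```python
from collections import Counter

# Hypothetical records; none of these values come from real data.
records = [
    {"dob": "1980-01-15", "gender": "F", "zip": "02139"},
    {"dob": "1980-01-15", "gender": "F", "zip": "02139"},
    {"dob": "1975-06-02", "gender": "M", "zip": "02139"},
    {"dob": "1990-11-30", "gender": "F", "zip": "94110"},
]


def unique_fraction(rows, fields):
    """Return the share of rows whose combination of `fields` occurs exactly once."""
    if not rows:
        return 0.0
    keys = [tuple(row[field] for field in fields) for row in rows]
    counts = Counter(keys)
    unique = sum(1 for key in keys if counts[key] == 1)
    return unique / len(rows)


# Share of rows singled out by the full combination versus by ZIP alone.
fraction_all = unique_fraction(records, ["dob", "gender", "zip"])
fraction_zip = unique_fraction(records, ["zip"])
```

In this toy data set the three fields together single out more rows than ZIP code alone, which is the effect that makes quasi-identifiers worth pseudonymizing.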
What Classification Schema Should Be Used For PII?
As mentioned, PII can be sensitive or non-sensitive. However, having only two categories may not be granular enough for organizations that store large amounts of data. Most large organizations would require at least three categories: Public, Private and Restricted. Below is a brief summary of each category.
1. Public Data: This is data that is in the “public domain”, and may include data found in newspapers, public records, telephone directories, business directories, social media platforms and websites. While anyone can access this data, a business holding this data may want to apply access controls to prevent unauthorized modification or destruction of this data.
2. Private Data: This is data which the data subject may not wish to disclose, such as their date-of-birth, home address and phone number. If a hacker really wanted to find this type of information, they probably could without resorting to cyber-crime. As such, private data should be covered by a moderate level of protection.
3. Restricted Data: This is highly confidential data which hackers want but cannot obtain through legitimate means. Restricted data includes social security numbers, credit card details, medical information, and so on. As you would expect, restricted data should be covered by the highest level of security controls.
Collections of data should be classified in accordance with the most sensitive data in the set. For example, a set of data may contain a name, address, and social security number. Given that social security numbers are often used for identity fraud, the set should be classified as Restricted.
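The rule above can be sketched in a few lines of code. This is only an illustration: the field names, the mapping of each field to a category (including treating a name as Public), and the choice to treat unknown fields as Restricted are assumptions made for this example, not a recommended policy.

```python
from enum import IntEnum


class Level(IntEnum):
    """The three categories, ordered from least to most sensitive."""
    PUBLIC = 1
    PRIVATE = 2
    RESTRICTED = 3


# Hypothetical mapping of field names to categories (an assumption).
FIELD_LEVELS = {
    "name": Level.PUBLIC,
    "address": Level.PRIVATE,
    "date_of_birth": Level.PRIVATE,
    "social_security_number": Level.RESTRICTED,
}


def classify_collection(fields):
    """Classify a data set by the most sensitive field it contains.

    Fields missing from FIELD_LEVELS are treated as RESTRICTED, so an
    unrecognized field can never lower the classification.
    An empty collection is classified as PUBLIC.
    """
    levels = (FIELD_LEVELS.get(field, Level.RESTRICTED) for field in fields)
    return max(levels, default=Level.PUBLIC)


example = classify_collection(["name", "address", "social_security_number"])
```

Here example evaluates to Level.RESTRICTED, because the social security number outranks the other two fields.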
Tips for Classifying and Protecting PII
Firstly, data protection regulations and standards such as PCI DSS, ISO 27000 and GDPR provide comprehensive frameworks for protecting PII. There is also a wide variety of technologies, such as Lepide Data Classification, that can automatically classify PII, prevent unencrypted PII from leaving the network, and analyze user behavior affecting PII. Unlike non-sensitive data, sensitive data must be encrypted, both at rest and in transit. As stated above, pseudonymization is also highly recommended. Fortunately, there are tools available which can discover, encrypt and/or redact PII either at the point of creation, or following a scan.
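As a rough illustration of what redaction and pseudonymization involve, here is a minimal sketch using only the Python standard library. It does not represent how any particular product works. The two patterns are deliberately simplified and will miss many real-world formats, and the redact and pseudonymize function names, the sample text and the key handling are assumptions made for this example.

```python
import hashlib
import hmac
import re

# Simplified patterns for illustration only; real discovery tools use far
# more thorough detection than these two regular expressions.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
EMAIL_PATTERN = re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")


def redact(text):
    """Replace anything that looks like an SSN or email address with a placeholder."""
    text = SSN_PATTERN.sub("[REDACTED SSN]", text)
    return EMAIL_PATTERN.sub("[REDACTED EMAIL]", text)


def pseudonymize(value, key):
    """Replace a value with a keyed hash (HMAC-SHA256).

    The same value and key always yield the same pseudonym, so records can
    still be linked, but the original value cannot be recomputed without
    the key. The key (bytes) must be stored separately from the data.
    """
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


sample = "Contact jane.doe@example.com, SSN 123-45-6789."
redacted = redact(sample)
# redacted == "Contact [REDACTED EMAIL], SSN [REDACTED SSN]."

# Hypothetical key; in practice it would come from a secrets manager.
token = pseudonymize("1980-01-15", key=b"example-secret-key")
```

Redaction removes the value entirely, whereas pseudonymization keeps a consistent stand-in, which is useful when records still need to be joined or counted.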
Protecting PII is no different from protecting other types of sensitive data. Organizations should adhere to the “principle of least privilege” at all times to ensure that stakeholders only have access to the data they need to carry out their role, and they should adopt a real-time change auditing solution to ensure that they have complete visibility into how their PII is being accessed and used.
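The combination of least privilege and auditing can be sketched as follows. This is a simplified illustration: the role names, the clearance assigned to each role, and the in-memory audit_log list are all hypothetical, and a real deployment would rely on a dedicated access control system and auditing solution.

```python
from datetime import datetime, timezone

# Categories ordered from least to most sensitive.
LEVELS = {"public": 1, "private": 2, "restricted": 3}

# Hypothetical roles and the highest category each one needs for its job.
ROLE_CLEARANCE = {
    "marketing": "public",
    "support": "private",
    "compliance": "restricted",
}

# Stand-in for an audit trail; every access attempt is recorded here.
audit_log = []


def can_access(role, data_level):
    """Allow access only if the role's clearance covers the data's category.

    Unknown roles are denied, and every attempt (allowed or not) is logged.
    """
    clearance = ROLE_CLEARANCE.get(role)
    allowed = clearance is not None and LEVELS[clearance] >= LEVELS[data_level]
    audit_log.append({
        "time": datetime.now(timezone.utc).isoformat(),
        "role": role,
        "data_level": data_level,
        "allowed": allowed,
    })
    return allowed


support_reads_private = can_access("support", "private")        # allowed
support_reads_restricted = can_access("support", "restricted")  # denied
unknown_reads_public = can_access("contractor", "public")       # denied, unknown role
```

Denying unknown roles by default and logging denied attempts as well as successful ones keeps the sketch in line with the two recommendations above.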