Data Classification is simply the process of organizing data based on a set of pre-defined categories. Since organizations have limited resources, it is important for them to know exactly where their most sensitive data is located, in order to be able to allocate those resources in the most effective manner.
What is Data Classification?
Data classification can be loosely defined as organizing data into categories based on its content so that access rights can be appropriately assigned and security efforts can be focused. Data classification makes data easier to locate and retrieve. This can be useful in cases where data subjects exercise their right to be forgotten, for example. Data classification involves tagging data so that it can be easily searched for and monitored. Data classification may sound like a technical topic, but its applications should be understood at all levels of the business.
The Benefits of Data Classification
In general terms, there are three key benefits to perfecting your data classification strategy:
- Meeting Compliance Demands: One of the most common reasons why organizations implement data classification is to help ensure they can meet regulatory compliance requirements. Many compliance requirements mandate that data is searchable and retrievable within very tight deadlines.
- Improving Data Security: Data classification is also highly useful when it comes to protecting your sensitive data. The first step in a data-centric approach to security is to ensure you know where your most sensitive data is located and the reason why it is deemed to be sensitive. Once you know this, you will be better placed to decide what access rights to apply to the data and which users actually need access, and to focus your user behavior analytics on the data and users that matter most.
- Understanding a Breach: Should the worst happen, and you suffer a data breach, data classification will help you to determine the extent of the damage, establish what data was lost, and decide who needs to be informed.
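To illustrate the last benefit, here is a minimal Python sketch of how classification labels could speed up breach triage. It assumes you already hold an inventory that maps file paths to labels; the function name `breach_summary`, the paths, and the label strings are all hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable


def breach_summary(labels: Dict[str, str], compromised: Iterable[str]) -> Counter:
    """Count compromised files per classification label.

    Files that were never classified are counted as "unclassified",
    which is itself a useful signal about gaps in coverage.
    """
    return Counter(labels.get(path, "unclassified") for path in compromised)


# Hypothetical inventory and list of files touched in an incident.
inventory = {
    "/hr/payroll.xlsx": "restricted",
    "/sales/leads.csv": "confidential",
    "/web/newsletter.pdf": "public",
}
summary = breach_summary(
    inventory,
    ["/hr/payroll.xlsx", "/web/newsletter.pdf", "/tmp/export.zip"],
)
```

A summary like this only shows how many files of each sensitivity were involved. Deciding who must be notified still depends on the regulations that apply to you.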
The Difficulty with Data Classification
One of the issues with data classification is that it’s not always easy to know where to start, especially when you already have vast amounts of uncategorized data scattered around your network. Data classification was traditionally carried out manually; however, these days there are a number of tools available which can automate the process. For example, there are tools which can force users to specify the sensitivity of the data at the time of creation, and there are data discovery tools that can locate, classify and report on a wide variety of data including payment card industry (PCI) data, protected health information (PHI) and personally identifiable information (PII). Such tools are able to search within images, email servers, databases, cloud storage, SharePoint and more. Additionally, many allow for scheduled searches to be run in the background without disrupting day-to-day operations.
Types of Data Classification
Naturally, different types of data require different levels of classification. The most widely adopted classification schema includes four categories: public, internal, confidential and restricted.
Public data: This includes any information which can be exposed to the public without posing a threat to the company. Such data might include names, job descriptions, or newsletters. Since public data is essentially unrestricted, it is the least relevant of the four classification levels from a security perspective.
Internal data: This type of data includes any information that is restricted to internal members of staff and relevant stakeholders only. Examples of internal data might include memos, plans, charts, presentations, and so on. Were this type of data to be exposed to the public, it’s unlikely that the company would be subject to any regulatory fines or lawsuits; however, it could be problematic if the leaked data contained business secrets that could be used by its competitors.
Confidential data: This type of data includes any information which, if leaked to the public, could result in lawsuits, fines and reputational damage, leading to a loss of business. Examples of confidential data include Social Security numbers, bank details, protected health information, and any data that is covered by the data privacy laws relevant to your organization.
Restricted data: This type of data includes any information which, if breached in some way, could do considerable harm to the organization and the data subjects involved. In addition to potentially costly lawsuits, fines and damage to the organization’s reputation, a breach involving restricted data can endanger people’s lives. For example, if an attacker were to gain access to privileged account credentials at a healthcare organization, they could install and run a ransomware program that shuts practitioners out of critical systems.
The “principle of least privilege” must be strictly adhered to when handling internal, confidential and restricted data. It should also be noted that the categories mentioned above can contain sub-categories, which can provide more specificity relating to how the data is accessed, how long the data should be retained, and so on.
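The four levels and the least-privilege rule can be sketched in a few lines of Python. This is an illustrative model only: the `Level` enum, the `can_read` function, and the idea of combining a clearance level with a need-to-know flag are assumptions made for the example, not a description of any particular product.

```python
from enum import IntEnum


class Level(IntEnum):
    """The four categories, ordered from least to most sensitive."""
    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2
    RESTRICTED = 3


def can_read(clearance: Level, label: Level, has_business_need: bool) -> bool:
    """Least privilege: public data is open to all; anything else
    requires sufficient clearance AND a documented business need."""
    if label == Level.PUBLIC:
        return True
    return clearance >= label and has_business_need
```

Note that a high clearance on its own is not enough in this model. A user cleared for restricted data is still refused a confidential file unless they actually need it for their role.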
How to Classify Your Data
While there is no one-size-fits-all approach to classifying data, there are generally five key steps which you can take. You can customize these steps to suit your specific requirements.
Step 1 – Formulate a Data Classification Plan
Your plan should take the form of a written policy that is clear, concise and well communicated to all relevant stakeholders. The policy should include:
- A clear explanation of the importance of data classification and what you expect to achieve.
- An outline of the classification process and explanation of how it will affect different employees.
- A list and explanation of the categories that will be used for classification.
- An outline of the roles and responsibilities of those handling the data.
- A clear explanation of how the data in each category should be handled. This includes how the data is stored, processed, retained, shared, encrypted, and who has access rights to the data.
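The handling rules in the last point are easier to enforce when they are also written down in a machine-readable form. Below is a minimal Python sketch of that idea. Every value in it (storage locations, retention periods, role names) is a hypothetical placeholder, and the `HandlingRule` and `role_may_access` names are invented for this example.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class HandlingRule:
    storage: str                  # where data of this category may be stored
    encrypt_at_rest: bool         # whether encryption is mandatory
    retention_days: int           # how long the data is kept
    external_sharing: bool        # whether it may leave the organization
    allowed_roles: FrozenSet[str]  # who has access rights


# Hypothetical values for illustration only; a real policy sets its own.
POLICY = {
    "public": HandlingRule("any", False, 365, True, frozenset({"everyone"})),
    "internal": HandlingRule("company systems", False, 730, False,
                             frozenset({"staff"})),
    "confidential": HandlingRule("approved file shares", True, 2190, False,
                                 frozenset({"hr", "finance"})),
    "restricted": HandlingRule("isolated vault", True, 2190, False,
                               frozenset({"security-admin"})),
}


def role_may_access(category: str, role: str) -> bool:
    """Check a role against the access list for a category."""
    allowed = POLICY[category].allowed_roles
    return "everyone" in allowed or role in allowed
```

Keeping the rules in one table like this means the written policy and any automated checks can be reviewed side by side.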
Step 2 – Begin Discovering Sensitive Data
If you need to classify a large backlog of data, then you will need a system for discovering that data. As already mentioned, if you don’t want to do this manually, there are a number of third-party solutions which can automatically locate and classify a wide range of sensitive data, including PCI, PHI and PII.
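To give a feel for what discovery involves, here is a simplified Python sketch of pattern-based detection. It is not how any specific commercial tool works. The regular expressions are rough assumptions that cover only US-style Social Security numbers, email addresses and card numbers, and the function names are hypothetical. Card candidates are confirmed with the Luhn checksum to cut down on false positives. PHI is left out because it rarely follows a simple pattern.

```python
import re
from typing import Set

# Rough illustrative patterns; real tools use far richer rule sets.
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
EMAIL_RE = re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b")
CARD_RE = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")  # 13 to 19 digits


def luhn_ok(candidate: str) -> bool:
    """Return True if the digits in `candidate` pass the Luhn checksum."""
    digits = [int(c) for c in candidate if c.isdigit()]
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:      # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_sensitive(text: str) -> Set[str]:
    """Return the set of data types ("PII", "PCI") detected in `text`."""
    found = set()
    if SSN_RE.search(text) or EMAIL_RE.search(text):
        found.add("PII")
    if any(luhn_ok(m.group()) for m in CARD_RE.finditer(text)):
        found.add("PCI")
    return found
```

For example, `find_sensitive("card 4111 1111 1111 1111")` flags PCI because that widely used test number passes the Luhn check, whereas an arbitrary 16-digit order number usually will not.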
Step 3 – Assess the Results
After discovering your sensitive data, you will need to analyze the results to ensure that the data is sufficiently protected. You should know which users have access to files and folders containing this data, and make sure that access is limited to the users who truly require it. Excessive permissions on this data could be a leading cause of data breaches.
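One simple way to assess the results is to compare who currently has access with who actually needs it. The Python sketch below assumes both lists are already available as plain mappings from path to user names. In practice they would come from your file system permissions and from data owners, and the function name `excessive_access` is hypothetical.

```python
from typing import Dict, Set


def excessive_access(
    actual: Dict[str, Set[str]],
    required: Dict[str, Set[str]],
) -> Dict[str, Set[str]]:
    """Return, per path, the users who have access but are not required.

    Paths with no entry in `required` are treated as needing no users,
    so every user with access to them is reported.
    """
    report = {}
    for path, users in actual.items():
        extra = users - required.get(path, set())
        if extra:
            report[path] = extra
    return report


# Hypothetical example data.
actual = {
    "/hr/payroll.xlsx": {"alice", "bob", "carol"},
    "/pub/news.txt": {"alice"},
}
required = {
    "/hr/payroll.xlsx": {"alice"},
    "/pub/news.txt": {"alice"},
}
report = excessive_access(actual, required)
```

Here the report lists `bob` and `carol` against the payroll file, which makes them candidates for having their access revoked.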
Step 4 – Be Proactive and Continuous
Data classification is an ongoing process. Every day files are created, moved, deleted, copied, renamed, etc. If you have chosen to use a data discovery tool, you will likely be able to set up scheduled tasks which can help you keep on top of the classification process.
Step 5 – Categorize the Risk to Your Data
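A scheduled rescan does not have to start from scratch each time. As a rough Python sketch, the function below (a hypothetical helper, not part of any tool) walks a folder and returns only the files modified since the previous scan, using the file system's modification time.

```python
import os
from typing import List


def files_changed_since(root: str, last_scan: float) -> List[str]:
    """Return paths under `root` modified after `last_scan`.

    `last_scan` is a Unix timestamp, such as the value time.time()
    returned when the previous scan started.
    """
    changed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                if os.path.getmtime(path) > last_scan:
                    changed.append(path)
            except OSError:
                continue  # file disappeared between listing and stat
    return sorted(changed)
```

Modification time is only an approximation. A file that is renamed or moved typically keeps its old timestamp, so this approach would need to be paired with an occasional full scan or with change auditing.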
It is up to you how you classify your data; however, it is generally good practice to start with three categories, and then add more, as and when required. Your classification structure could look like this:
- Low risk – this includes any information that may be disclosed to the public or contains no PII at all.
- Medium risk – this includes information that may contain snippets of PII (such as a standalone National Insurance number) that are of little use on their own but still need to be somewhat protected.
- High risk – this includes highly sensitive data that cannot be disclosed to the public for any reason. High risk data may include a name, address and credit card information all in the same file.
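The three tiers above can be turned into a simple rule of thumb. The Python sketch below is deliberately crude and rests on one assumption taken from the examples: risk rises with the number of distinct identifier types found together in a file. The function name and the identifier labels are hypothetical, and you may well decide that some identifiers are high risk even on their own.

```python
from typing import Set


def risk_level(identifier_types: Set[str]) -> str:
    """Map the distinct PII types found in one file to a risk tier.

    No PII at all        -> "low"
    One standalone type  -> "medium"
    Two or more combined -> "high"
    """
    if not identifier_types:
        return "low"
    if len(identifier_types) == 1:
        return "medium"
    return "high"
```

Under this rule a file containing only an NI number is rated medium, while a file holding a name, an address and a credit card number together is rated high.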
Data classification is undoubtedly an important step towards ensuring that your sensitive data is secure, and that you are able to comply with the many data protection regulations. However, data classification is only the first step in securing your data. Data classification helps to answer the question of where the data is located, but it doesn’t provide a solution for keeping track of who is accessing what data, and when.
How Lepide Helps with Data Classification
Lepide Data Security Platform comes with an advanced data classification solution which will help you protect your critical assets, eliminate false positives and meet the relevant compliance requirements. It works by incrementally scanning your repositories for sensitive data, and automatically classifying the data according to your chosen schema. Likewise, Lepide Data Security Platform can classify sensitive data at the point of creation or modification. It covers a wide range of data types, including Personally Identifiable Information (PII), Protected Health Information (PHI), Payment Card Information (PCI), and more. You can customize the search criteria in accordance with the data privacy regulations that apply to your industry, including GDPR, HIPAA, PCI-DSS, FISMA, SOX and more.