What is Data Classification? How Data Classification is Done

By Jason Coggins, 05.30.2018, Data Security

Data classification is simply the process of organizing data into a set of predefined categories. Because organizations have limited resources, they need to know exactly where their most sensitive data is located so that they can allocate those resources where they will be most effective.

What is Data Classification?

Data classification can be loosely defined as organizing data into categories based on its content, so that access rights can be assigned appropriately and security efforts can be focused. Classification involves tagging data so that it can be easily searched for and monitored, which also makes it easier to locate and retrieve. This is useful, for example, when data subjects exercise their right to be forgotten and every record relating to them must be found. Data classification may sound like a technical topic, but its applications should be understood at all levels of the business.
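
To make the idea of tagging concrete, here is a minimal Python sketch. It assumes a hypothetical in-memory Document type with a set of string labels; the label names such as "PII" and "subject:1042" are illustrative and not taken from any particular product. Tagging each record with the data subject it concerns is what turns a right-to-be-forgotten request into a single search.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    """A file or record plus the classification labels attached to it (hypothetical model)."""
    path: str
    tags: set[str] = field(default_factory=set)


def tag(doc: Document, *labels: str) -> None:
    """Attach one or more labels to a document."""
    doc.tags.update(labels)


def find_by_tag(docs: list[Document], label: str) -> list[str]:
    """Return the path of every document carrying the given label."""
    return [d.path for d in docs if label in d.tags]


# Example: tag a file with its category and with the data subject it concerns,
# so an erasure request for subject 1042 can be answered with one search.
docs = [Document("hr/payroll.xlsx"), Document("web/press-release.docx")]
tag(docs[0], "PII", "subject:1042")
tag(docs[1], "public")
print(find_by_tag(docs, "subject:1042"))  # ['hr/payroll.xlsx']
```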

The Benefits of Data Classification

In general terms, there are three key benefits to perfecting your data classification strategy:

  • Meeting Compliance Demands: One of the most common reasons organizations implement data classification is to help ensure they can meet regulatory compliance requirements. Many regulations require that data be searchable and retrievable within very tight deadlines.
  • Improving Data Security: Data classification is also highly useful when it comes to protecting your sensitive data. The first step in a data-centric approach to security is to ensure you know where your most sensitive data is located and why it is deemed to be sensitive. Once you know this, you will be in a better position to decide what access rights to apply to the data and which users actually need access, and to focus your user behavior analytics on the data and users that matter most.
  • Understanding a Breach: Should the worst happen and you suffer a data breach, data classification will help you determine the extent of the damage and what data was lost, and help guide whom to notify.
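
As a rough illustration of the breach point, the Python sketch below assumes you already keep a classification index mapping file paths to labels (the INDEX contents and the NOTIFY_LABELS set are hypothetical). Given the list of exposed files, it counts what was lost per label and flags whether any exposed label should trigger a notification review. Files missing from the index are counted as "unclassified", which is itself a useful warning sign.

```python
from collections import Counter

# Hypothetical classification index: file path -> label assigned during classification.
INDEX = {
    "crm/cards.csv": "restricted",
    "hr/staff.xlsx": "high",
    "web/faq.html": "low",
}

# Labels that, by assumption, should trigger a notification review after a breach.
NOTIFY_LABELS = {"restricted", "high"}


def breach_summary(exposed: list[str], index: dict[str, str] = INDEX) -> Counter:
    """Count exposed files per label; files missing from the index count as 'unclassified'."""
    return Counter(index.get(path, "unclassified") for path in exposed)


def needs_notification_review(summary: Counter) -> bool:
    """True if any exposed file carries a label that may require notifying someone."""
    return any(summary[label] > 0 for label in NOTIFY_LABELS)


summary = breach_summary(["crm/cards.csv", "hr/staff.xlsx", "tmp/dump.bin"])
print(dict(summary))  # {'restricted': 1, 'high': 1, 'unclassified': 1}
print(needs_notification_review(summary))  # True
```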

The Difficulty with Data Classification

One of the issues with data classification is that it's not always easy to know where to start, especially when you already have vast amounts of uncategorized data scattered around your network. Data classification was traditionally carried out manually; however, these days a number of tools can automate the process. For example, some tools force users to specify the sensitivity of data at the time of creation, and data discovery tools can locate, classify, and report on a wide variety of data, including payment card industry (PCI) data, protected health information (PHI), and personally identifiable information (PII). Such tools can search within images, email servers, databases, cloud storage, SharePoint, and more. Additionally, many allow scheduled searches to run in the background without disrupting day-to-day operations.
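
Discovery tools differ in how they work, but a common building block is pattern matching plus validation. The Python sketch below is an assumption about how such detection might look, not how any specific product does it. It finds candidate card numbers with a regular expression and keeps only those passing the Luhn checksum that payment card numbers use. A loose e-mail pattern stands in for PII detection.

```python
import re

# Candidate card numbers: 13-19 digits, optionally separated by spaces or hyphens.
CARD_CANDIDATE = re.compile(r"\b(?:\d[ -]?){13,19}\b")
# A deliberately loose e-mail pattern, used here as a stand-in for PII detection.
EMAIL = re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b")


def luhn_valid(number: str) -> bool:
    """Luhn checksum used by payment card numbers; ignores spaces and hyphens."""
    digits = [int(c) for c in number if c.isdigit()]
    if not 13 <= len(digits) <= 19:
        return False
    total = 0
    # Double every second digit counting from the right; subtract 9 if the result exceeds 9.
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def detect(text: str) -> set[str]:
    """Return the categories of sensitive data found in a piece of text."""
    found = set()
    if any(luhn_valid(m.group()) for m in CARD_CANDIDATE.finditer(text)):
        found.add("PCI")
    if EMAIL.search(text):
        found.add("PII")
    return found


print(detect("Card 4111 1111 1111 1111, contact jane.doe@example.com"))
```

The Luhn check cuts down false positives from arbitrary digit strings, but roughly one random number in ten still passes, so real tools typically add further context checks.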

Examples of Data Classification

A common approach is to use four classification levels, presented below in order from most to least risk. We will briefly mention some examples of each to give you an overview.

  • Restricted: This data poses the biggest risk to your organization and must be kept secure. Loss or theft of this data could cause significant harm to your business and to the individuals affected, and could incur criminal or legal liability. An example of this would be credit card numbers.
  • High Risk: This is data that, if exposed, could cause harm to the business and the individuals affected, including the potential for legal action. It is usually covered by compliance regulations and includes protected health information and personally identifiable information.
  • Medium Risk: If this data is exposed, it could cause limited harm to individuals and to the business. Examples include business contracts with third parties, intellectual property, and employee records.
  • Low Risk: Public information that would not cause any harm to individuals or the business if exposed. Such data includes anything that is in the public sphere, such as published research.
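
One way to encode these four levels is as an ordered enumeration, so that "more sensitive" becomes a simple comparison. The Python sketch below assumes a hypothetical mapping from detected data types to levels based on the examples above, and applies a "high-water mark" rule: a file takes the level of the most sensitive data it contains. Treating unknown data types as Medium until someone reviews them is also an assumption, and a policy choice you may want to make differently.

```python
from enum import IntEnum


class Level(IntEnum):
    """The four levels above, ordered so that a higher value means more risk."""
    LOW = 1
    MEDIUM = 2
    HIGH = 3
    RESTRICTED = 4


# Hypothetical mapping from detected data types to levels, following the examples above.
DATA_TYPE_LEVEL = {
    "credit_card_number": Level.RESTRICTED,
    "phi": Level.HIGH,
    "pii": Level.HIGH,
    "contract": Level.MEDIUM,
    "intellectual_property": Level.MEDIUM,
    "employee_record": Level.MEDIUM,
    "published_research": Level.LOW,
}

# Assumption: unrecognized data types are held at MEDIUM until reviewed.
UNKNOWN_LEVEL = Level.MEDIUM


def classify(data_types: set[str]) -> Level:
    """High-water mark: a file gets the level of its most sensitive data type."""
    return max(
        (DATA_TYPE_LEVEL.get(t, UNKNOWN_LEVEL) for t in data_types),
        default=Level.LOW,  # a file with no detected data types is treated as low risk
    )


print(classify({"pii", "credit_card_number"}).name)  # RESTRICTED
```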

How to Classify Your Data

While there is no one-size-fits-all approach to classifying data, there are generally five key steps you can take, which you can customize to suit your specific requirements.

Step 1 – Formulate a Data Classification Plan

Your data classification policy should be clear, concise, and well communicated to all relevant stakeholders. It should include:

  • A clear explanation of the importance of data classification and what you expect to achieve.
  • An outline of the classification process and an explanation of how it will affect different employees.
  • A list and explanation of the categories that will be used for classification.
  • An outline of the roles and responsibilities of those handling the data.
  • A clear explanation of how the data in each category should be handled. This includes how the data is stored, processed, retained, shared, encrypted, and who has access rights to the data.
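
The handling rules in a policy are easier to enforce consistently when they are also written down in machine-readable form. The Python sketch below shows one hypothetical way to do that: each category maps to a rule covering encryption, retention, external sharing, and the roles allowed access. The category names, role names, and retention periods are placeholders for illustration, not recommendations.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HandlingRule:
    """How data in one category must be handled (fields are illustrative)."""
    encrypt_at_rest: bool
    retention_days: int
    share_externally: bool
    allowed_roles: frozenset[str]


# Hypothetical policy table; every value here is a placeholder.
POLICY = {
    "restricted": HandlingRule(True, 365, False, frozenset({"finance"})),
    "high": HandlingRule(True, 730, False, frozenset({"finance", "hr"})),
    "medium": HandlingRule(True, 1825, False, frozenset({"finance", "hr", "staff"})),
    "low": HandlingRule(False, 3650, True, frozenset({"finance", "hr", "staff", "public"})),
}


def may_access(category: str, role: str) -> bool:
    """Check whether a role is permitted to access data in the given category."""
    return role in POLICY[category].allowed_roles


def may_share(category: str) -> bool:
    """Check whether data in the given category may be shared outside the organization."""
    return POLICY[category].share_externally


print(may_access("restricted", "staff"))  # False
```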

Step 2 – Begin Discovering Sensitive Data

If you need to classify a large backlog of data, you will need a system for discovering it. As mentioned above, if you don't want to do this manually, a number of third-party solutions can automatically locate and classify a wide range of sensitive data, including PCI, PHI, and PII.
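
At its simplest, discovery means walking your file shares and running detectors over each file. The Python sketch below assumes plain-text files only and uses two hypothetical, deliberately simple PII patterns: a loose e-mail pattern and a simplified UK National Insurance number pattern (the real NI format has extra restrictions on which letters are allowed). Commercial tools also handle documents, images, databases, and cloud storage, which this sketch does not attempt.

```python
import re
from pathlib import Path

# Hypothetical, simplified patterns; a real discovery tool would use many more.
PATTERNS = {
    "PII:email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
    "PII:uk_ni_number": re.compile(r"\b[A-Z]{2}\s?\d{2}\s?\d{2}\s?\d{2}\s?[A-D]\b"),
}

# Only plain-text formats are scanned in this sketch.
TEXT_SUFFIXES = {".txt", ".csv", ".log", ".md"}


def classify_text(text: str) -> set[str]:
    """Return the labels of every pattern that matches somewhere in the text."""
    return {label for label, pattern in PATTERNS.items() if pattern.search(text)}


def scan_tree(root: str) -> dict[str, set[str]]:
    """Walk root and return {path: labels} for each text file containing sensitive data."""
    results = {}
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix.lower() in TEXT_SUFFIXES:
            labels = classify_text(path.read_text(errors="ignore"))
            if labels:
                results[str(path)] = labels
    return results
```

Usage would be something like `scan_tree("/srv/shares/finance")`, with the resulting dictionary feeding your classification index.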

Step 3 – Assess the Results

After discovering your sensitive data, you will need to analyze the results to ensure that it is sufficiently protected. You should know which users have access to the files and folders containing this data and make sure that only users who truly require access have it. Excessive permissions on this data can be a leading cause of data breaches.
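
Assessing the results largely comes down to comparing who has access with who actually needs it. The Python sketch below assumes you can export each file's access list as a set of user or group names, and that data owners have recorded who genuinely needs access; both inputs, and the names in the example, are hypothetical. For each sensitive file it reports the principals with excess access, including broad groups such as "Everyone".

```python
def excessive_access(
    acl: dict[str, set[str]],
    needed: dict[str, set[str]],
    sensitive: set[str],
) -> dict[str, set[str]]:
    """For each sensitive path, return the principals that have access but do not need it."""
    report = {}
    for path in sorted(sensitive):
        extra = acl.get(path, set()) - needed.get(path, set())
        if extra:
            report[path] = extra
    return report


# Hypothetical exported permissions and owner-approved access lists.
acl = {
    "hr/salaries.xlsx": {"alice", "bob", "Everyone"},
    "web/faq.html": {"Everyone"},
}
needed = {"hr/salaries.xlsx": {"alice"}}

# Only files flagged as sensitive are checked; the public FAQ is ignored.
print(excessive_access(acl, needed, sensitive={"hr/salaries.xlsx"}))
```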

Step 4 – Be Proactive and Continuous

Data classification is an ongoing process. Every day, files are created, moved, deleted, copied, and renamed. If you have chosen to use a data discovery tool, you will likely be able to set up scheduled tasks that help you keep on top of the classification process.
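
A scheduled job does not need to rescan everything each time. The Python sketch below assumes a simple approach of recording each file's modification time and size, then diffing two snapshots to find what needs (re)classifying. A rename or move shows up as one removal plus one addition, and a deleted file shows up as a removal whose entry can be dropped from your classification index. This is a sketch of one possible design, not how any particular tool schedules its scans.

```python
import os


def snapshot(root: str) -> dict[str, tuple[float, int]]:
    """Record (modification time, size) for every file under root."""
    snap = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.stat(path)
            except OSError:
                continue  # file vanished or is unreadable between listing and stat
            snap[path] = (st.st_mtime, st.st_size)
    return snap


def diff(
    old: dict[str, tuple[float, int]],
    new: dict[str, tuple[float, int]],
) -> tuple[set[str], set[str], set[str]]:
    """Return (added, removed, modified) paths between two snapshots."""
    added = new.keys() - old.keys()
    removed = old.keys() - new.keys()
    modified = {p for p in old.keys() & new.keys() if old[p] != new[p]}
    return added, removed, modified
```

On each scheduled run you would take a new snapshot, classify the added and modified files, remove the deleted ones from the index, and keep the new snapshot for next time.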

Step 5 – Categorizing the Risk to Your Data

It is up to you how you classify your data; however, it is generally good practice to start with three categories and add more as required. Your classification structure could look like this:

  • Low risk – this includes any information that may be disclosed to the public or contains no PII at all.
  • Medium risk – this includes information that may contain snippets of PII (such as a standalone National Insurance (NI) number) that are useless on their own but still need some protection.
  • High risk – this is highly sensitive data that must not be disclosed to the public for any reason. High risk data may include a name, address, and credit card information all in the same file.
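
The three-tier structure can be expressed as a small rule set over the identifiers found in a file. The Python sketch below assumes a hypothetical set of identifier names and one specific reading of the tiers: no identifiers means low risk, a single isolated identifier (such as a standalone NI number) means medium risk, and card data or any combination of identifiers means high risk, since combined identifiers make a person far easier to identify. Adjust the rules to match your own policy.

```python
# Hypothetical identifier names produced by an upstream discovery step.
IDENTIFIERS = {"name", "address", "ni_number", "card_number", "email"}


def risk_tier(found: set[str]) -> str:
    """Map the data types found in a file to 'low', 'medium' or 'high' risk."""
    pii = found & IDENTIFIERS
    if not pii:
        return "low"  # nothing that identifies a person
    # Assumption: card data, or two or more identifiers together, is high risk.
    if "card_number" in pii or len(pii) >= 2:
        return "high"
    return "medium"  # a single isolated identifier, e.g. a standalone NI number


print(risk_tier({"name", "address", "card_number"}))  # high
```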

Data classification is undoubtedly an important step towards ensuring that your sensitive data is secure and that you can comply with the many data protection regulations. However, data classification is only the first step in securing your data. It helps to answer the question of where the data is located, but it doesn't provide a solution for keeping track of who is accessing what data, and when.

LepideAuditor provides automated data discovery and classification, along with reports and alerts on sensitive files and folders. You will be able to see who has permissions to this data and be alerted whenever changes take place to those permissions or to the data itself, helping to improve your overall data access governance.

If you would like to see how LepideAuditor can help you improve data classification and data security, schedule a demo of LepideAuditor today.