Data Classification: The Why’s and the How’s

by Jason Coggins
05.30.2018   IT Security

Data Classification is simply the process of organizing data based on a set of pre-defined categories. Since organizations have limited resources, it is important for them to know exactly where their most sensitive data is located, in order to be able to allocate those resources in the most effective manner.

One of the issues with data classification is that it’s not always easy to know where to start, especially when you already have vast amounts of uncategorized data scattered around your network. Data classification was traditionally carried out manually; however, these days there are a number of tools available which can automate the process. For example, there are tools which can force users to specify the sensitivity of the data at the time of creation, and there are data discovery tools that can locate, classify and report on a wide variety of data, including payment card industry data (PCI), protected health information (PHI) and personally identifiable information (PII). Such tools are able to search within images, email servers, databases, cloud storage, SharePoint and more. Additionally, many allow for scheduled searches to be run in the background without disrupting day-to-day operations.

How to Classify Your Data

While there is no one-size-fits-all approach to classifying data, there are generally five key steps which you can take. You can customize these steps to suit your specific requirements.

1. Formulate a Data Classification Plan

Your plan should take the form of a written policy that is clear, concise and well communicated to all relevant stakeholders. The policy should include:

  • A clear explanation of the importance of data classification and what you expect to achieve.
  • An outline of the classification process and explanation of how it will affect different employees.
  • A list and explanation of the categories that will be used for classification.
  • An outline of the roles and responsibilities of those handling the data.
  • A clear explanation of how the data in each category should be handled. This includes how the data is stored, processed, retained, shared, encrypted, and who has access rights to the data.
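The handling rules in the last point can also be written down in a machine-readable form, so that tools and scripts can check them. The following is a minimal sketch, assuming the three example risk levels described later in this article; the `HandlingRule` fields, the role names and all of the values are hypothetical and only illustrate the idea.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HandlingRule:
    """Hypothetical handling requirements for one classification category."""
    encrypt_at_rest: bool
    retention_days: int
    may_share_externally: bool
    allowed_roles: frozenset


# Illustrative values only; a real policy would define its own.
POLICY = {
    "low": HandlingRule(False, 365, True, frozenset({"staff", "finance", "hr"})),
    "medium": HandlingRule(True, 730, False, frozenset({"finance", "hr"})),
    "high": HandlingRule(True, 2555, False, frozenset({"finance"})),
}


def can_access(role: str, category: str) -> bool:
    """Return True if the policy grants this role access to the category."""
    return role in POLICY[category].allowed_roles
```

With these example values, `can_access("hr", "medium")` is true while `can_access("hr", "high")` is false. Keeping the rules in one structure like this makes it easier to keep the written policy and any automated checks in agreement.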

2. Begin Discovering Sensitive Data

If you need to classify a large backlog of data, then you will need a system for discovering that data. As already mentioned, if you don’t want to do this manually, there are a number of third-party solutions which can automatically locate and classify a wide range of sensitive data, including PCI, PHI and PII.
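To give a rough idea of what such tools do under the hood, here is a minimal sketch of pattern-based discovery. It is not how any particular product works: the two patterns are deliberately simplified, the names `PATTERNS`, `luhn_valid` and `find_sensitive` are made up for this example, and the Luhn checksum is used only to weed out digit runs that cannot be valid card numbers.

```python
import re

# Simplified, illustrative patterns; real tools use far more robust detection.
PATTERNS = {
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
    # 13 to 16 digits, optionally separated by single spaces or hyphens.
    "card_number": re.compile(r"\b\d(?:[ -]?\d){12,15}\b"),
}


def luhn_valid(number: str) -> bool:
    """Check a candidate card number with the Luhn checksum."""
    digits = [int(c) for c in number if c.isdigit()]
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_sensitive(text: str) -> dict:
    """Return a mapping of data type to the matches found in the text."""
    found = {}
    emails = PATTERNS["email"].findall(text)
    if emails:
        found["email"] = emails
    cards = [m for m in PATTERNS["card_number"].findall(text) if luhn_valid(m)]
    if cards:
        found["card_number"] = cards
    return found
```

A real scan would feed the contents of each file, mailbox item or database field through a function like this and record where the matches were found.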

3. Assess the Results

After discovering your sensitive data, you will need to analyze the results to ensure that the data is sufficiently protected. You should know which users have access to the files and folders containing this data, and make sure that access is limited to those users who truly require it. Excessive permissions to this data are a leading cause of data breaches.
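The comparison described above can be sketched as a simple set difference. This example assumes you have already exported two things: who currently has access to each sensitive path, and who is supposed to have it. The function name `excessive_access` and the shape of both inputs are assumptions made for illustration.

```python
def excessive_access(acl: dict, required: dict) -> dict:
    """Map each path to the users who have access but are not meant to.

    acl:      path -> users who currently have access
    required: path -> users who genuinely need access
    """
    report = {}
    for path, users in acl.items():
        extra = set(users) - set(required.get(path, ()))
        if extra:
            report[path] = sorted(extra)
    return report
```

Paths that are missing from `required` are treated as needing no access at all, so every user on them is reported. That errs on the side of flagging too much rather than too little.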

4. Be Proactive and Continuous

Data classification is an ongoing process. Every day, files are created, moved, deleted, copied, renamed and so on. If you have chosen to use a data discovery tool, you will likely be able to set up scheduled tasks which can help you keep on top of the classification process.
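If you are scripting part of this yourself, a scheduled job does not need to rescan everything each time. The sketch below, which assumes a plain file share and a stored timestamp of the previous scan, lists only the files modified since then; `files_changed_since` is a hypothetical helper, not part of any product.

```python
import os


def files_changed_since(root: str, last_scan: float) -> list:
    """Return files under root modified after last_scan (a Unix timestamp)."""
    changed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                if os.path.getmtime(path) > last_scan:
                    changed.append(path)
            except OSError:
                # The file was removed or became unreadable mid-scan; skip it.
                continue
    return sorted(changed)
```

Modification times will not reveal deletions, and a move or copy may preserve the old timestamp, so an occasional full scan is still worthwhile alongside incremental runs like this.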

5. Categorizing the Risk to Your Data

It is up to you how you classify your data; however, it is generally good practice to start with three categories, and then add more, as and when required. Your classification structure could look like this:

  • Low risk – this includes any information that may be disclosed to the public or contains no PII at all.
  • Medium risk – this includes information that may contain snippets of PII (such as a standalone NI number) that are useless on their own but still need some protection.
  • High risk – this includes highly sensitive data that cannot be disclosed to the public for any reason. High risk data may include a name, address and credit card information all in the same file.
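The three categories above can be turned into a simple rule. The following sketch assumes that a discovery step has already reported which types of PII a file contains, and it treats any combination of two or more types as high risk. That threshold is an assumption made for this example, and `classify_risk` is a hypothetical name.

```python
def classify_risk(pii_types: set) -> str:
    """Assign a risk level from the set of PII types found in a file."""
    if not pii_types:
        return "low"      # no PII at all
    if len(pii_types) == 1:
        return "medium"   # a standalone snippet, such as an NI number
    return "high"         # several types together can identify a person
```

For example, a file containing only an NI number would be rated medium, while one containing a name, address and card number would be rated high. You may well want stricter rules, such as always rating card data as high risk.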

Data classification is undoubtedly an important step towards ensuring that your sensitive data is secure, and that you are able to comply with the many data protection regulations. However, data classification is only the first step in securing your data. It helps to answer the question of where the data is located, but it doesn’t provide a solution for keeping track of who is accessing what data, and when. LepideAuditor includes an integration with the data discovery and classification tool included in File Server Resource Manager, to help run reports and alerts on sensitive files and folders. You will be able to see who has permissions to this data, and when changes are made to the permissions or to the data itself, which helps to improve your overall data access governance.


Lepide® is a registered trademark of Lepide Software Private Limited. © Copyright 2018 Lepide Software Private Limited. All trademarks acknowledged.