Organizations tend to hoard large amounts of unstructured data, some of which may be confidential, such as credit card numbers, passport numbers, and health-related information.
When I say “unstructured data,” I’m talking about data that doesn’t fit into a traditional relational database with rows, columns, and keys. Such data might include Word documents, spreadsheets, and emails.
As the number of data breaches continues to rise and governments across the globe introduce tougher mandates on how personal data must be protected, it has never been more important for companies to know exactly what sensitive data they have and where it is located.
Data classification provides structure to unstructured data through the use of labels. It helps organizations locate sensitive data quickly and efficiently, ensure that the appropriate controls are in place to prevent unauthorized access to that data, and comply with the data privacy laws that are relevant to their industry.
As organizations become more reliant on cloud platforms such as OneDrive, Dropbox, and Office 365, storing unstructured data in the cloud has become increasingly common. This article focuses on the options available for classifying data in Office 365, but also takes a brief look at some of the alternative solutions that are available.
Getting Started with Office 365 Data Classification
The first thing you need to do to get started with data classification in Office 365 is to create, configure, and publish labels. The labels will be published alongside a policy that details how documents and emails assigned to a particular label should be treated.
Any employee who handles sensitive data can apply the labels that the administrator has created. The administrator can also specify the conditions under which labels are assigned to data automatically. The labels are used only internally; however, they are recognized by a variety of Microsoft applications, including SharePoint, OneDrive, and Microsoft Teams.
Additionally, they can be used by third-party solutions, such as your data loss prevention software, which can be configured to alert you when classified data is accessed, moved, modified, or deleted. Data classification can also support your data retention policy, as you can specify an expiry date for a given label and the data assigned to it.
Creating and Publishing Labels
A classification label is essentially a meta-tag that is inserted into documents and emails. To create a label, go to the Compliance admin center and select Classification > Sensitivity Labels, where you will be asked to enter a label name, tooltip, and description. By default, labels appear in the order Confidential, Internal, and Public. Once you have defined your label, you can:
- Configure access permissions
- Enable encryption for classified data
- Enable endpoint data loss prevention
- Enable auto-labeling and define the conditions under which the label will be applied (e.g., if the data contains a passport number or Social Security number).
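To make the idea of auto-labeling conditions more concrete, here is a minimal Python sketch of how a set of label rules might be evaluated. It is not how Office 365 implements auto-labeling; the `Label` class, the `auto_label` function, the priority values, and the two regular expressions are all hypothetical. The sketch assumes each label carries a list of patterns and that, when several labels match, the most sensitive one wins.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field


@dataclass
class Label:
    """A hypothetical sensitivity label with optional auto-labeling conditions."""
    name: str
    priority: int  # higher value = more sensitive
    conditions: list[re.Pattern] = field(default_factory=list)


# Hypothetical, deliberately simple conditions (real detectors are far more thorough).
US_SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
INTERNAL_MARKER = re.compile(r"\binternal use only\b", re.IGNORECASE)

LABELS = [
    Label("Public", priority=0),  # no conditions, so never applied automatically
    Label("Internal", priority=1, conditions=[INTERNAL_MARKER]),
    Label("Confidential", priority=2, conditions=[US_SSN]),
]


def auto_label(text: str, labels: list[Label] = LABELS) -> str | None:
    """Return the most sensitive label whose conditions match, or None if none match."""
    matching = [
        label for label in labels
        if any(pattern.search(text) for pattern in label.conditions)
    ]
    if not matching:
        return None
    return max(matching, key=lambda label: label.priority).name
```

For example, `auto_label("Internal use only. SSN 123-45-6789")` matches both rules and returns `"Confidential"`, while text that matches nothing returns `None`, leaving the choice of label to the user.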
You also have the option of adding sub-labels for increased granularity. Once you have created your label, you will need to publish it by selecting Publish labels > Choose labels to publish > Add. As mentioned previously, all labels are published along with a default policy, which you can configure according to your needs.
The policy enables you to specify how data classified under a label can be used, and you can also limit who the policy applies to. For the labels to take effect, all users must install the Azure Information Protection unified labeling client, which you can find by searching for AzInfoProtection_UL.exe.
There are two methods for automatically applying a sensitivity label: client-side labeling and service-side (not server-side) labeling. The essential difference between the two is that client-side labeling is applied when a user creates or modifies a document or email in Word, Excel, PowerPoint, or Outlook.
In that case, the user is asked to accept or reject the label. Service-side labeling, on the other hand, is applied to content that is already saved and relies on an auto-labeling policy rather than human interaction.
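The difference between the two approaches can be sketched as two small Python functions. This is a conceptual model only, not Microsoft's implementation: the `Document` class, `detect_label`, `client_side_label`, `service_side_label`, and the single SSN rule are hypothetical. The client-side path asks the user (represented by a callback) whether to apply the label, while the service-side path labels stored content with no interaction. The decision to leave already-labeled documents untouched on the service side is also an assumption made for the example.

```python
from __future__ import annotations

import re
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical auto-labeling condition: a US Social Security number pattern.
SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


@dataclass
class Document:
    name: str
    text: str
    label: Optional[str] = None


def detect_label(text: str) -> Optional[str]:
    """Hypothetical rule: content containing an SSN is labeled 'Confidential'."""
    return "Confidential" if SSN.search(text) else None


def client_side_label(doc: Document, user_accepts: Callable[[str], bool]) -> Document:
    """Runs while the user is editing; the user decides whether the label is applied."""
    suggestion = detect_label(doc.text)
    if suggestion is not None and user_accepts(suggestion):
        doc.label = suggestion
    return doc


def service_side_label(stored_docs: list[Document]) -> list[Document]:
    """Runs over content already at rest; labels are applied without user interaction."""
    for doc in stored_docs:
        if doc.label is None:  # assumption: existing labels are left as they are
            doc.label = detect_label(doc.text)
    return stored_docs
```

Passing `lambda label: False` as `user_accepts` models a user who rejects the label, so the document stays unlabeled; the service-side function would label the same document regardless.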
The Benefits and Limitations of Automatic Labeling
Automatic labeling can be a very useful feature because you don’t need to rely on your users to classify documents correctly, which leaves them free to focus on their work. And of course, it also means that you don’t have to train them to do so.
As mentioned previously, companies accumulate large amounts of unstructured data over many years, and many struggle to determine exactly what data they have and where it is located.
As you can imagine, manually sifting through and classifying vast archives of data would be a daunting and error-prone task, which is why an automated system might be preferable for many organizations. However, automatic labeling has its own shortcomings. For example, it’s not as precise as manual labeling and can sometimes result in false positives and false negatives.
After all, the software has to scan documents and emails for keywords and try to find data that matches given criteria (often using regular expressions), and the process is far from foolproof. Additionally, the classification system is primarily recognized by Office 365 applications (as you would assume).
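To illustrate why pattern matching alone produces false positives, the Python sketch below looks for 16-digit card-like numbers with a regular expression and then, optionally, filters the hits through the Luhn checksum, a check digit scheme used by payment card numbers. The pattern and the function names (`luhn_valid`, `find_card_numbers`) are hypothetical and simplified (only 16-digit numbers grouped in fours); this is not the detection logic used by Office 365.

```python
from __future__ import annotations

import re

# Hypothetical, simplified pattern: 16 digits, optionally grouped in fours by spaces or hyphens.
CARD_PATTERN = re.compile(r"\b(?:\d{4}[ -]?){3}\d{4}\b")


def luhn_valid(number: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(number)):
        digit = int(ch)
        if i % 2 == 1:  # double every second digit, counting from the right
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


def find_card_numbers(text: str, validate: bool = True) -> list[str]:
    """Return card-like numbers found in text, optionally keeping only Luhn-valid ones."""
    matches = []
    for match in CARD_PATTERN.finditer(text):
        digits = re.sub(r"[ -]", "", match.group())
        if not validate or luhn_valid(digits):
            matches.append(digits)
    return matches


text = "Order ref 1234 5678 9012 3456; card 4111 1111 1111 1111"
print(find_card_numbers(text, validate=False))  # regex only: both numbers match
print(find_card_numbers(text))  # with Luhn: only the card number remains
```

Here the order reference is a false positive for the regex alone, and the checksum removes it. The check narrows things down but is not fool-proof: roughly one in ten random 16-digit strings will still pass it, which is why real classifiers typically combine several signals, such as nearby keywords.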
Third-party Data Classification Solutions
Don’t get me wrong, the Office 365 data classification feature is probably good enough for most organizations. However, organizations that want something more advanced might want to consider adopting a third-party solution.
There are a number of data classification solutions on the market; however, most come packaged with a full-scale DCAP (Data-Centric Auditing & Protection) solution, which provides real-time auditing, threshold alerting, advanced reporting, and many other features.
Most third-party solutions support multiple cloud platforms in addition to on-premises environments. They can also discover and classify data found in a wide variety of file types, not just Office documents.
Most data classification solutions can identify a wide range of data types, such as PII, PCI, PHI, and IP, and can be configured to meet the requirements of specific data privacy laws and regulations, such as GDPR, HIPAA, SOX, CCPA, and more.