When an organization is dealing with a large amount of redundant, obsolete and trivial (ROT) data, its productivity and overall output suffer. The impact of ROT data can be easily addressed if the right planning and solutions are put in place.
What is ROT Data?
ROT refers to data that is Redundant, Obsolete or Trivial. In other words, it is data that is either no longer relevant (assuming it ever was), or of little to no value to the organization holding on to it.
Redundant data is data that is duplicated across multiple locations, perhaps on a different system entirely, perhaps on the same system. Intranet systems often contain a large amount of redundant data.
Trivial data is information that isn’t necessary to store. It provides no value to the organization and could be removed without any impact on the business.
Obsolete data, as the name suggests, is information that is no longer accurate or no longer in use. It might be outdated information that has been replaced.
Examples of ROT Data
Examples of ROT data include duplicate email attachments, out-of-date documents, expired server session cookies, and so on. ROT data can be found in many different locations, including desktops, mobile devices, and on-premises and cloud servers.
Why ROT Data is a Problem
According to the Veritas Global Databerg Report, 33% of data is ROT, yet many organizations hold on to this data for very long periods of time. Of course, a fear of the delete key is understandable, since the accidental removal of valuable data could have serious financial and legal repercussions. However, holding on to redundant, obsolete or trivial data is not a trivial problem either.
Storage, maintenance and security
Firstly, if approximately 33% of the data you store is ROT, you are wasting money on both storage and maintenance. Bear in mind, too, that if you are taking regular backups of your data (which you should be), you are backing up a lot of junk data as well. And what if you want to migrate your data to a remote storage location? Migrating large amounts of ROT data would be an unnecessarily slow and potentially error-prone process.
But it’s not just the storage and maintenance costs associated with ROT that we need to be concerned about. It’s widely understood that the more data you store, the harder it is to keep it secure. This is especially true of modern IT environments, which are becoming increasingly complex and distributed.
When dealing with large sets of unstructured data, it’s a lot harder to determine what data you have, where it is located, and who has (and should have) access to it. It takes longer to conduct security assessments and determine the legitimacy of the actions performed on the data. Without visibility, we have no control. And it’s a lot harder to obtain the visibility we need when a third of the data we store is essentially junk.
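To see why backups multiply the cost, here is a rough back-of-the-envelope sketch. It assumes every backup is a full copy of the primary data (no deduplication or incremental backups), and the inputs in the usage example (100 TB of primary data, three full backup copies) are hypothetical; only the 33% figure comes from the report cited above.

```python
def rot_footprint_tb(primary_tb: float, rot_fraction: float, backup_copies: int) -> float:
    """Estimate the terabytes of ROT stored across the primary copy and its backups.

    Assumes each backup is a full copy of the primary data, which overstates the
    figure for setups that use deduplicated or incremental backups.
    """
    if not 0.0 <= rot_fraction <= 1.0:
        raise ValueError("rot_fraction must be between 0 and 1")
    if backup_copies < 0:
        raise ValueError("backup_copies must be non-negative")
    total_copies = 1 + backup_copies  # the primary copy plus each full backup
    return primary_tb * rot_fraction * total_copies


# Hypothetical example: 100 TB of primary data, 33% ROT, three full backups.
print(rot_footprint_tb(100, 0.33, 3))
```

Under these assumptions, roughly 132 TB of storage would be spent holding junk, well over the 33 TB you might expect from looking at the primary copy alone.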
Productivity, compliance and analysis
Storing large amounts of useless data makes it harder for employees to locate the data they need to do their jobs. Not only does this lead to a loss of productivity, but if employees cannot access the data they need in a fast and efficient manner, it can also lead to compliance issues. For example, many data privacy regulations require organizations to deal with subject access requests (SARs) in a timely manner.
In the case of the GDPR, organizations must respond to a SAR within one month of receiving the request, and failure to do so could result in fines or lawsuits.
Many organizations also analyze the data they store for a variety of purposes. Having to analyze large amounts of ROT data may yield inaccurate results, which will affect the business decisions they make.
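The GDPR’s one-month SAR window is easy to miscalculate around month ends. The sketch below computes a response deadline one calendar month after receipt. It assumes that when the following month has no matching day (for example, a request received on 31 January), the deadline falls on the last day of that month. It deliberately ignores weekends, public holidays and the GDPR’s optional extension of up to two further months, so treat it as an illustration rather than legal guidance.

```python
import calendar
from datetime import date


def sar_response_deadline(received: date) -> date:
    """Return the date one calendar month after a SAR is received.

    Assumption: if the next month has no matching day, clamp to its last day.
    Weekends, holidays and the optional extension are not handled here.
    """
    # Roll over into January of the next year when the request arrives in December.
    year = received.year + (received.month // 12)
    month = received.month % 12 + 1
    last_day = calendar.monthrange(year, month)[1]  # number of days in the target month
    return date(year, month, min(received.day, last_day))


print(sar_response_deadline(date(2024, 1, 31)))
```

With this rule, a request received on 31 January 2024 would be due on 29 February 2024, since 2024 is a leap year.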
How to Better Manage ROT Data
Before you start hitting the delete key, it’s a good idea to take a full backup of all of your data, including the ROT. Store this backup on an external device or drive so that you can retrieve the data if you make a mistake. Below are some of the most effective methods for clearing out your digital basement.
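As a minimal sketch of that safety net for a single folder, the function below zips a directory into a backup location and checks that every file made it into the archive before any cleanup begins. The `backup_dir` argument stands in for a path on your external drive (a hypothetical mount point); a real full backup would normally be handled by dedicated backup software rather than a script like this.

```python
import shutil
import zipfile
from pathlib import Path


def backup_before_cleanup(source_dir: str, backup_dir: str) -> Path:
    """Zip source_dir into backup_dir and verify that every file is in the archive."""
    source = Path(source_dir).resolve()
    dest = Path(backup_dir).resolve()
    # Writing the archive inside the folder being archived would make it include itself.
    if dest == source or source in dest.parents:
        raise ValueError("backup_dir must be outside source_dir")
    dest.mkdir(parents=True, exist_ok=True)

    # make_archive appends the ".zip" extension and returns the archive path.
    archive = Path(shutil.make_archive(str(dest / source.name), "zip", root_dir=source))

    # Compare the files on disk with the file entries recorded in the archive.
    expected = {p.relative_to(source).as_posix() for p in source.rglob("*") if p.is_file()}
    with zipfile.ZipFile(archive) as zf:
        archived = {name for name in zf.namelist() if not name.endswith("/")}
    missing = expected - archived
    if missing:
        raise RuntimeError(f"backup is missing {len(missing)} file(s)")
    return archive
```

Only once the check passes would you move on to deleting anything from the source.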
Perhaps a good place to start is to identify and remove any duplicate data. There are numerous data deduplication solutions that automatically scan your repositories for duplicate data and remove it accordingly. Most solutions work by replacing each duplicate with a reference that points to the main copy, often referred to as a “single source of truth” (SSOT).
Data deduplication solutions are commonly used for backing up data, as they can remove duplicate data from a single storage device or filter it out in real time as it is being transmitted to an external storage device.
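The core idea behind file-level deduplication can be sketched in a few lines: hash each file’s contents, group files with identical digests, keep one as the main copy and point the others at it. This is a simplified illustration, not how any particular product works. It assumes SHA-256 digests are a good enough proxy for identical content (no byte-for-byte comparison), and it only builds the duplicate-to-main mapping instead of actually replacing files, so nothing is deleted.

```python
import hashlib
from collections import defaultdict
from pathlib import Path
from typing import Dict, List


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large files are not read into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(root: str) -> Dict[str, List[Path]]:
    """Group files under root by content hash; keep only groups with more than one file."""
    groups: Dict[str, List[Path]] = defaultdict(list)
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            groups[sha256_of(path)].append(path)
    return {digest: paths for digest, paths in groups.items() if len(paths) > 1}


def dedupe_with_references(duplicates: Dict[str, List[Path]]) -> Dict[Path, Path]:
    """Treat the first file in each group as the main copy and map every other copy to it.

    A deduplication product would swap each duplicate for a link or pointer;
    this sketch only records the mapping.
    """
    references: Dict[Path, Path] = {}
    for paths in duplicates.values():
        main, *copies = paths
        for copy in copies:
            references[copy] = main
    return references
```

Many commercial tools deduplicate at the block or chunk level rather than whole files, which also catches files that are only partly identical.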
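Filtering duplicates in transit can be illustrated with a toy model: the data stream is split into chunks, each chunk is hashed, and only chunks the backup target has not already stored are sent. Here `ChunkStore` is a hypothetical stand-in for the external storage device, and the fixed 4 KiB chunk size is an arbitrary assumption; many products use variable-size (content-defined) chunking instead.

```python
import hashlib
from typing import Dict, List, Tuple

CHUNK_SIZE = 4096  # hypothetical fixed chunk size


class ChunkStore:
    """Stand-in for the backup target: stores each unique chunk once, keyed by its hash."""

    def __init__(self) -> None:
        self.chunks: Dict[str, bytes] = {}

    def has(self, digest: str) -> bool:
        return digest in self.chunks

    def put(self, digest: str, data: bytes) -> None:
        self.chunks[digest] = data

    def restore(self, recipe: List[str]) -> bytes:
        """Rebuild the original stream from its ordered list of chunk hashes."""
        return b"".join(self.chunks[d] for d in recipe)


def backup_stream(data: bytes, store: ChunkStore) -> Tuple[List[str], int]:
    """Send only chunks the store has not seen; return the recipe and bytes actually sent."""
    recipe: List[str] = []
    bytes_sent = 0
    for offset in range(0, len(data), CHUNK_SIZE):
        chunk = data[offset:offset + CHUNK_SIZE]
        digest = hashlib.sha256(chunk).hexdigest()
        if not store.has(digest):
            store.put(digest, chunk)  # only new chunks are "transmitted"
            bytes_sent += len(chunk)
        recipe.append(digest)  # the recipe records every chunk, duplicate or not
    return recipe, bytes_sent
```

Backing up the same data a second time sends no chunks at all; only the short recipe of hashes needs to be recorded.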
The second step to reducing the amount of ROT data you store is to find out exactly what data you have and where it is located. The best way to do this is to use an automated data discovery and classification solution. Such solutions scan your repositories for documents that contain sensitive data, such as personally identifiable information (PII), protected health information (PHI), payment card information (PCI), and more. Most proprietary data classification solutions are also able to classify data at the point of creation or modification.
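A very small taste of how pattern-based discovery works is shown below: a regular expression for email addresses (a common form of PII) and a card-number pattern confirmed with the Luhn checksum, which filters out most random digit runs. The tag names such as `PII:email` are hypothetical, PHI is not covered, and real classification engines rely on far richer rules, context and validation than these two patterns.

```python
import re
from typing import Set

EMAIL_RE = re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b")
# 13 to 19 digits, optionally separated by single spaces or hyphens.
CARD_RE = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")


def luhn_valid(number: str) -> bool:
    """Check a candidate card number with the Luhn checksum."""
    digits = [int(c) for c in number if c.isdigit()]
    checksum = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:  # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        checksum += d
    return len(digits) >= 13 and checksum % 10 == 0


def classify_text(text: str) -> Set[str]:
    """Return hypothetical sensitivity tags for the patterns found in text."""
    found: Set[str] = set()
    if EMAIL_RE.search(text):
        found.add("PII:email")
    if any(luhn_valid(m.group()) for m in CARD_RE.finditer(text)):
        found.add("PCI:card_number")
    return found


print(classify_text("Card 4111 1111 1111 1111"))
```

In this sketch, `4111 1111 1111 1111` (a widely used test number that passes the Luhn check) is tagged, while an arbitrary sequence such as `1234 5678 9012 3456` fails the checksum and is not.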
A typical classification schema will include four categories: public, internal, confidential and restricted. In the context of removing ROT data, all we are really interested in is data that is classified as public, since public data is not sensitive, and thus the removal of such data shouldn’t result in any damage to our organization.
Most Data Security Platforms provide data classification tools out of the box, and will also give you insights into how your data is being accessed. You can filter data based on the date it was last accessed, giving you insight into what data is no longer relevant. A real-time auditing solution will also provide tools for inactive user account management. While inactive user accounts are not technically considered ROT, most inactive user accounts will have ROT data associated with them, and should thus be taken into consideration.
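The two filters described above can be sketched as follows: one lists files whose last-access time is older than a threshold, the other lists accounts that have not logged on recently. Be aware that many file systems do not update access times on every read (for example, Linux mounts using `noatime` or `relatime`), which is one reason auditing platforms track access events themselves. The `Account` record and its `last_logon` field are hypothetical; real directory services expose this information under their own attribute names.

```python
import time
from dataclasses import dataclass
from datetime import datetime, timedelta
from pathlib import Path
from typing import List, Optional


def stale_files(root: str, days: int, now: Optional[float] = None) -> List[Path]:
    """Return files under root whose last-access time is more than `days` days ago."""
    cutoff = (time.time() if now is None else now) - days * 86400
    return sorted(p for p in Path(root).rglob("*") if p.is_file() and p.stat().st_atime < cutoff)


@dataclass(frozen=True)
class Account:
    username: str
    last_logon: datetime  # hypothetical field name


def inactive_accounts(accounts: List[Account], days: int, now: datetime) -> List[str]:
    """Return usernames of accounts with no logon in the last `days` days."""
    cutoff = now - timedelta(days=days)
    return [a.username for a in accounts if a.last_logon < cutoff]
```

The results are candidates for review, not for automatic deletion: a file that nobody has opened for a year may still be subject to a retention requirement.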
Data retention policies
Of course, the removal of existing ROT data is only part of the problem. As they say, prevention is better than cure. You must ensure that you have policies in place to prevent the unnecessary collection and hoarding of any kind of data.
A data retention policy should include a formalized schedule for scanning repositories and removing ROT data. Ideally, all data that is stored should be assigned a retention period, and there should be some means by which to inform the administrator when the retention period has ended. Some companies use an electronic document management system (EDMS) to automate their retention policies.
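A retention schedule boils down to two pieces of data per record (when it was stored and how long it must be kept) plus a periodic check that reports anything past its end date. The sketch below assumes retention periods are stored in days and that the notification channel is a simple callback; the `Record` type and `notify_admin` helper are hypothetical, and an EDMS would typically handle both for you.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Callable, List, Optional


@dataclass(frozen=True)
class Record:
    name: str
    created: date
    retention_days: int  # hypothetical: retention period assigned when the record is stored

    @property
    def expires(self) -> date:
        return self.created + timedelta(days=self.retention_days)


def expired_records(records: List[Record], today: Optional[date] = None) -> List[Record]:
    """Return records whose retention period has ended, earliest expiry first."""
    today = today or date.today()
    return sorted((r for r in records if r.expires <= today), key=lambda r: r.expires)


def notify_admin(records: List[Record], send: Callable[[str], None] = print) -> None:
    """Send one message per expired record to the administrator (print by default)."""
    for r in records:
        send(f"Retention period ended for {r.name} on {r.expires.isoformat()}")
```

Running `expired_records` on a schedule (for example, nightly) and passing the result to `notify_admin` gives the administrator a list to review, rather than deleting records automatically.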
An EDMS is typically used by lawyers, accountants, tax professionals, healthcare professionals, and anyone who has a legal responsibility to swiftly and reliably dispose of sensitive data that is no longer relevant to them. Most electronic document management systems will assign a retention period to each document according to the regulations that apply to the organization’s industry.
Stop the Spread of ROT
Whichever solutions you choose to adopt to help you manage your redundant, obsolete or trivial data, the process of doing so should be ongoing, formalized and scheduled. Failing to keep on top of ROT data will increase storage and maintenance costs, put your data at risk, make it harder to migrate data, reduce productivity, skew analytical processes and make it harder to comply with the relevant data privacy regulations.
Investing in a Data Security Platform, like Lepide, will enable you to discover, classify, tag and score critical data. Accurate tagging of data helps to automate information management tasks and enables better access governance.