As organizations collect and store more information, and as IT environments become more complex and distributed, solutions for managing dark data are becoming increasingly relevant. According to a recent report, on average, 55% of an organization’s data is “dark”.
Assuming this figure is accurate, it is understandable why many security teams and business leaders are keen to address this issue.
What Is Dark Data?
In simple terms, dark data is data that organizations don’t know they have. It is usually collected in much the same way as any other type of data, but for various reasons, it either goes unnoticed or gets forgotten about. Gartner has defined dark data as “the information assets organizations collect, process, and store during regular business activities, but generally fail to use for other purposes.”
Because dark data is unknown, it is typically unprotected, which makes it a significant security risk. For all we know, the dark data we store may have already been breached. Perhaps it has already been used, or will be used, for a variety of nefarious purposes.
Dark Data Challenges
There’s no reason to assume that dark data doesn’t contain sensitive personal information. If it does, then you probably have a legal obligation to find it, and implement the necessary measures to keep it secure. We must remember that adversaries are usually ahead of the game when it comes to identifying vulnerabilities in an organization’s security posture. They will understand that if data is not known, it is not protected, and thus they can steal the data without getting noticed.
Another issue associated with dark data relates to business analytics. Companies spend large amounts of money collecting and analyzing data, often via market research campaigns, to gain a competitive advantage. However, if over half of the data they store is unknown to them, they are potentially missing an opportunity to increase profits by deriving insights from it.
Types Of Dark Data
As mentioned, dark data is much the same as any other type of data. It can be structured, unstructured, or semi-structured. It could be business-critical, or it could be ROT (Redundant, Obsolete, and Trivial). Examples of dark data include:
- Log files from servers, applications, network devices, etc.
- Geolocation data
- Emails and attachments
- Data that belongs to former employees
- Spreadsheets and presentations
- Financial statements
- Footage from CCTV cameras
- Customer call records, etc.
How To Find Dark Data
The first step in finding dark data is to locate any data that contains sensitive or regulated information. This involves using a data discovery and classification solution, which will scan your file repositories, whether on-premises or in the cloud, and automatically classify the data as it is found.
It’s also a good idea to use a solution that will classify data at the point of creation/modification. Data is typically classified as either public, private, or restricted, although you can use other categories if necessary.
Depending on how sophisticated your solution is, this may be enough to locate and classify all dark data that exists within your IT environment.
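To make the scan-and-classify idea concrete, here is a minimal Python sketch. It is not how any particular commercial product works. The two regular expressions, the function names `classify_text` and `scan_repository`, and the rule that maps matches to the public, private, and restricted categories are all assumptions chosen for illustration. A real solution would use far more robust detection and would handle binary formats such as spreadsheets and PDFs.

```python
import re
from pathlib import Path

# Hypothetical detection patterns, for illustration only.
PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def classify_text(text):
    """Return 'restricted', 'private', or 'public' for a piece of text."""
    found = {name for name, pattern in PATTERNS.items() if pattern.search(text)}
    if "us_ssn" in found:
        # Assumed rule: government identifiers are the most sensitive tier.
        return "restricted"
    if found:
        # Assumed rule: any other personal identifier makes the file private.
        return "private"
    return "public"

def scan_repository(root):
    """Walk a directory tree and classify every file that can be read as text."""
    results = {}
    for path in Path(root).rglob("*"):
        if not path.is_file():
            continue
        try:
            text = path.read_text(encoding="utf-8", errors="ignore")
        except OSError:
            # Unreadable files are skipped here; in practice they deserve a manual look.
            continue
        results[str(path)] = classify_text(text)
    return results
```

Calling `scan_repository` on a shared folder returns a mapping of file path to category, which can then be reviewed or fed into access-control decisions.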
If you have the time and resources, it is a good idea to manually check that all dark data has been discovered. You will also need to scan your repositories for data that is either duplicated or no longer relevant, and either archive it or remove it from your systems.
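As a rough illustration of that second scan, the sketch below finds exact duplicates by comparing SHA-256 content hashes, and flags files that have not been modified for a long time. The function names and the 365-day default are assumptions. Treating "not modified recently" as a proxy for "no longer relevant" is also only a heuristic, so the output is a list of candidates for review rather than a deletion list.

```python
import hashlib
import time
from collections import defaultdict
from pathlib import Path

def file_digest(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def find_duplicates(root):
    """Return groups of file paths whose contents are byte-for-byte identical."""
    groups = defaultdict(list)
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            groups[file_digest(path)].append(str(path))
    return [paths for paths in groups.values() if len(paths) > 1]

def find_stale(root, max_age_days=365, now=None):
    """Return files whose last modification is older than max_age_days."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    return [
        str(path)
        for path in sorted(Path(root).rglob("*"))
        if path.is_file() and path.stat().st_mtime < cutoff
    ]
```

Hashing content rather than comparing file names means renamed copies are still caught, although near-duplicates (for example, two versions of the same spreadsheet) are not.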
You may also want to consider looking at your IT budget to see if your storage costs are escalating in a particular area. If they are, then this might suggest that you are collecting data that you are not aware of.
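The storage check can be as simple as comparing usage figures between two periods. The sketch below assumes you can export per-area storage usage (in any consistent unit) for a previous and a current period. The function name `flag_storage_growth`, the area names, and the 25% threshold are hypothetical and should be tuned to your environment.

```python
def flag_storage_growth(usage_by_area, threshold=0.25):
    """Return areas whose storage grew by more than `threshold` (0.25 = 25%).

    usage_by_area maps an area name to a (previous, current) pair of usage figures.
    """
    flagged = {}
    for area, (previous, current) in usage_by_area.items():
        if previous <= 0:
            # Growth is undefined without a baseline; skip new areas.
            continue
        growth = (current - previous) / previous
        if growth > threshold:
            flagged[area] = round(growth, 2)
    return flagged

# Example with made-up figures (GB used last quarter, GB used this quarter).
usage = {"file_server": (400, 620), "crm": (120, 125)}
print(flag_storage_growth(usage))  # {'file_server': 0.55}
```

An area that is flagged is not necessarily full of dark data, but it is a sensible place to point a discovery scan first.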
Once you are confident that you have identified all assets across all repositories, how you choose to deal with any dark data you find is up to you. You may want to keep it and assign access controls to it, perhaps for the purpose of business analytics, or simply delete it.