What is Data Masking? Types, Techniques and Best Practices

Aidan Simister | 9 min read | Published On - July 27, 2023


With data breaches becoming ever more prevalent and causing many companies to lose huge sums of money, data protection has become a top priority. Along with this, security compliance and data privacy regulations have become increasingly stringent to maintain the integrity and availability of regulated data. Data masking has therefore become an essential technique in meeting fundamental security requirements that businesses need to protect their sensitive data.

What is Data Masking?

Data Masking is a process in which the original values of the production data are changed while keeping the format of the data the same. The modifications may take place through encryption, character shuffling, or substitution and the aim is to create a version of the data that cannot be deciphered or reverse engineered.

The goal of data masking is to protect sensitive data while providing a practical alternative when real data is not needed, for example in user training, sales demos, or software testing.

Data privacy legislation such as GDPR promotes data masking so that businesses use private data as little as possible. With the average cost of a data breach estimated at around $4 million, companies have a strong incentive to invest in information security solutions such as data masking, which can be cheaper to implement than other encryption solutions.

Why is Data Masking Important?

As data privacy and security regulations become increasingly important, data masking now plays a major role in meeting essential security requirements.

Here are several reasons why data masking has become a key requirement for many organizations:

Mitigates critical threats like data loss, exfiltration, insider threats, and compromised accounts.
Reduces data risks associated with cloud adoption.
Makes data useless to an attacker, while maintaining many of its fundamental functional properties.
Allows sharing data with authorized users, such as testers and developers, without exposing sensitive production data.
Can be used for data sanitization. While normal file deletion still leaves traces of data in storage media, sanitization replaces old values with masked ones.
Data masking is a requirement for many data security regulations. Even if there’s no data incident at your organization, it is essential to remain compliant with regulations.

Data Masking Types

Static Data Masking (SDM): Static data masking masks the data in the database before it is copied to a test environment, so that the test data can safely be moved into untrusted environments or shared with third-party vendors.

In Place Masking: In Place masking involves reading from a target and then overwriting any sensitive information with masked data.

On the Fly Masking: On the fly masking reads data from a source location, such as production, and writes the masked data to a non-production target.
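
As a rough illustration of on the fly masking, the sketch below streams rows from a source to a separate target and masks them in transit, so the source is never modified. The CSV layout, the column names, and the `mask_on_the_fly` helper are assumptions made for this example, and in-memory buffers stand in for real production and test systems.

```python
import csv
import io


def mask_on_the_fly(source, target, sensitive_columns, mask):
    """Copy CSV rows from source to target, masking sensitive columns in transit."""
    reader = csv.DictReader(source)
    writer = csv.DictWriter(target, fieldnames=reader.fieldnames)
    writer.writeheader()
    for row in reader:
        for column in sensitive_columns:
            row[column] = mask(row[column])
        writer.writerow(row)


# Hypothetical "production" extract and an empty non-production target.
production = io.StringIO("id,name,ssn\n1,Alice,123-45-6789\n2,Bob,987-65-4321\n")
test_copy = io.StringIO()

# Keep only the last four digits of the (made-up) social security numbers.
mask_on_the_fly(production, test_copy, ["ssn"], lambda v: "XXX-XX-" + v[-4:])
```

In place masking would differ only in that the masked rows are written back over the same store they were read from.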

Static data masking also covers data at rest: masking data in storage removes traces such as logs or change data capture records, so that no unmasked data is left behind from interactions with storage.

Dynamic Data Masking: Dynamic data masking helps prevent unauthorized access to sensitive data by revealing only a part of it. Masking happens at runtime and on demand, so there is no need for a second data source to store the masked data. It is a policy-based security feature that hides the sensitive data, for example in the result set of a query over designated database fields, while the data in the database itself is not changed.

In a typical scenario, a call center service representative identifies a caller by confirming several characters of their email address, without the complete email address ever being revealed to the representative.
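
The call center scenario can be sketched in a few lines. This is only an illustration of the idea: the role names, the record layout, and the `mask_email` and `read_customer` helpers are hypothetical, and database products that offer dynamic masking apply such policies inside the query engine rather than in application code.

```python
def mask_email(email, visible=2):
    """Show only the first `visible` characters of the part before the @."""
    local, sep, domain = email.partition("@")
    if not sep:
        # Not a well-formed address: hide everything.
        return "*" * len(email)
    shown = local[:visible]
    return shown + "*" * (len(local) - len(shown)) + "@" + domain


def read_customer(record, role):
    """Return a copy of the record, masked unless the role is privileged."""
    view = dict(record)
    if role != "admin":  # "admin" is a hypothetical privileged role
        view["email"] = mask_email(record["email"])
    return view


stored = {"name": "Jane Doe", "email": "jane.doe@example.com"}
agent_view = read_customer(stored, "support_agent")
print(agent_view["email"])  # ja******@example.com
print(stored["email"])      # jane.doe@example.com (stored value is untouched)
```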

Synthetic Data Generation: Instead of masking data, this approach generates new data in place of existing data, preserving the data structure. It’s used for scenarios like greenfield application development where software systems are built for a totally new environment.
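
A minimal sketch of synthetic data generation, assuming a hypothetical customer schema with `name`, `age`, and `zip` fields: every value is generated from scratch, so no production record is read at any point. The name lists and value ranges are invented for the example.

```python
import random

# Invented vocabularies for the example; not taken from any real dataset.
FIRST_NAMES = ["Alex", "Sam", "Jordan", "Casey", "Riley"]
LAST_NAMES = ["Morgan", "Taylor", "Lee", "Quinn", "Reed"]


def synthetic_customer(rng):
    """Build one fake customer record that matches the assumed schema."""
    return {
        "name": f"{rng.choice(FIRST_NAMES)} {rng.choice(LAST_NAMES)}",
        "age": rng.randint(18, 90),
        "zip": f"{rng.randint(0, 99999):05d}",
    }


rng = random.Random(42)  # fixed seed so test fixtures are repeatable
customers = [synthetic_customer(rng) for _ in range(3)]
```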

Data Masking Techniques

Data Masking can be done in multiple ways, which include:

Encryption

Encryption is the most complex and most secure type of data masking. An encryption algorithm masks the data, and an encryption key is required to decrypt it.

Encryption is suited to production data that needs to return to its original state. The data will be safe as long as only authorized users have the key. If any unauthorized party gains access to the key, they can decrypt the data, so careful management of the encryption key is crucial.

Substitution

Substitution masks the data by replacing it with another value. This is one of the most effective data masking methods because it preserves the original look and feel of the data.

The substitution technique can apply to several types of data. For example, customer names can be masked with values drawn at random from a lookup file. This can be difficult to execute, but it is a very effective way of protecting data from breaches.

Organizations substitute the original data with random data from a supplied or custom lookup file. This is an efficient and effective way to disguise data, since it preserves the data's integrity and structural format.
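
Substitution from a lookup source might look like the following sketch. The in-memory `LOOKUP_NAMES` list stands in for a lookup file, and the `substitute_names` helper is hypothetical. It remembers each replacement so that a repeated name is masked the same way within a run, which is a design choice for this example rather than a requirement of the technique.

```python
import random

# Stand-in for a supplied or custom lookup file of realistic replacement names.
LOOKUP_NAMES = ["Alex Morgan", "Sam Taylor", "Jordan Lee", "Casey Quinn"]


def substitute_names(names, lookup, seed=None):
    """Replace each name with a random lookup value, consistently per name."""
    rng = random.Random(seed)
    mapping = {}
    masked = []
    for name in names:
        if name not in mapping:
            mapping[name] = rng.choice(lookup)
        masked.append(mapping[name])
    return masked


originals = ["Alice Smith", "Bob Jones", "Alice Smith"]
masked = substitute_names(originals, LOOKUP_NAMES, seed=1)
```

With a small lookup list, two different originals can receive the same substitute. A larger lookup file makes that less likely.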

Shuffling

Shuffling is similar to substitution, but the replacement values come from the same column that is being masked: the existing values are randomly reordered across records.

For instance, the values in an employee name column can be shuffled across multiple employee records. The output looks like accurate data but doesn't reveal any actual personal information. However, if anyone gets to know the shuffling algorithm, shuffled data is prone to reverse engineering.

In shuffling, organizations substitute the original data with other authentic-looking data, but the entities of the same column are shuffled. The values can move vertically or randomly along the column.
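
A minimal sketch of column shuffling, assuming records are held as a list of dictionaries. The `shuffle_column` helper and the sample employees are made up for the example.

```python
import random


def shuffle_column(records, column, seed=None):
    """Return new records with the values of one column reordered at random."""
    rng = random.Random(seed)
    values = [record[column] for record in records]
    rng.shuffle(values)
    # Every other column stays attached to its original record.
    return [{**record, column: value} for record, value in zip(records, values)]


employees = [
    {"id": 1, "name": "Alice", "dept": "Sales"},
    {"id": 2, "name": "Bob", "dept": "IT"},
    {"id": 3, "name": "Carol", "dept": "HR"},
    {"id": 4, "name": "Dave", "dept": "IT"},
]
masked = shuffle_column(employees, "name", seed=7)
```

A random shuffle can leave some values in their original row, and every real value still appears somewhere in the output, so shuffling alone is a weak protection for small or highly identifying columns.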

Character Scrambling

Scrambling is a basic masking technique that jumbles the characters and numbers into a random order hiding the original content. This process is irreversible, so the original data cannot be obtained from the scrambled data.
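
Character scrambling can be sketched as a random reordering of the characters in a value. The `scramble` helper below is hypothetical. It uses the operating system's random source by default and accepts a seeded generator only to make the example repeatable.

```python
import random


def scramble(value, rng=None):
    """Return the characters of `value` in a random order."""
    rng = rng or random.SystemRandom()
    chars = list(value)
    rng.shuffle(chars)
    return "".join(chars)


masked = scramble("AB-1234", random.Random(3))
```

The scrambled value keeps the same characters and length as the original, so very short or repetitive values can still be guessed, or may even come out unchanged.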

Tokenization

Tokenization is a reversible process where the data is substituted with random placeholder values. Tokenization can be implemented with a vault or without, depending on the use case and the cost involved with each solution.
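
A vault-based variant could be sketched as follows. The `TokenVault` class, its method names, and the `tok_` prefix are assumptions for this illustration. A real vault would be a hardened, access-controlled service rather than an in-memory dictionary.

```python
import secrets


class TokenVault:
    """In-memory vault mapping random tokens to original values (illustrative only)."""

    def __init__(self):
        self._by_token = {}
        self._by_value = {}

    def tokenize(self, value):
        """Return the token for a value, creating a random one on first use."""
        if value in self._by_value:
            return self._by_value[value]
        token = "tok_" + secrets.token_hex(8)
        while token in self._by_token:  # regenerate on the unlikely collision
            token = "tok_" + secrets.token_hex(8)
        self._by_token[token] = value
        self._by_value[value] = token
        return token

    def detokenize(self, token):
        """Look up the original value; only the vault can reverse a token."""
        return self._by_token[token]


vault = TokenVault()
card_number = "4111 1111 1111 1111"  # a well-known test number, not real data
token = vault.tokenize(card_number)
```

Because the token is random, it carries no information about the original value. Reversal is possible only through the vault's lookup table.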

Blurring

Blurring alters the value stored in the database by replacing it with another value from a defined range, so that the result stays realistic while the exact original is hidden.
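
One common reading of blurring for numeric fields is to shift each value by a random amount within a fixed range. The sketch below follows that interpretation, which is an assumption because the description above does not specify the mechanism. The `blur` helper and the salary figure are made up.

```python
import random


def blur(value, spread, rng=None):
    """Shift a numeric value by a random offset within +/- spread."""
    rng = rng or random.Random()
    return value + rng.uniform(-spread, spread)


# Blur a salary by at most 2,500 in either direction.
blurred_salary = blur(50_000, 2_500, random.Random(1))
```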

Best Practices for Data Masking

Identify sensitive data: The first step in implementing data masking is to identify the sensitive data elements within the organization’s systems. This includes personally identifiable information (PII) such as names, addresses, social security numbers, and financial information. Understanding the scope and location of sensitive data helps prioritize the data masking efforts.
Develop a data masking strategy: Organizations should develop a comprehensive data masking strategy that defines the objectives, scope, and approach for data masking. This strategy should consider the specific requirements and regulations applicable to the organization’s industry, such as GDPR or HIPAA, to ensure compliance.
Use a variety of masking techniques: Depending on the nature of the data and the intended use of the masked data, different masking techniques can be employed. Common techniques include substitution (replacing original values with fictional values), shuffling (randomly reordering the values within a column), and encryption (using cryptographic algorithms to transform data). Employing a combination of masking techniques enhances the security and realism of the masked data.
Maintain referential integrity: When masking data, it is crucial to maintain referential integrity to ensure the consistency and usability of the masked data. This involves preserving relationships between different data elements, such as foreign key constraints, to avoid breaking application functionality or causing data corruption.
Test and validate the masking process: Before implementing data masking in a production environment, thorough testing and validation should be conducted. This helps ensure that the masking techniques applied produce the desired results without introducing any data inconsistencies or anomalies. It is essential to involve stakeholders from different teams, such as development, testing, and compliance, to verify the effectiveness of the masking process.
Implement role-based access controls: To further enhance data security, organizations should implement role-based access controls (RBAC) for masked data. RBAC ensures that only authorized individuals or roles can access specific masked data sets. This helps prevent unauthorized access to sensitive information, even within the organization.
Regularly review and update masking policies: Data masking is an ongoing process that requires regular review and updates. As new data types or regulations emerge, organizations should reassess their data masking policies and adapt them accordingly. Regular audits and assessments of the data masking implementation help identify any gaps or vulnerabilities that need to be addressed.
Monitor and log data access: Implementing robust monitoring and logging mechanisms is critical for tracking data access and detecting any suspicious activities. By monitoring access logs, organizations can identify potential security breaches or unauthorized access attempts promptly. It is essential to establish clear protocols for investigating and responding to any security incidents.
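
To make the referential integrity point concrete, the sketch below masks a key deterministically, so that the same customer ID maps to the same masked ID in every table and foreign key relationships still line up. The keyed hash approach, the key, the table layouts, and the `mask_id` helper are all assumptions for this example. It is one possible way to keep masked keys consistent, not the only one.

```python
import hashlib
import hmac

# Hypothetical secret; in practice it would be stored and rotated securely.
SECRET_KEY = b"demo-key-not-for-production"


def mask_id(value, key=SECRET_KEY):
    """Map an ID to a stable pseudonym using a keyed hash (HMAC-SHA256)."""
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return "C" + digest[:12]


customers = [{"customer_id": "1001", "name": "Alice"}]
orders = [
    {"order_id": "A1", "customer_id": "1001"},
    {"order_id": "A2", "customer_id": "1001"},
]

masked_customers = [{**c, "customer_id": mask_id(c["customer_id"])} for c in customers]
masked_orders = [{**o, "customer_id": mask_id(o["customer_id"])} for o in orders]
```

Every masked order still points at the masked customer row, so joins between the two tables keep working on the masked copy.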

How Lepide Helps with Data Security

The Lepide Data Security Platform can help you improve your data security strategy by aggregating and summarizing event data from multiple sources, which can include both on-premises and cloud platforms. All important events are displayed on a single, centralized dashboard, with various options for sorting and searching. Below are some of the most notable features of the Lepide Data Security Platform:

Data classification: The Lepide data classification tool will scan your repositories, both on-premises and in the cloud, and classify sensitive data as it is found. You can also customize the search according to the compliance requirements relevant to your business.

Machine learning: Lepide uses machine learning algorithms to establish typical usage patterns, against which activity can be compared to identify anomalous behavior.

Change auditing and reporting: Lepide’s change auditing and reporting tool enables you to keep track of how your privileged accounts are being accessed and used. Likewise, any time your sensitive data is accessed, shared, moved, modified, or deleted in an atypical manner, a real-time alert can be sent to your inbox or mobile device. Alternatively, you can simply review a summary of changes via the dashboard.

Threshold alerting: Lepide’s threshold alerting feature enables you to detect and respond to events that match a pre-defined threshold condition.

Inactive user account management: Lepide can help you locate any inactive, or “ghost” user accounts, thus preventing attackers from exploiting them.

If you’d like to see how the Lepide Data Security Platform can help give you more visibility over your sensitive data and protect you from security threats, schedule a demo with one of our engineers or start your free trial today.

Aidan Simister

Having worked in the IT industry for a little over 22 years in various capacities, Aidan is a veteran in the field. Specifically, Aidan knows how to build global teams for security and compliance vendors, often from a standing start. Since joining Lepide in 2015, Aidan has helped contribute to the accelerated growth in the US and European markets.