Data masking is essential when working with sensitive data. It is a crucial part of maintaining a secure data environment and avoiding data breaches. While data masking projects seem simple, they are actually quite tricky. They are full of pitfalls and “yes, but” conditions, and the resulting setups are often not flexible or dynamic enough to keep up with a changing data environment.
I have experienced some of these pitfalls first-hand; others are common experiences that data team members have shared with me. In this post, I will discuss some of the reasons why data teams struggle with data masking projects.
1. Incomplete or Missing Information
Incomplete or missing information can significantly complicate data masking projects. In many cases, the specifications are not detailed enough, or there are mismatches between the data and the masking procedure.
Data classification is a crucial prerequisite for data masking. However, if the data is classified incorrectly (for example, when the classification is stale), this creates a noticeable difficulty for the data masking team.
The problem is further compounded when sensitive information or PII is undefined. The data masking team will need to first search for and then define the sensitive data before it can be anonymized.
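As a rough illustration of that "search, then define" step, here is a minimal sketch that samples rows and flags columns whose values mostly look like PII. The patterns, the 0.8 threshold, and the sample column names are all assumptions made for this example; real discovery needs far broader coverage than two regular expressions.

```python
import re

# Hypothetical patterns for illustration only; not a complete PII catalog.
PII_PATTERNS = {
    "email": re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$"),
    "us_ssn": re.compile(r"^\d{3}-\d{2}-\d{4}$"),
}


def classify_columns(rows, threshold=0.8):
    """Return {column: pii_type} for columns where at least `threshold`
    of the non-empty sampled values match one of the patterns."""
    if not rows:
        return {}
    findings = {}
    for column in rows[0]:
        values = [str(r[column]) for r in rows if r.get(column) not in (None, "")]
        if not values:
            continue
        for pii_type, pattern in PII_PATTERNS.items():
            hits = sum(1 for v in values if pattern.match(v))
            if hits / len(values) >= threshold:
                findings[column] = pii_type
                break  # first matching type wins in this sketch
    return findings


# Assumed sample data; column names are made up.
sample = [
    {"id": 1, "contact": "ana@example.com", "tax_ref": "123-45-6789"},
    {"id": 2, "contact": "bo@example.org", "tax_ref": "987-65-4321"},
]
print(classify_columns(sample))
```

Note that the column names here ("contact", "tax_ref") give no hint of their content, which is exactly why value-based sampling is useful when the definitions are missing.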
2. Chasing Configurations And Enforcement on New Data Projects
A data masking project never ends. Data masking projects are ongoing and dynamic processes that require constant updating to keep up with changing and evolving data. Therefore, a major struggle for data teams is planning for future maintenance and, when new projects are introduced, incorporating new data as it arises.
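One way to keep up with new data is to regularly compare the live schema against the masking configuration and report anything without a recorded decision. The sketch below assumes both are available as plain dictionaries; the table names, column names, and the "none"/"redact" labels are hypothetical.

```python
def find_uncovered_columns(current_schema, masking_config):
    """Return sorted (table, column) pairs that exist in the schema
    but have no masking decision recorded in the config."""
    uncovered = []
    for table, columns in current_schema.items():
        decided = masking_config.get(table, {})
        for column in columns:
            if column not in decided:
                uncovered.append((table, column))
    return sorted(uncovered)


# Assumed inputs: a schema snapshot and the decisions made so far.
current_schema = {
    "customers": ["id", "email", "loyalty_tier"],
    "orders": ["id", "card_last4"],
}
masking_config = {"customers": {"id": "none", "email": "redact"}}

print(find_uncovered_columns(current_schema, masking_config))
```

Recording an explicit "none" for columns that need no masking lets the check distinguish "reviewed and safe" from "never reviewed".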
3. Multitude of Data Platforms
Often data is stored across platforms. In these cases, the data masking team needs to work with data stored or processed on a multitude of different platforms (for example, MySQL, Snowflake, Redshift, or Athena). Because the application of data masking depends on the data platform itself, and each of these platforms requires different coding, inputs, and technology, this complicates not only the initial data masking procedure but also maintenance in the long run.
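To make the per-platform difference concrete, the sketch below renders the same intent ("hide the email column") for two of the platforms mentioned: a masking policy for Snowflake and a masked view for MySQL. The function names, the policy, table, and role names are hypothetical, and the SQL reflects each platform's syntax as I understand it, so check it against the vendor documentation before relying on it.

```python
MASK_LITERAL = "'***MASKED***'"


def snowflake_policy(policy_name, allowed_role):
    """Render a Snowflake column masking policy that reveals the value
    only to one role. Identifiers are interpolated directly, so pass
    trusted names only."""
    return (
        f"CREATE OR REPLACE MASKING POLICY {policy_name} "
        f"AS (val STRING) RETURNS STRING ->\n"
        f"  CASE WHEN CURRENT_ROLE() IN ('{allowed_role}') "
        f"THEN val ELSE {MASK_LITERAL} END;"
    )


def mysql_masked_view(table, columns, masked_columns):
    """Render a MySQL view that replaces the masked columns with a
    constant. Unlike the policy above, this masks for every reader."""
    select_list = ", ".join(
        f"{MASK_LITERAL} AS {c}" if c in masked_columns else c
        for c in columns
    )
    return (
        f"CREATE OR REPLACE VIEW {table}_masked AS "
        f"SELECT {select_list} FROM {table};"
    )


print(snowflake_policy("email_mask", "PII_READER"))
print(mysql_masked_view("customers", ["id", "email"], {"email"}))
```

Even in this small case the two outputs are different mechanisms with different behavior, which is why each additional platform adds to both the initial work and the maintenance.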
4. Semi-Structured Data
Data masking on “regular” columnar data can be complicated enough. Introducing semi-structured data makes this procedure that much more difficult. Semi-structured data requires significantly more detail, resulting in increasingly complex and convoluted policies and conditions. Data masking teams will need to sift through the semi-structured data to determine whether it contains sensitive data, and then apply anonymization accordingly.
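For example, sensitive fields in a JSON document can sit at any depth, so masking has to walk the whole structure rather than target a fixed column. The following is a minimal sketch assuming a hypothetical set of sensitive key names; it matches on key names only and does not inspect values.

```python
import json

# Hypothetical key names treated as sensitive in this sketch.
SENSITIVE_KEYS = {"email", "phone", "ssn"}
MASK = "***MASKED***"


def mask_document(node):
    """Return a copy of a parsed JSON value with every sensitive key's
    value replaced, at any nesting depth."""
    if isinstance(node, dict):
        return {
            key: MASK if key.lower() in SENSITIVE_KEYS else mask_document(value)
            for key, value in node.items()
        }
    if isinstance(node, list):
        return [mask_document(item) for item in node]
    return node  # scalars pass through unchanged


raw = '{"order": 7, "customer": {"name": "Ana", "Email": "ana@example.com", "contacts": [{"phone": "555-0100"}]}}'
print(json.dumps(mask_document(json.loads(raw))))
```

Documents in which the sensitive key is absent pass through unchanged, which covers the "lack of sensitive data" case without a separate code path.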
5. User Identity Issues
Different users require different anonymization processes. In theory, data masking should be differentiated for different users based on their RBAC (role-based access control) roles. However, not only are users often identified in the same way across data technologies, but sometimes they also share common local login credentials. Therefore, data masking needs to be configured as an exception to RBAC, complicating both the RBAC design and the data masking process.
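The "in theory" case, where masking follows the user's role, can be sketched as a lookup from role to masking function. The role names and masking styles below are assumptions for illustration, and unknown roles fall back to full redaction. A shared login breaks this model because every person behind it resolves to the same role.

```python
def redact(value):
    """Hide the value entirely."""
    return "***"


def partial_email(value):
    """Keep the first character and the domain, hide the rest."""
    local, _, domain = value.partition("@")
    return f"{local[:1]}***@{domain}" if domain else "***"


def reveal(value):
    """Return the value unmasked."""
    return value


# Hypothetical role-to-rule mapping.
ROLE_RULES = {
    "privacy_officer": reveal,
    "support": partial_email,
    "analyst": redact,
}


def mask_for_role(value, role):
    """Apply the masking rule for `role`; unknown roles get full redaction."""
    return ROLE_RULES.get(role, redact)(value)


for role in ("privacy_officer", "support", "analyst", "shared_etl_user"):
    print(role, mask_for_role("ana@example.com", role))
```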
6. Changing Requirements
Everything changes. Compliance and security policies are always changing and updating. Even if you have developed a strong data masking process, when one of these policies changes, the data masking team will need to first untangle the complicated logic and then determine what specific data types need to be anonymized for specific users.
7. Preparation Time
Time is money. Any delays associated with a data masking project can result in delays for other members of the data team that require access to the data. Further, the longer the data masking project takes, the less available time the data masking team has to work on other projects.
8. Rolling Out
Developing roll-out plans is nontrivial. In many cases, rolling out changes that are “hard-coded” into database views or policies can cause production issues that are both disruptive and difficult to debug. Therefore, data management teams need to create a roll-out plan not only for the project itself, but also for anticipated and unanticipated changes.
9. Requirements Coming From Different Teams
Data masking teams as arbitrators. Often data masking requirements come from different, disconnected teams within the organization, such as the DPO (Data Protection Officer), privacy office, product management, data governance teams, data owners, or security teams. In many cases, there may be divergent and even conflicting requirements, making it difficult for the data masking team to determine which requirements need to be changed or ignored, and how to navigate this process within the organization.
Data masking projects and requirements can be tricky, and understanding where these pitfalls are can help you avoid them. When building Satori, the DataSecOps platform, we had these hardships in mind and wanted to solve them. That is why we created a data masking capability that works seamlessly across all your data platforms, is simple to configure, easy to continuously update, and works with our built-in sensitive data discovery capability.
To learn more, visit our data masking page, with highlights and video snippets. If you’d like to see how Satori can ease data masking projects, book a meeting with one of our experts.