Using Satori to Keep Your JSON Data Safe

Ty Alevizos | Solution Architect

Companies increasingly store data in semi-structured formats such as JSON. The main reason is flexibility: some fields are present only in certain cases, and the schema of the stored data can change over time. Many data stores support storing and querying semi-structured data, but handling it also adds security and governance challenges.

In particular, it can be difficult to keep up with the discovery, classification, and masking of semi-structured data, especially when that data is sensitive. Not only does this task require significant engineering resources, but if it is not done continuously it can also result in security incidents.

In this article, we’ll explore JSON and how to use Satori with your JSON data to keep your data secure. 

What is JSON?

JavaScript Object Notation (JSON for short, pronounced “Jason”) started as a data interchange format between web apps and servers, known for its human-readable text and open standards. It grew out of the need for browser-to-server communication that didn’t rely on plugins such as Flash or Java applets. Despite being based on a subset of JavaScript, it’s language-independent and easily parsed in other languages and on other platforms. Some databases have added native functionality for parsing and querying JSON documents.

JSON has become a popular alternative to XML due to its native support in JavaScript and JS-based applications, its ease of use from other programming languages, its human readability, and its versatility across applications.
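To illustrate how easily JSON is parsed outside JavaScript, here is a minimal sketch using Python's standard `json` module. The payload and its field names are invented for illustration and do not come from any particular system.

```python
import json

# Hypothetical payload: the field names are invented for illustration.
raw = '{"name": "Ada", "tags": ["admin", "ops"], "address": {"city": "Seattle"}}'

# json.loads turns JSON text into plain Python dicts, lists, strings and numbers.
doc = json.loads(raw)

city = doc["address"]["city"]  # nested object access
first_tag = doc["tags"][0]     # array access

# json.dumps serializes the structure back to JSON text.
round_trip = json.dumps(doc, sort_keys=True)
```

The same text could be read just as directly by a JSON parser in most other languages, which is a large part of the format's appeal.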

Why is Semi-Structured Data a Concern?

Structured data follows a standard format of rows and columns. While structured data has a fixed schema and is easily stored and queried, it makes up only a small percentage of the data collected today. Instead, the text-based data from apps, mobile devices, and IoT devices has led to a plethora of semi-structured JSON data.

The concern is that this flood of semi-structured data contains sensitive data, such as PII and PHI. The data does not follow a fixed schema, which gives it tremendous flexibility but also makes it significantly more difficult to discover and mask sensitive data.
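A small sketch can make the problem concrete. The two records below are hypothetical rows from the same JSON column; the helper `iter_leaves` (a name made up here) lists every leaf path, which shows that the same kind of sensitive value can sit at a different location in each record.

```python
from typing import Any, Iterator, Tuple

def iter_leaves(node: Any, path: str = "") -> Iterator[Tuple[str, Any]]:
    """Yield (path, value) for every scalar leaf in a parsed JSON value."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield from iter_leaves(value, f"{path}.{key}" if path else key)
    elif isinstance(node, list):
        for index, value in enumerate(node):
            yield from iter_leaves(value, f"{path}[{index}]")
    else:
        yield path, node

# Two hypothetical records from the same column, with different shapes.
record_a = {"user": {"ssn": "123-45-6789"}, "plan": "basic"}
record_b = {"contacts": [{"name": "Ada", "ssn": "987-65-4321"}]}

paths_a = [p for p, _ in iter_leaves(record_a)]
paths_b = [p for p, _ in iter_leaves(record_b)]
# paths_a == ["user.ssn", "plan"]
# paths_b == ["contacts[0].name", "contacts[0].ssn"]
```

With a fixed schema, a single column name would identify the sensitive field. Here, any discovery process has to inspect the contents of every record.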

Read more about Why Data Classification Projects Are So Hard! and the 9 Common Struggles Data Teams Face With Data Masking Projects.

How Satori Ensures Data Security with JSON Data

The Satori data security platform allows organizations to continuously discover sensitive data and to mask sensitive JSON data, automatically or manually, across databases, data warehouses, and data lakes.

1. Continuously classify data. Satori provides continuous and systematic data classification to ensure that any sensitive data is discovered. The benefit of this method is that there is no break in the scanning for sensitive data. When sensitive semi-structured data is discovered manually, discovery halts between scans, and those gaps can last a significant amount of time depending on the available resources of the data team responsible for this task.

Therefore, if you have data that is constantly and rapidly changing, Satori ensures that sensitive semi-structured JSON data is continuously discovered. 

2. Dynamic masking. Satori supports dynamic masking of JSON data out of the box, without having to write additional code or make changes to your databases, warehouses, or lakes. Satori lets you anonymize data quickly based on your security requirements and policies, applied by user, role, attribute, or a combination of these.

Therefore, if you have sensitive JSON data, Satori will first discover this sensitive information and then anonymize the data according to the user or their roles or attributes.
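The idea of dynamic masking can be sketched in a few lines. The following is not Satori's implementation or API; it is a minimal illustration, with hypothetical key names, roles, and policy, of masking applied at read time according to the caller's role while the stored data stays unchanged.

```python
from typing import Any, Dict, Set

# Hypothetical classification result and policy; all names are invented.
SENSITIVE_KEYS: Set[str] = {"ssn", "email"}
UNMASKED_FOR_ROLE: Dict[str, Set[str]] = {
    "analyst": set(),                # sees every sensitive key masked
    "compliance": {"ssn", "email"},  # sees sensitive keys in the clear
}

def mask_for_role(node: Any, role: str) -> Any:
    """Return a masked copy of a parsed JSON value; the input is not modified."""
    allowed = UNMASKED_FOR_ROLE.get(role, set())
    if isinstance(node, dict):
        masked = {}
        for key, value in node.items():
            if key.lower() in SENSITIVE_KEYS and key.lower() not in allowed:
                masked[key] = "****"
            else:
                masked[key] = mask_for_role(value, role)
        return masked
    if isinstance(node, list):
        return [mask_for_role(item, role) for item in node]
    return node

stored = {"plan": "basic", "user": {"SSN": "123-45-6789", "email": "ada@example.com"}}
analyst_view = mask_for_role(stored, "analyst")
compliance_view = mask_for_role(stored, "compliance")
```

In this sketch an unknown role falls back to masking every sensitive key, which is the safer default.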

An Example of How Satori Continuously Discovers & Masks Sensitive JSON Data

An Example of Automatically and Manually Discovering & Masking Sensitive JSON Data in Postgres

Given a table of data with the following characteristics:

Where the attrs column is JSON text that looks like:

Notice that there are nested arrays of data, e.g. “items>>moresubnesting”, and that there is also some sensitive data in the structures, e.g. “SSN”.
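The sample data itself is not reproduced here. As a stand-in, the sketch below uses a hypothetical `attrs` document with the shape described above (a nested array under `items` and `moresubnesting`, a `gender` field, and `SSN` in two places) and lists where keys with sensitive names occur. The `>>` separator simply mirrors the notation used in this article, and `find_keys` is a made-up helper, not a Satori function.

```python
import json

# Hypothetical attrs value; only the overall shape follows the article.
attrs_text = """
{
  "gender": "F",
  "SSN": "123-45-6789",
  "items": [
    {"sku": "A1", "moresubnesting": [{"SSN": "987-65-4321", "note": "n/a"}]}
  ]
}
"""

def find_keys(node, wanted, path=()):
    """Return the paths of all keys whose lowercased name is in `wanted`."""
    hits = []
    if isinstance(node, dict):
        for key, value in node.items():
            here = path + (key,)
            if key.lower() in wanted:
                hits.append(">>".join(here))
            hits.extend(find_keys(value, wanted, here))
    elif isinstance(node, list):
        # Array elements share their parent's path; indexes are omitted.
        for value in node:
            hits.extend(find_keys(value, wanted, path))
    return hits

hits = find_keys(json.loads(attrs_text), {"ssn", "gender"})
# hits == ["gender", "SSN", "items>>moresubnesting>>SSN"]
```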

We want Satori to manage the discovery of any sensitive PII for us, and we also want to be able to alter the data inventory manually. Satori lets us do both.

1. Automatic Discovery

Satori will automatically find a few of the attributes, specifically “gender” and “SSN”. “SSN” appears twice in the above data and Satori will find it twice:

  1. In the UI under Data Inventory, drill into the table with JSON data, then click {}attrs.
  2. Click ADD to start manually tagging your JSON data.
  3. Notice that some data classification already occurred – for these there is no pencil icon to edit the JSON locations. You can still remove the tag entirely if you want by clicking the X in the tag itself.
  4. In the above screenshot, we clicked “ADD” and then told Satori about our new JSON classification: newfield2, and tagged it as type Address.
  5. Notice in the above screenshot that these manually added classifications receive a different tag icon (a person icon) to differentiate them from the classifications produced by the automatic data inventory process. See the next section for more info on manually classifying your JSON data.

Now, when you query this data, redaction and masking occur as specified by your configuration, and audit entries include support for all of the JSON attributes you have defined:

2. Manual Data Classification

You may have noticed that one of the JSON attributes, newfield2, was manually added by the operator. Here is what the Satori Tag data entry looks like for our new field newfield2:

While Satori will likely discover all manner of sensitive data in your JSON, there will be times when the data stewards of the system need to add classifications of their own.
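One way to picture how automatic and manual classifications sit side by side is a small inventory structure. This is a sketch under assumptions, not Satori's data model: the `Classification` type, the `source` field, and the tag labels other than Address are hypothetical.

```python
from dataclasses import dataclass
from typing import List

@dataclass(frozen=True)
class Classification:
    column: str
    json_path: str
    tag: str
    source: str  # "automatic" or "manual"

# Entries as an automatic scan might have produced them (labels are invented).
inventory: List[Classification] = [
    Classification("attrs", "gender", "Gender", "automatic"),
    Classification("attrs", "SSN", "SSN", "automatic"),
]

def add_manual_tag(items: List[Classification], column: str,
                   json_path: str, tag: str) -> List[Classification]:
    """Return a new inventory with a manual tag added, skipping duplicates."""
    for existing in items:
        if (existing.column, existing.json_path, existing.tag) == (column, json_path, tag):
            return items
    return items + [Classification(column, json_path, tag, "manual")]

# The data steward tags newfield2 as an Address, as in the walkthrough above.
inventory = add_manual_tag(inventory, "attrs", "newfield2", "Address")
manual_paths = [c.json_path for c in inventory if c.source == "manual"]
```

Keeping the source on each entry is what would let a UI show manual tags with a different icon and treat them differently from automatic ones.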

Other Platforms

Everything discussed above also applies to JSON fields in Snowflake, MySQL tables that contain columns of type JSON, MongoDB, and other platforms.

Let’s look at some examples from these platforms as well:

In the above screenshot, address and phone were partially masked (as per the policy defined in Satori), and price was redacted entirely. Price is also a nested attribute. Here is the data inventory from Satori’s perspective:

Conclusion

Using Satori with semi-structured JSON data enables you to remain in control of your sensitive data. You can discover sensitive data both continuously and manually, and apply dynamic masking to keep it secure.

To learn more about Satori, book a live demo.
About the author

Ty Alevizos is a Principal Solution Architect at Satori Inc. He has 3 decades of experience in data-related fields, including database management, BI and visual analysis, data science principles, and organizational best practices around data topologies and data security. He graduated from U.C. Berkeley with a degree in music composition, and in his spare time plays jazz bass in Seattle and the Pacific Northwest region.
