Meet the Satori team at AWS Summit NYC, July 10th 🗽

Data Management,

DataOps,

Snowflake

Data Inventory Automation with Satori

|Solution Architect

Knowing where your sensitive data is stored at all times is paramount for compliance and security alike. In many cases, this metadata (what sensitive data is stored in each location) is an integral part of a good data inventory.

With Satori, such a data inventory is continuously updated as data is being accessed, but it can also be mapped in an ad-hoc way. An example is when a company wants to run a scan over their entire database or data warehouse as part of onboarding.

In this article we will show you a couple of ways Satori performs a data inventory automation in Snowflake, but let’s start with some basic definitions.

What is Data Inventory

A well-organized, up-to-date, and comprehensive data inventory is a crucial component of your organization’s data management plan. Data inventories make your data more accessible and help you maintain governance, meet compliance requirements, and reduce security risks.

The primary challenge in creating a data inventory is keeping it current. Therefore, you need an agile system to create and maintain your data inventory. An efficient way to achieve this is using Satori’s automated product. 

In this post, we will explore how you can do this with Satori in Snowflake.

Get the latest from Satori

Why You Need Data Inventory

Data-driven organizations store a large quantity of data across various databases, warehouses, and lakes. The primary concern is ensuring your data, particularly your sensitive data, is secure. 

Since data is constantly changing and updating, if your data inventory is not up to date you won’t know what sensitive data you have and where it is located. The resulting danger is that you can’t secure this sensitive data. 

Ensuring that data is secure is critical to your organization meeting the necessary regulatory compliance requirements such as GDPR (General Data Protection Regulation) or CCPA (California Consumer Privacy Act). These regulations require that organizations know the location of all sensitive data they collect and store, thus the necessity for a data inventory, and a continuously updated data inventory. 

Why Automate this Process?

It is possible to maintain the data inventory manually. However, this is time-consuming and risky. It requires a significant amount of data teams’ time and resources, and it is difficult to maintain its currency.   

An alternative is to use Satori to automate this process so that the data inventory is continuously updated and does not require any configuration or integration time. Satori’s solution enables organizations to maintain comprehensive data security that meets and exceeds compliance requirements.

Populating a Snowflake Data Inventory

Automating your data inventory with Satori will locate the sensitive columns and mark them with the appropriate data inventory tags. The effect of this is that you can apply your masking profiles and security policies before your customers or team members query these locations.

We provide two examples in Snowflake, one as a DBA/SQL and the second in Python.

Snowflake via DBA/SQL

In this first example, we create a stored procedure using the standard Snowflake console and then run that procedure against the Satori endpoint. This requires DBA or account admin permissions in Snowflake.

Create a procedure that looks like this:

Highlighted in bold above are the expected changes:

  1. YOUR_DATABASE is the name of the Snowflake database you would like to work against.
  2. The where clause can be changed or removed depending on which schemas in this database you want to work against.

If you then call this procedure:

call satori_table_scan();

It will create a multi-line SQL statement for you to use. If you double-click on the result, you can copy the text:

Log into the Satori endpoint for your Snowflake database console, paste the above text, and run the multiple statements. In Satori’s audit log, you will see corresponding entries for each select statement:

Snowflake via Python

We created and used a stored python gist file in this second example. Here the Snowflake user and password for connections must have the ACCOUNTADMIN role in Snowflake, or, the ability to query all tables and schemas.

In this example, it can be run as-is at the command line. The options are: 

  • dry – this will create SQL statements but not actually execute them
  • wet – this will create and execute SQL statements
  • partial – this will create SQL statements at the table level using: select * from table
  • full – this will create SQL statements at the column level using: select [colname] from table

You will need to edit the top portion of this script and change the following values in bold as needed:

Conclusion

A data inventory is a necessary and important data security component to ensure that all sensitive data is updated and your organization is in compliance. Satori automates this process so that you don’t need to worry about using manual processes and risking sensitive data. 

Satori’s solution is implemented easily across multiple data platforms, warehouses, and lakes. In this post we showed how to implement Satori if you are using Snowflake either as a DBA/SQL or with Python. 

To learn more about how Satori can help you automate your data inventory book a demo with one of our experts. 

Learn More About Satori
in a Live Demo
Book A Demo
About the author
|Solution Architect

Ty Alevizos is a Principal Solution Architect at Satori Inc. He has 3 decades of experience in data-related fields, including database management, BI and visual analysis, data science principles, and organizational best practices around data topologies and data security. He graduated from U.C. Berkeley with a degree in music composition, and in his spare time plays jazz bass in Seattle and the Pacific Northwest region.

Back to Blog