Satori selected as a representative vendor in the Gartner Market Guide for Data Security Platforms 🥳

Snowflake

5 Ways to Apply Decryption & De-Tokenization in Snowflake

|Chief Scientist
This article will discuss the distinction between tokenization and encryption and their inverses: decryption and de-tokenization.   When it comes to securing data while it is being transferred over the internet or stored at rest, tokenization and encryption are frequently referenced simultaneously. While tokenization and encryption are both powerful data security technologies, they are distinct and not interchangeable, which is why we are going to discuss them now.   In this article we will discuss:

Read More:

What Is Tokenization & De-tokenization?

  Tokenization is the process of converting sensitive data into non-sensitive "tokens". These tokens can be utilized in a database or internal system without being brought into scope.   Although the tokens hold unrelated values, they may maintain certain aspects of the original data, such as its length or format, allowing you to use them in business operations without interruption.   The original sensitive information is then securely stored outside of the company's systems or in a specialized vault service. Tokenized data, unlike encrypted data, is unbreakable and irrevocable. This distinction is critical: tokens cannot be returned to their original form without the existence of the additional, independently stored data since there is no mathematical relationship between the token and its original number. As a result, if a tokenized environment gets breached, the original sensitive data will not be compromised.   In this context, a token is a bit of data that serves as a stand-in for a more valuable piece of data. Tokens have very little intrinsic value: they are only useful because they symbolize something more significant.   Tokens can be single-use (low-value), such as those used for one-time debit card transactions that do not need to be stored. Alternatively, tokens can be used for persistent (high-value) items like a repeat customer's credit card number that needs to be saved in a database for repeating transactions.   On the other side of this process, de-tokenization involves exchanging the token for the original data. Only the original tokenization system can de-tokenize the data: there is no other method to get the original information just by looking at the token.    

Tokenization vs. Encryption

    As previously mentioned, tokenization and encryption are not the same things and should not be used interchangeably. Each technology has its own set of strengths and drawbacks, and, depending on the specific situation, you should choose one or the other as the data security method. However, both encryption and tokenization are used to safeguard the end-to-end process in some circumstances such as for electronic payment data.   In a nutshell, here are the most significant differences that set tokenization and encryption apart:  

ENCRYPTION

TOKENIZATION

Using an encryption method and a key converts plain text to ciphertext mathematically. Creates a token value for plain text randomly and saves the mapping in a database.
Used for both organized and unstructured data, such as entire files. Used mostly for fields with structured data, such as credit card or Social Security numbers.
Format-preserving encryption algorithms have a lower strength as a compromise. You can preserve the format without compromising security.
Can be easily scaled to large data volumes by decrypting data with only a (relatively) small encryption key. As the database grows, it becomes more difficult to scale tokenization safely and maintain performance.
The original data gets removed from the organization, but it is encrypted. The original data never leaves the company, meeting certain regulatory needs.
An ideal way to share sensitive information with others who have access to the encryption key. Data interchange is difficult because it necessitates direct access to a token vault that maps token values.

Snowflake Data Encryption

Encryption converts data into a code that is only readable with a secret key, a decryption key, or a password.   In the context of a data cloud, encryption can refer to two things: securing data sent to and from the data cloud, also known as data in transit, and securing data kept in tables, also known as data at rest. Both processes have become common in recent years to protect your data from serious threats.   Another option for gaining even more control over how your data is protected is never disclosing your keys with Snowflake. This means applying field-level encryption on specific fields, and decrypting the content on the client-side. This way, the decryption keys are not exchanged with the server in advance, if at all.   In client-side encryption, the user encrypts data before loading it into Snowflake, and the data is saved in encrypted form in Snowflake tables. The client must encrypt their data at the field level rather than at the table, schema or account level to execute this process successfully. As one might have guessed, the main advantage is that even users who have access (or gain access) to the Snowflake account, will not be able to get its clear-text sensitive content.   In the same way, data can be tokenized as part of its ingestion process, and the tokens can then be stored in Snowflake (while the sensitive data is stored in a vault).   For more information specifically about encryption in Snowflake, read our Snowflake Encryption Guide.    

Use-Cases for Encryption & Tokenization of Data within Snowflake

    Encryption and tokenization can be used in tandem to safeguard data while being transferred over the internet or stored at rest. Accordingly, here are more Snowflake use cases for data encryption and tokenization:  
  • According to risk assessments, a certain type of data is extremely sensitive. Hence, the organization wants it to be encrypted or tokenized even if employees or others have access to it within Snowflake data tables.
  • The organization may have data that must be encrypted or tokenized to avoid being stored in plaintext due to regulatory compliance or privacy requirements.

Learn More:

5 Ways to Access Data Encrypted or Tokenized in Snowflake

 
  1. Keeping Data in Clear-Text within Snowflake Tables

  This option relies on Snowflake’s encryption at rest and encryption in transit. The main advantage of this method is that you can process it as is, without any limitations or additional architecture. The downside to this option, though, is that users with access to the Snowflake account may have access to the data in clear-text. This may be a risk that needs to be mitigated as a security or compliance requirement, thus making this option impossible.    

2. Keeping Encrypted Data: Decrypting with the Decrypt Function

    Assume you have used Snowflake's built-in ENCRYPT function, or an equivalent implementation, to encrypt the comment text in the review column with a passphrase. You can now rewrite the same query to tell Snowflake to only decrypt the review column when it gets queried:  
SELECT id, first_name, last_name FROM employees WHERE DECRYPT(review, '<PASSPHRASE>') ILIKE '%incredible%'
  Snowflake would use the decrypt function on the comment text stored in the review column as part of processing this query and only return rows with the word incredible in the comment text. The decryption logic is then hard-coded within a secure view so that the decryption passphrase is not exposed to the end-users, and it is likely to be based on the users’ roles. Since this is done as part of the computation operation, the data will still be decrypted within the Snowflake compute resource, as well as in the temporary Snowflake cache that is stored. This process may be considered a security or compliance risk, or it may be compensated by other factors.    

3. Keeping Encrypted Data: Decrypting with an External Function

    A popular approach is to employ an external User-Defined Function (UDF) to decrypt data using a third-party service that consumers control. For example, you can define an external UDF that calls your service (a cloud API which runs your code) to decrypt the data:  
CREATE OR REPLACE EXTERNAL FUNCTION decrypt_varchar_ext(v varchar) varchar api_integration = acme_api1 AS '<AWS API GW URL>';
  Then, in the same way as Snowflake's built-in DECRYPT function, use the UDF:
SELECT id, first_name, last_name FROM employees WHERE decrypt_varchar_ext(review) ILIKE '%incredible%'
This method enables you to place the decryption logic outside of Snowflake, which may  be considered less risky but may also have a performance impact, especially in large volumes. This process also allows you to use a de-tokenization integration. As with the previous option, since this encryption is done as part of the computation operation, the data will still be decrypted within the Snowflake compute resource, as well as in the temporary Snowflake cache that is stored. Therefore, it may be considered a security or compliance risk or may be compensated by other factors.    

4. Client-side Offline Decryption or De-Tokenization

    The next option is to store and access the sensitive data in Snowflake while it remains encrypted or tokenized. Then, only once the data is pulled and stored in another system, decrypt or de-tokenize it. This may be performed as part of a data pipeline, such as for decrypting employee sensitive data only as an ETL when moving to payments. This method will obviously have overheads to build the functionality and will also delay data access, as the data needs to be processed after the actual data access. However, if the requirements are that the data is not decrypted as part of Snowflake’s data processing, it will satisfy this requirement.  

5. Just-in-Time Proxy Decryption and De-Tokenization

  If the requirements are that parts of the data are never processed in clear-text by Snowflake, there is another option to fulfill it: data can be decrypted while in transit from Snowflake to the user. This means that, if a user meets certain requirements (e.g. the user is a member of the active directory group “accounting”), certain types of data will be decrypted by a proxy that sits between the data store (in this case Snowflake) and the user. The advantages of this option are that, although the data is not at any time in clear-text within Snowflake, the process of transforming the data to be human-readable is transparent and does not slow down the work that needs to be done with the data.    

Summary

  Below is a table summarizing the different options of field-level encryption, with the main differences:    

No Field-Level Encryption

Using Decrypt

Using External Functions

Client-Side - Offline or Pipelined

Client-Side - Just-in-Time Proxy

Is data in the table stored without sensitive parts? No Yes Yes Yes Yes
Is the data always encrypted or tokenized within Snowflake? No No No Yes Yes
Are results given dynamically in the query itself? Yes Yes Yes No Yes
Does it support custom decryption or de-tokenization services? No No Yes Yes Yes
Can the decrypted values be used as part of a query? Yes Yes Yes, but it may have a significant performance impact. No No

Learn More:

Learn More About Satori
in a Live Demo
Book A Demo
About the author
|Chief Scientist

Ben is an experienced tech leader and book author with a background in endpoint security, analytics, and application & data security. Ben filled roles such as the CTO of Cynet, and Director of Threat Research at Imperva. Ben is the Chief Scientist for Satori, the DataSecOps platform.

Back to Blog