- What Is Tokenization & De-tokenization?
- Tokenization vs. Encryption
- Snowflake Data Encryption
- Use-Cases for Encryption & Tokenization of Data within Snowflake
- 5 Ways to Keep Data Encrypted or Tokenized in Snowflake
What Is Tokenization & De-tokenization?Tokenization is the process of converting sensitive data into non-sensitive "tokens". These tokens can be utilized in a database or internal system without being brought into scope. Although the tokens hold unrelated values, they may maintain certain aspects of the original data, such as its length or format, allowing you to use them in business operations without interruption. The original sensitive information is then securely stored outside of the company's systems or in a specialized vault service. Tokenized data, unlike encrypted data, is unbreakable and irrevocable. This distinction is critical: tokens cannot be returned to their original form without the existence of the additional, independently stored data since there is no mathematical relationship between the token and its original number. As a result, if a tokenized environment gets breached, the original sensitive data will not be compromised. In this context, a token is a bit of data that serves as a stand-in for a more valuable piece of data. Tokens have very little intrinsic value: they are only useful because they symbolize something more significant. Tokens can be single-use (low-value), such as those used for one-time debit card transactions that do not need to be stored. Alternatively, tokens can be used for persistent (high-value) items like a repeat customer's credit card number that needs to be saved in a database for repeating transactions. On the other side of this process, de-tokenization involves exchanging the token for the original data. Only the original tokenization system can de-tokenize the data: there is no other method to get the original information just by looking at the token.
Tokenization vs. EncryptionAs previously mentioned, tokenization and encryption are not the same things and should not be used interchangeably. Each technology has its own set of strengths and drawbacks, and, depending on the specific situation, you should choose one or the other as the data security method. However, both encryption and tokenization are used to safeguard the end-to-end process in some circumstances such as for electronic payment data. In a nutshell, here are the most significant differences that set tokenization and encryption apart:
|Using an encryption method and a key converts plain text to ciphertext mathematically.||Creates a token value for plain text randomly and saves the mapping in a database.|
|Used for both organized and unstructured data, such as entire files.||Used mostly for fields with structured data, such as credit card or Social Security numbers.|
|Format-preserving encryption algorithms have a lower strength as a compromise.||You can preserve the format without compromising security.|
|Can be easily scaled to large data volumes by decrypting data with only a (relatively) small encryption key.||As the database grows, it becomes more difficult to scale tokenization safely and maintain performance.|
|The original data gets removed from the organization, but it is encrypted.||The original data never leaves the company, meeting certain regulatory needs.|
|An ideal way to share sensitive information with others who have access to the encryption key.||Data interchange is difficult because it necessitates direct access to a token vault that maps token values.|
Snowflake Data EncryptionEncryption converts data into a code that is only readable with a secret key, a decryption key, or a password. In the context of a data cloud, encryption can refer to two things: securing data sent to and from the data cloud, also known as data in transit, and securing data kept in tables, also known as data at rest. Both processes have become common in recent years to protect your data from serious threats. Another option for gaining even more control over how your data is protected is never disclosing your keys with Snowflake. This means applying field-level encryption on specific fields, and decrypting the content on the client-side. This way, the decryption keys are not exchanged with the server in advance, if at all. In client-side encryption, the user encrypts data before loading it into Snowflake, and the data is saved in encrypted form in Snowflake tables. The client must encrypt their data at the field level rather than at the table, schema or account level to execute this process successfully. As one might have guessed, the main advantage is that even users who have access (or gain access) to the Snowflake account, will not be able to get its clear-text sensitive content. In the same way, data can be tokenized as part of its ingestion process, and the tokens can then be stored in Snowflake (while the sensitive data is stored in a vault). For more information specifically about encryption in Snowflake, read our Snowflake Encryption Guide.
Use-Cases for Encryption & Tokenization of Data within SnowflakeEncryption and tokenization can be used in tandem to safeguard data while being transferred over the internet or stored at rest. Accordingly, here are more Snowflake use cases for data encryption and tokenization:
- According to risk assessments, a certain type of data is extremely sensitive. Hence, the organization wants it to be encrypted or tokenized even if employees or others have access to it within Snowflake data tables.
- The organization may have data that must be encrypted or tokenized to avoid being stored in plaintext due to regulatory compliance or privacy requirements.
- How Satori helps secure data access for Satori at scale
- How Satori applies anonymization for Snowflake and other data platforms
5 Ways to Access Data Encrypted or Tokenized in Snowflake
Keeping Data in Clear-Text within Snowflake Tables
2. Keeping Encrypted Data: Decrypting with the Decrypt FunctionAssume you have used Snowflake's built-in ENCRYPT function, or an equivalent implementation, to encrypt the comment text in the review column with a passphrase. You can now rewrite the same query to tell Snowflake to only decrypt the review column when it gets queried:
|SELECT id, first_name, last_name FROM employees WHERE DECRYPT(review, '<PASSPHRASE>') ILIKE '%incredible%'|
3. Keeping Encrypted Data: Decrypting with an External FunctionA popular approach is to employ an external User-Defined Function (UDF) to decrypt data using a third-party service that consumers control. For example, you can define an external UDF that calls your service (a cloud API which runs your code) to decrypt the data:
|CREATE OR REPLACE EXTERNAL FUNCTION decrypt_varchar_ext(v varchar) varchar api_integration = acme_api1 AS '<AWS API GW URL>';|
|SELECT id, first_name, last_name FROM employees WHERE decrypt_varchar_ext(review) ILIKE '%incredible%'|
4. Client-side Offline Decryption or De-TokenizationThe next option is to store and access the sensitive data in Snowflake while it remains encrypted or tokenized. Then, only once the data is pulled and stored in another system, decrypt or de-tokenize it. This may be performed as part of a data pipeline, such as for decrypting employee sensitive data only as an ETL when moving to payments. This method will obviously have overheads to build the functionality and will also delay data access, as the data needs to be processed after the actual data access. However, if the requirements are that the data is not decrypted as part of Snowflake’s data processing, it will satisfy this requirement.
5. Just-in-Time Proxy Decryption and De-TokenizationIf the requirements are that parts of the data are never processed in clear-text by Snowflake, there is another option to fulfill it: data can be decrypted while in transit from Snowflake to the user. This means that, if a user meets certain requirements (e.g. the user is a member of the active directory group “accounting”), certain types of data will be decrypted by a proxy that sits between the data store (in this case Snowflake) and the user. The advantages of this option are that, although the data is not at any time in clear-text within Snowflake, the process of transforming the data to be human-readable is transparent and does not slow down the work that needs to be done with the data.
SummaryBelow is a table summarizing the different options of field-level encryption, with the main differences:
No Field-Level Encryption
Using External Functions
Client-Side - Offline or Pipelined
Client-Side - Just-in-Time Proxy
|Is data in the table stored without sensitive parts?||No||Yes||Yes||Yes||Yes|
|Is the data always encrypted or tokenized within Snowflake?||No||No||No||Yes||Yes|
|Are results given dynamically in the query itself?||Yes||Yes||Yes||No||Yes|
|Does it support custom decryption or de-tokenization services?||No||No||Yes||Yes||Yes|
|Can the decrypted values be used as part of a query?||Yes||Yes||Yes, but it may have a significant performance impact.||No||No|