Amazon Athena is a serverless cloud query service used to run SQL queries on data stored in S3 buckets. This makes it a quick data lake enabler: you can query a variety of data formats in place, without loading them into a database first, and give your data consumers an easy way to turn data into value.
Today, I will be discussing the benefits of using Athena in conjunction with Satori in order to enhance your data lake security.
By combining the capabilities of Amazon Athena with Satori’s universal data access control service, you can receive the following benefits:
Granular Access Control for Athena
As someone who owned an S3 + Athena data lake, I can attest to the fact that managing data accessibility within Amazon Athena is not a simple task, and in some cases must be done at the application level. When managing access controls in Satori, decoupled from Athena’s infrastructure, you can easily set policies that reduce security risks in a timely manner. As an example, let’s go through the process of preventing sneaky marketers from accessing payment card information which may have found its way into the data lake.
Preventing Marketing Access to PCI
With Satori, you only need to define the YAML below to prevent anyone using the Marketing_API user from accessing any PCI data across all Athena* tables.
Preventing marketing team from accessing PCI data
- name: Prevent marketing access to PCI
* The same policy can be applied to other data stores as well.
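To make the logic of such a rule concrete, here is a minimal Python sketch of a deny rule keyed on a user and a data classification tag. It is a conceptual illustration only, not Satori's policy engine or policy format: `DenyRule`, `is_allowed`, and the column tags are hypothetical names introduced for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DenyRule:
    """Hypothetical rule: block one user from columns carrying one tag."""
    name: str
    user: str
    blocked_tag: str


def is_allowed(rule: DenyRule, user: str, column_tags: dict[str, set[str]]) -> bool:
    """Return False when the rule's user queries any column with the blocked tag."""
    if user != rule.user:
        return True  # the rule does not apply to other users
    return all(rule.blocked_tag not in tags for tags in column_tags.values())


rule = DenyRule(
    name="Prevent marketing access to PCI",
    user="Marketing_API",
    blocked_tag="PCI",
)

# Columns touched by a query, with made-up classification tags.
payment_query = {"email": {"PII"}, "card_number": {"PCI"}}
contact_query = {"email": {"PII"}}

print(is_allowed(rule, "Marketing_API", payment_query))  # False
print(is_allowed(rule, "Finance_API", payment_query))    # True
print(is_allowed(rule, "Marketing_API", contact_query))  # True
```

Because the rule is expressed against a classification tag rather than specific table or column names, it keeps applying when PCI data shows up in a new table.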
Automated Data Classification
All queried data is automatically classified by Satori’s engine, so you immediately know where your PII and other sensitive data resides. You can then act to reduce risk (by implementing better access control or applying universal masking), reach compliance with regulatory requirements faster, and fulfill privacy goals (by knowing where private consumer information is stored).
As shown below, you can get the classification results in the Universal Audit:
Managing Data Access Across All Data Stores
While managing data access through the Athena queries on your data store is important on its own, it is only one piece of the puzzle. With Satori, you can manage and control data access at scale, and across different data stores, eliminating the need to delve into each technology and set tech-specific access controls on other data stores.
Below, we can see user data access in the same place using Satori, both when accessing the Snowflake data warehouse (in the first row) and when querying data lake information through Amazon Athena (in the second row).
Universal Data Masking on Amazon Athena
Data analysts and data scientists rarely access sensitive data with malicious intent. However, whenever they do access it, they introduce risk to the organization. With Satori, you can quickly set and customize masking profiles, so that different roles, users, or locations receive dynamically masked data based on your needs.
Below is an example of data retrieved through Athena and dynamically masked by Satori, with a quick setup that covers all PII queries from the data lake:
Dynamically Masked Data
Here, the same data is shown with no masking defined.
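The idea of role-based dynamic masking can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not how Satori implements masking: the `MASKED_COLUMNS` profile, the `UNMASKED_ROLES` set, and the sample rows are all hypothetical.

```python
def mask_email(value: str) -> str:
    """Keep the first character and the domain, and star out the rest."""
    local, sep, domain = value.partition("@")
    if not sep:
        return "*" * len(value)  # not an email address: mask everything
    return local[:1] + "*" * (len(local) - 1) + "@" + domain


# Hypothetical masking profile: column name -> masking function.
MASKED_COLUMNS = {"email": mask_email}
# Hypothetical roles that are allowed to see clear-text values.
UNMASKED_ROLES = {"compliance"}


def apply_masking(rows: list[dict], role: str) -> list[dict]:
    """Return a copy of the result set, masked according to the caller's role."""
    if role in UNMASKED_ROLES:
        return [dict(row) for row in rows]
    return [
        {
            col: MASKED_COLUMNS[col](val) if col in MASKED_COLUMNS else val
            for col, val in row.items()
        }
        for row in rows
    ]


rows = [{"id": 1, "email": "jane@example.com"}]
print(apply_masking(rows, "analyst"))     # [{'id': 1, 'email': 'j***@example.com'}]
print(apply_masking(rows, "compliance"))  # [{'id': 1, 'email': 'jane@example.com'}]
```

The stored data never changes. Masking is applied to the result set at query time, which is what makes it possible to return different views of the same rows to different roles.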
Built-in Amazon Athena Data Inventory
When you use Satori to manage your Amazon Athena data access, you gain visibility into an inventory of the data lake information that is accessed through Amazon Athena. You can use the metadata it provides to enable better access controls, generate reports, or tag your own data types so that universal masking can be applied to them.
This saves time and simplifies the work of creating a data inventory, which is commonly done in an ad-hoc manner. The next step, actually applying the data inventory to limit access, is yet another complication that Satori simplifies considerably.
Getting Data Access Analytics for Athena Usage
As with other data stores, Satori provides data access analytics, which can help both security teams and operational teams such as DataOps. This is especially useful because Athena is a pay-per-query engine: the analytics give you insight into data access patterns that may be worth optimizing (for example, by moving a heavily queried dataset to a Redshift cluster).
Data access analytics for Amazon Athena:
In addition, because you receive the analytics separately from Athena’s system, you can combine the analytics reports from several different data stores. For example, you can view both your Amazon Athena and Amazon Redshift data access analytics side by side in a single window.
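To see why per-user access analytics matter for a pay-per-query engine, here is a rough Python sketch that estimates Athena spend per user from the bytes each query scanned. The record layout is hypothetical, and the pricing constants are assumptions (USD 5 per TB scanned, a 10 MB minimum per query, decimal terabytes), so check current AWS pricing for your region before relying on the numbers.

```python
from collections import defaultdict

PRICE_PER_TB_USD = 5.0           # assumption: on-demand price per TB scanned
BYTES_PER_TB = 10**12            # assumption: decimal terabytes
MIN_BILLED_BYTES = 10 * 10**6    # assumption: 10 MB minimum per query


def estimate_cost(bytes_scanned: int, price_per_tb: float = PRICE_PER_TB_USD) -> float:
    """Estimate the cost in USD of one query from the bytes it scanned."""
    billed = max(bytes_scanned, MIN_BILLED_BYTES)
    return billed / BYTES_PER_TB * price_per_tb


def cost_by_user(records: list[dict]) -> dict[str, float]:
    """Sum the estimated query cost per user."""
    totals: dict[str, float] = defaultdict(float)
    for rec in records:
        totals[rec["user"]] += estimate_cost(rec["bytes_scanned"])
    return dict(totals)


# Hypothetical audit records: one entry per query.
records = [
    {"user": "analyst_a", "bytes_scanned": 2 * 10**12},
    {"user": "analyst_a", "bytes_scanned": 10**12},
    {"user": "dashboard", "bytes_scanned": 5 * 10**11},
]

for user, cost in sorted(cost_by_user(records).items()):
    print(f"{user}: ${cost:.2f}")
```

A user or dashboard that repeatedly scans the same large dataset stands out immediately in a report like this, and is a natural candidate for partitioning, a columnar format, or a move to a provisioned engine.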
Getting Started with Satori Over Amazon Athena
If you would like to learn more, contact us, and we will be happy to provide you with a trial account and show you how it works.
If you are an existing Satori customer, you can find more technical information about setting up your Athena datastore in our Athena Datastore documentation.
Ben is an experienced tech leader and book author with a background in endpoint security, analytics, and application and data security. Ben has held roles such as CTO of Cynet and Director of Threat Research at Imperva. Ben is the Chief Scientist for Satori, the DataSecOps platform.