Democratize Data in AWS Redshift With Self-Service Data Access Workflows

Amazon Redshift allows companies to analyze large amounts of data, whether the data is stored in a data warehouse or in a data lake (by using Amazon Redshift Spectrum). In many cases, the data stored in Amazon Redshift (or accessed by using it) contains sensitive information, so companies need to limit access to this data.

 

Data engineering teams, which are charged with giving users and groups specific permissions to data, usually grant access to Amazon Redshift data. Limiting access to this data while also supporting data democratization, which involves many people seeking data access, is a challenging task. This challenge often results in either access limits that are too strict, which makes harvesting value from data more time-consuming, or access limits that are too loose, which create security risks and compliance issues.

 

Since our vision at Satori is to streamline DataSecOps for data-driven companies, we are enabling companies that use Amazon Redshift to reduce the overhead that data access management places on data engineering teams and to allow innovative data access workflows, including self-service and approval-based data access.

 

Let’s explore how you can simplify data access in Redshift by using Satori:

Before you continue, here's a quick video of how to apply self-service in Satori:

 

Redshift self-service data access using Satori

 

Letting Data Owners Manage Data Access

Data engineers can get overloaded with data access requests. This leads to operational problems (as they get less data engineering accomplished) as well as security problems (as they allow too much access and fail to follow up and revoke access when it is no longer needed).

 

On the other hand, it can be difficult to let data owners or stewards manage access to their own data from within Amazon Redshift. This process requires a skill set that business owners of the data usually lack. Furthermore, in many cases, it is either challenging or impossible to allow data owners to manage data access without granting them overly broad access to the cluster.

 

With Satori, you can simplify this process by defining datasets and then letting data owners define data access workflows without needing data engineers. Datasets in Satori can include one or more tables, schemas, or databases, or they can be defined across different clusters, even for various data technologies. For example, a dataset can incorporate all data belonging to a certain team in Redshift as well as in Snowflake or RDS.
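
To make the idea of a dataset that spans tables, schemas, databases, and data stores more concrete, here is a minimal Python sketch. It is an illustrative model only, not Satori's API or configuration format: the class names `DataLocation` and `Dataset`, the store names, and the owner address are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class DataLocation:
    """A location in a data store. A level left as None means 'everything below'."""
    store: str                      # hypothetical store name, e.g. "redshift-eu"
    database: Optional[str] = None
    schema: Optional[str] = None
    table: Optional[str] = None

    def covers(self, other: "DataLocation") -> bool:
        """Return True if `other` falls inside this location."""
        if self.store != other.store:
            return False
        for level in ("database", "schema", "table"):
            mine = getattr(self, level)
            if mine is None:
                return True         # wildcard from this level down
            if mine != getattr(other, level):
                return False
        return True


@dataclass
class Dataset:
    """A named group of locations, possibly across several data stores."""
    name: str
    owners: List[str]
    locations: List[DataLocation] = field(default_factory=list)

    def contains(self, location: DataLocation) -> bool:
        return any(loc.covers(location) for loc in self.locations)


# Hypothetical dataset: one team's schema in two Redshift clusters,
# plus a whole Snowflake database.
project = Dataset(
    name="marketing-team",
    owners=["alice@example.com"],
    locations=[
        DataLocation("redshift-eu", "analytics", "marketing"),
        DataLocation("redshift-us", "analytics", "marketing"),
        DataLocation("snowflake-main", "MARKETING"),
    ],
)
```

In this sketch, access decisions are attached to the `Dataset` rather than to individual tables, which is the property that lets an owner manage one logical unit regardless of where the data lives.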

 

In the illustration below, you can see how Satori datasets can be very flexible, allowing for a multitude of use-cases. All of the areas within the dashed lines are valid Satori datasets:

In the screenshot below, you can see what a simple dataset configuration looks like in Satori, where the dataset for a project consists of tables from both the EU and the US Redshift clusters:

Once a dataset is defined, its owners can define how it can be accessed from Satori, without needing to modify anything in the underlying data infrastructure. Owners can define which users can obtain access to the dataset. These users can be Redshift DB users, IdP users, or members of custom directory groups defined in Satori.

 

Not only is it possible to direct access in a simple manner, but the data owners can also define an inactivity period after which data access will be revoked. This way, users in the organization do not accumulate unnecessary access to data they do not use. This setting reduces the risk involved with granting over-access to users.
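
The inactivity rule described above can be sketched in a few lines of Python. This is a simplified illustration under assumed names (`AccessGrant`, `revoke_idle`), not how Satori implements revocation: each grant remembers when it was last used and is revoked once the idle time exceeds the limit the owner chose.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Tuple


@dataclass
class AccessGrant:
    """Hypothetical record of one user's access to one dataset."""
    user: str
    dataset: str
    last_used: datetime
    inactivity_limit: timedelta

    def record_use(self, now: datetime) -> None:
        """Call on every query so active users keep their access."""
        self.last_used = now

    def is_idle(self, now: datetime) -> bool:
        return now - self.last_used > self.inactivity_limit


def revoke_idle(grants: List[AccessGrant], now: datetime) -> Tuple[List[AccessGrant], List[AccessGrant]]:
    """Split grants into (kept, revoked) according to the inactivity limit."""
    kept = [g for g in grants if not g.is_idle(now)]
    revoked = [g for g in grants if g.is_idle(now)]
    return kept, revoked
```

Running `revoke_idle` on a schedule means access disappears on its own when it stops being used, with no ticket for a data engineer to follow up on.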

 

In the screenshot below, you can see how owners can grant data access in Satori through a very simple process, without involving data engineers:

Your Own Instant Redshift Self-Service Data Access Portal

A self-service data access workflow lets you define users or groups of users who do not have access to data by default but can request and obtain access on their own after providing a business justification. In most cases, this portal replaces situations where these users would simply have access to the data without any access granting process.

 

This is how the process is done:

  1. The consumer attempts to access data using any client connection to Amazon Redshift and is given a link to the data access portal within the authorization error.
  2. After navigating to the data portal, the consumer indicates their business justification for accessing the data.
  3. Once they submit the request in the portal, the consumer automatically gets access to the data for a limited, specified period of time, and the business justification is recorded.
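
The three steps above can be sketched as a small Python model. It is a hedged illustration, not Satori's implementation: `SelfServicePolicy`, `AccessDenied`, and the `PORTAL_URL` value are hypothetical names chosen for this example.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Dict, List, Set, Tuple

PORTAL_URL = "https://portal.example.com/request"  # hypothetical portal address


class AccessDenied(Exception):
    """Raised when a user has no valid grant; the message carries the portal link."""


@dataclass
class SelfServicePolicy:
    dataset: str
    eligible_users: Set[str]
    grant_duration: timedelta
    grants: Dict[str, datetime] = field(default_factory=dict)   # user -> expiry
    audit_log: List[Tuple[datetime, str, str]] = field(default_factory=list)

    def check_access(self, user: str, now: datetime) -> None:
        """Step 1: a query without a valid grant fails with a link to the portal."""
        expiry = self.grants.get(user)
        if expiry is None or now >= expiry:
            raise AccessDenied(
                f"No access to {self.dataset}. "
                f"Request it at {PORTAL_URL}?dataset={self.dataset}"
            )

    def request_access(self, user: str, justification: str, now: datetime) -> datetime:
        """Steps 2 and 3: record the justification and grant time-limited access."""
        if user not in self.eligible_users:
            raise AccessDenied(f"{user} is not eligible for self-service access")
        if not justification.strip():
            raise ValueError("A business justification is required")
        expiry = now + self.grant_duration
        self.grants[user] = expiry
        self.audit_log.append((now, user, justification))
        return expiry
```

Note that no human approves the request in this flow. The control comes from the eligibility list, the expiry, and the recorded justification.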

 

This capability can be useful when you have no problem with allowing certain groups access to specific data but would still like to maintain control and records for compliance reasons. This process also ensures that, if data is being accessed, it is being done for a reason.

 

For example, the following configuration allows users from the data scientists group to gain self-service access to a dataset and keep it as long as they are still using it. If a user is idle for over a week, the system requires that user to request self-service access again.

Your Own Approval Workflow for Data Access

Defining an approval workflow for data access works in much the same way as self-service data access, with an added step: approval by the data owner configured in Satori. In this process, the data owner defines groups of users who can request data access, receives notifications when users request access, and approves or rejects each request.
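
As a rough sketch of how this differs from self-service, the approval step can be modeled as a small state machine in Python. Everything here is hypothetical (`ApprovalWorkflow`, `AccessRequest`, and the `notify` callback stand in for whatever notification channel is used) and is not Satori's API.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Set


class Status(Enum):
    PENDING = "pending"
    APPROVED = "approved"
    REJECTED = "rejected"


@dataclass
class AccessRequest:
    user: str
    dataset: str
    justification: str
    status: Status = Status.PENDING


class ApprovalWorkflow:
    """Hypothetical model: requests stay pending until the data owner decides."""

    def __init__(self, owner: str, allowed_requesters: Set[str],
                 notify: Callable[[str, AccessRequest], None]) -> None:
        self.owner = owner
        self.allowed_requesters = allowed_requesters
        self.notify = notify    # e.g. a function that sends an email to the owner

    def submit(self, user: str, dataset: str, justification: str) -> AccessRequest:
        if user not in self.allowed_requesters:
            raise PermissionError(f"{user} may not request access to {dataset}")
        request = AccessRequest(user, dataset, justification)
        self.notify(self.owner, request)    # the owner is told about the request
        return request

    def decide(self, request: AccessRequest, decided_by: str, approve: bool) -> AccessRequest:
        if decided_by != self.owner:
            raise PermissionError("Only the data owner can decide on a request")
        if request.status is not Status.PENDING:
            raise ValueError("This request has already been decided")
        request.status = Status.APPROVED if approve else Status.REJECTED
        return request
```

The only difference from the self-service model is that access is granted in `decide` rather than at submission time.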

 

In the screenshot below, you can see such an approval request, automatically generated by Satori, and e-mailed to the data owner:

 

Reducing Amazon Redshift Data Access Risks

An important part of DataSecOps is not compromising on security while allowing for data democratization. In this case, in addition to achieving the primary goals of streamlining data access and reducing your time-to-value from the data in your Amazon Redshift ecosystem, you are also obtaining better data access security.

 

Here are the main ways in which Satori’s data access workflows improve your data access security:

  • Having data engineers constantly configure data access granting and revocation often creates security gaps. This occurs either because access becomes too lenient due to frequent data access requests or because of human errors. Allowing the data owners to manage access to their data in a clear way helps reduce the likelihood of these security gaps.
  • Satori’s system allows owners to, in effect, provide data to fewer groups of people without additional hassle, thereby reducing data exposure risks.
  • Data access in Satori comes with a TTL (Time To Live). In other words, when you grant access, it can expire after a set period of time or after a period of inactivity. This capability significantly reduces over-permissions and the security risks that come with them.
  • In Satori, all data access requests are recorded in a log with their reasoning, so you can easily keep track of access over time.

Conclusion

The best way to learn about self-service is to schedule a demo to discuss how self-service Amazon Redshift data access can work for your company. You can also read more about our self-service data access here or read the product documentation here.