Implementing Row-Level Security with Satori

There are moments when architecting data where you need to make decisions that will have broad implications in the future. In many cases, such decisions lead to the requirement for specific access control mechanisms, which often becomes the catalyst for the need of Row-Level Security - a choice for certain data architectures. In this blog post, I will share some use-cases that drive the necessity  for row-level security, as well as demonstrating how Satori helps solve the problem in a revolutionary way. In the past, I also gave examples of setting up row-level security in platforms like Snowflake and Redshift.

 

So… what IS row-level security?

Row-level security has several different names, such as Row-Level Access Control, or Row Based Security. In general, it describes a situation where there is tabular data, and certain users or groups of users should have limited access to that said data. The most simple diagram is this, where each user can access a certain portion of the rows in the table:

User Access Diagram

Access to the data is restricted according to a Row-Level Identifier, which is a value or set of values that is typically an identifier in one of the columns. As we will see in the examples below, this will depend on the exact situation. Based on the circumstance,  you may or may not require an additional resource that will link the value in the row level identifier to the access restrictions per user or group of users.

Satori and Row-Level Security

Satori helps solve Row-Level Security issues for existing and new data architectures. Using Satori requires no installation or changes to the data store itself, and it is implemented when retrieving the data. This allows adding Row-Level Security in a very convenient and flexible way, even for relatively complicated scenarios, such as the ones listed below. If you are new to Satori, you can learn more about what we do here.


When setting up row-level policies in Satori, you can tag data to add context to the Satori Data Controllers when that data is being accessed. You can then set policies to specify specific actions to be taken in certain scenarios. For example, in the tags, we can define the situation (i.e., filtering over a specific column), and determine in the policy what the action will be (i.e., block access if certain filtering is non-existent).

 

Row-Level Security use-cases

Not all Row-Level Security architectures are made the same, and not all of them are a simple column specification that determines which user ID is able to access which row. Here are some examples to take into consideration:

 

Basic Row-Level Security

At the basic level, there is a table or set of tables, where a certain column acts as the row identifier, to make decisions about allowing or denying user access to the data in that specific row. This can be an ID number, which identifies the group of users who should be allowed to access the data. A common use-case is in investment companies, where certain analyst teams should only have access to the data of the portfolios they’re managing, not other teams’ data.

 

Let’s walk this through this investment firm example,where each transaction has an analyst_bucket column, specifying the analyst team who owns the record. As mentioned above, when setting up row-level security in Satori, we need to define tags to give context to the data access.

 

1. Setting up the mapping tags in Satori

In this case, we can set up a tag that will  tag the data access when it is filtered by a certain field. From there, we want it to create a mapping of the column value, and indicate how it’s represented in a tag. An example would be:

- display_name: "Filtered by analyst bucket"
    tag: "filtered_by_"
    matcher_id: "columns_exact_filter"
    matcher_data:
      match:
        - column: analyst_bucket
      tagger_alias:
        a: 1
        b: 2
        c: 3

In addition, we will create another tag configuration to tag all the tables in which we want to enforce row-level security:

- display_name: "Tables with customer data"
    tag: "customer_data"
    matcher_id: "table_name"
    matcher_data:
      match: 
        - contacts
        - portfolios
        - transactions

In the below example, we can see that all data access that will filter by the column analyst_bucket will be tagged accordingly. This is detached from the actual data storage technology and will work whether it’s on Snowflake, Redshift, or other data stores. So, if we query the database with the following query...

SELECT * FROM transactions
WHERE analyst_bucket=3;

 

...this will be mapped, and tagged with the tag filtered_by_c. It should also be tagged by the Customers Data tag, as the data was in one of the tables configured to that tag. If we look in the Satori management console under Audit and expand the tags section, we see that the data access indeed was tagged with the appropriate tags:

Tags

If you want to drill-down into this example, you can request a free demo of Satori, or go to our documentation for more information about tags.

 

2. Setting up the mapping tags in Satori

Using the same example, let’s now set a policy so that we can enforce this row-level security. This policy will be enforced regardless of which datastore is used by your organization (Snowflake, Redshift, Postgres, BigQuery, etc). The policy which we will use is the following: stating that users must filter their queries by the analyst_bucket column (explicit row-level security). In other words, if you try to access a table tagged with the customer_data tag, and you’re using the Snowflake role of team C, you must include a filter on the analyst bucket so that the filtered_by_c tag is applied:

- name: Row level security for Analyst Team C
    action: block
    identity_tags:
      - identity.datastore.role::analyst_team_c
    data_tags:
      - not filtered_by_c
      - customer_data
    priority: 3

This particular example is quite simple to apply and is easily automated by using our API to generate the required tag maps and policies. This example is also applied based on the Snowflake database role used, but it can also be applied by other identity tags, such as IdP groups.

Now let’s discuss some more adventurous use-cases of row-level security…

 

Row-level security for semi-structured data

This use-case is when you have tables with some of the columns containing semi-structured data within them, and the filtering criteria for the access control are within the columns in json format. An example would be when a table contains events from different systems, and the users should only be allowed access to the data that “belongs” to their team within the semi-structured json data.

 

The layout here would be slightly different, but it would still be possible and easy with Satori. Needless to say, we would love to assist you with more exotic tagging like this one:

- display_name: "Filtered by analyst bucket in semi-structured"
    tag: filtered_by_{{tagger_alias}}
    matcher_id: columns_exact_filter
    matcher_data:
      match:
      - table: acme_invetments.arena.feed_data
        - column: message
          json_path: "*['analyst_bucket']"
      tagger_alias:      
        a: 1
        b: 2
        c: 3

 

In this case, the policy would remain the same, as the only thing that changes is the way you describe the data. However, you’re still doing the same thing once you tag it correctly, which is to block illegal access attempts.

 

Hierarchical Row-Level Security

Another cool example of Row-Level Security are scenarios where you need to apply several layers of access level over the data. For instance, say you have different analyst teams, but you also have regional managers who should be able to pull data of the teams that report to them yet not from other regions’ teams.

 

Satori is quite flexible about ways to sort out such issues as that, but in our example, we will simply create an additional tagging to be applied for the regional manager. This means they will only be able to see the data of teams A and B, but not other teams:

- display_name: "Filtered by analyst bucket mgr"
    tag: "mgr_filter_"
    matcher_id: "columns_exact_filter"
    matcher_data:
      match:
        - column: analyst_bucket
      tagger_alias:
        ab: [1,2]

 

In this example, the tag mgr_filter_ab will be applied whenever there is a filtering of data for analyst_bucket 1, analyst_bucket 2, or both. So all the following examples will tag the data with the appropriate tag:

SELECT * FROM transactions WHERE analyst_bucket=1;
SELECT * FROM transactions WHERE analyst_bucket=2;
SELECT * FROM transactions WHERE analyst_bucket=1 OR analyst_bucket=2;
SELECT * FROM transactions WHERE analyst_bucket IN (1,2);

 

Furthermore, you will also see the following tags when going to Satori management console to look for the query in the audit:

Tags

 

Setting up a policy is very similar  as the team row-level security:

- name: Row level security for Analyst Managers AB
    action: block
    identity_tags:
      - identity.datastore.role::analyst_mgr_ab
    data_tags:
      - not mgr_filter_ab
      - customer_data
    priority: 4

 

Other use-cases for Row-Level Security

There are many other use cases for row-level security, such as:

  • sometimes used for multi-tenancy (i.e., storing data of multiple tenants in the same table).

  • used to enforce regional separation (i.e., making certain teams only able to view data specific to their region).

  • giving access to certain logs so only specific users or teams can view certain log entries.

  • applying a “safety net” over a “Right To Be Forgotten” (RTBF) policy, so data about certain data subjects will be inaccessible.

Applying Row-Level Security with Satori

As we’ve seen, applying row-level security with Satori is quite simple once you learn the basic concepts of data tagging and policies (and hey, we’re here to help you do that!). Overall, the advantages over doing the same in the data infrastructure itself are:

  • Policies and tags are decoupled from the datastore and can be applied across several instances easily, even if they’re using completely different technologies.

  • Policies can be added or managed by security teams, data stewards, and data owners, not just the database administrators.

  • You can roll-out the policies in a more granular/soft way. For example, you can set them to work in alert mode only, which is impossible in most databases.

  • As you need to enhance your access control, you don’t need to edit data abstraction layers, which can be hard and risky.

You can also configure row-level security in some BI tools, but that will be attached to the specific BI tool that you’re using. That means that a simple lacking feature that causes analysts to use other BI tools or a Python script will not go through the row-based security policies.

 

If there is a use-case for row-level security which you’d like to discuss, whether it’s covered in this post or not, we invite you to schedule a demo