How to Scale Snowflake Role Management

RBAC, or role-based access control, is a foundational tenet of good data governance and key to keeping sensitive and proprietary information safe in today’s enterprise environments. As regulatory and security pressure mounts, demand for easy-to-use data access controls is reaching critical mass, and data platforms are rolling out a steady stream of new and improved capabilities to support growing data governance programs. Snowflake, a leading cloud data platform, has been at the forefront of this wave with a wide variety of native tools and services. Let’s investigate what they have to offer and explore the pros and cons of Snowflake’s RBAC for enterprise workflows and needs.

Using IAM Groups

Snowflake administrators can assign data ownership to IAM or IdP groups (for example, groups in the organization’s Okta), which are provisioned as Snowflake roles, typically through SCIM. A typical case: the marketing team, or at least a subset of it, owns the data enrichment datasets.

At least in theory, this creates an ideal situation: the identity management team (usually in IT) manages each user’s context, and data access is limited according to the groups users already belong to in the organization.
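
The group-to-role mapping can be illustrated with a minimal Python sketch. It assumes the IdP’s group membership is available as a plain dictionary and that each group maps one-to-one to a Snowflake role of the same name, roughly what SCIM provisioning does; the group and user names and the `group_role_statements` helper are hypothetical. The sketch only builds SQL statements and does not connect to Snowflake.

```python
from typing import Dict, List


def group_role_statements(memberships: Dict[str, List[str]]) -> List[str]:
    """Build Snowflake SQL that mirrors IdP groups as roles.

    Hypothetical helper (not the SCIM API): `memberships` maps an IdP group
    name to its member user names. Each group becomes a Snowflake role of the
    same name, and that role is granted to every member.
    """
    statements: List[str] = []
    for group in sorted(memberships):
        role = group.upper()  # unquoted Snowflake identifiers resolve as uppercase
        statements.append(f"CREATE ROLE IF NOT EXISTS {role};")
        for user in sorted(memberships[group]):
            statements.append(f"GRANT ROLE {role} TO USER {user.upper()};")
    return statements


if __name__ == "__main__":
    idp_groups = {
        "marketing_enrichment": ["alice", "bob"],
        "data_science": ["carol"],
    }
    for sql in group_role_statements(idp_groups):
        print(sql)
```

In this model, a data scientist who needs the enrichment data would have to be added to MARKETING_ENRICHMENT itself, inheriting everything else that group is entitled to.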

However, this approach does not scale well. Consumers often need access to data owned by IAM groups they do not belong to. Identity management teams can theoretically add consumers to the owning groups when necessary, but doing so adds complexity, can lead to unexpected results (such as becoming entitled to every other resource granted to that group), and contradicts the baseline principles of identity management.

Data teams may instead create dedicated groups purely for data access, but that still requires mapping each dataset in Snowflake to an IAM group. The only way to achieve this is for data engineering, identity management, and data owners to miraculously agree on how to define identity groups and their respective datasets, a nearly impossible feat when you consider that data is a moving target.

Using Custom Snowflake Permissions

Snowflake’s privilege grants offer a more viable route to RBAC at scale. Authorization is only one side of the coin, though: data owners must still approve access, and data engineers typically grant it by giving privileges on the requested objects to one of the consumer’s existing Snowflake roles.

This makes it possible to define roles per project or per business domain without delaying access until the identity team sets up a new group, so it is usually faster than provisioning a new identity group.

In practice, data engineering can grant access by running statements such as these:

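GRANT USAGE ON DATABASE enrichments TO ROLE DATA_SCIENCE;        -- prerequisite: access to the database
GRANT USAGE ON SCHEMA enrichments.userbase TO ROLE DATA_SCIENCE;  -- prerequisite: access to the schema
GRANT SELECT ON TABLE enrichments.userbase.table1 TO ROLE DATA_SCIENCE;
GRANT SELECT ON TABLE enrichments.userbase.table2 TO ROLE DATA_SCIENCE;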

Keep in mind that because Snowflake grants privileges to roles rather than to individual users, everyone else who holds that role gets the same access.

This approach has other drawbacks, too. Once access has been granted to more than one person, data engineers must keep a close watch on it and remove it as soon as it is no longer required. Many organizations skip this step, which later leads to rampant over-privileged data access across enterprise environments and data stores. Grants also need to keep up with changes to the data itself (remember that data is often a moving target): a grant on a table, for example, does not extend to tables created later in the same schema, so new objects require additional grants.
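
Keeping that close watch can be partly automated. The minimal Python sketch below assumes each grant is recorded with the date it was made and that the team has agreed on a review period; the `GrantRecord` type, the `expired_revokes` helper, the 90-day default, and the grant dates are hypothetical. It emits REVOKE statements for grants that have outlived the period. In practice the grant inventory could be built from Snowflake metadata such as SHOW GRANTS TO ROLE, but that step is left out here.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Iterable, List


@dataclass(frozen=True)
class GrantRecord:
    """Hypothetical record of one SELECT grant on a table to a role."""
    table: str
    role: str
    granted_on: date


def expired_revokes(grants: Iterable[GrantRecord], today: date,
                    max_age: timedelta = timedelta(days=90)) -> List[str]:
    """Return REVOKE statements for grants older than `max_age`."""
    return [
        f"REVOKE SELECT ON TABLE {g.table} FROM ROLE {g.role};"
        for g in grants
        if today - g.granted_on > max_age
    ]


if __name__ == "__main__":
    inventory = [
        GrantRecord("enrichments.userbase.table1", "DATA_SCIENCE", date(2024, 1, 10)),
        GrantRecord("enrichments.userbase.table2", "DATA_SCIENCE", date(2024, 5, 1)),
    ]
    for sql in expired_revokes(inventory, today=date(2024, 6, 1)):
        print(sql)
```

A fixed age is the simplest possible policy; a real process might instead check with the requester or data owner before revoking.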

Building Role Hierarchy

Snowflake’s role hierarchy simplifies authorization by adding an abstraction layer: a role can be granted to another role, which then inherits all of the granted role’s privileges. For example:

  • The USER_ENRICHMENTS role has access to the user enrichments dataset, as configured by data engineering.

  • MARKETING_RESEARCH and DATA_SCIENCE both inherit from this role.

Giving a team access at the beginning of a project then takes a single command:

GRANT ROLE USER_ENRICHMENTS TO ROLE DATA_SCIENCE;

Revoking that access is just as simple:

REVOKE ROLE USER_ENRICHMENTS FROM ROLE DATA_SCIENCE;

Privileges on securable objects (tables, views, functions, and so on) are configured once on the shared role, USER_ENRICHMENTS, rather than separately for every consumer role, which saves considerable engineering time and effort.
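
The inheritance can be modeled as a small graph. The minimal Python sketch below is an illustration rather than how Snowflake implements it: it reuses the role names from the example above, while the privilege strings, the extra analytics.models.features table, and the `effective_privileges` helper are hypothetical.

```python
from typing import Dict, Set

# Privileges granted directly to each role (hypothetical examples).
DIRECT_PRIVILEGES: Dict[str, Set[str]] = {
    "USER_ENRICHMENTS": {
        "SELECT ON enrichments.userbase.table1",
        "SELECT ON enrichments.userbase.table2",
    },
    "DATA_SCIENCE": {"SELECT ON analytics.models.features"},
    "MARKETING_RESEARCH": set(),
}

# Roles granted to each role, e.g. GRANT ROLE USER_ENRICHMENTS TO ROLE DATA_SCIENCE
# puts USER_ENRICHMENTS in the set under DATA_SCIENCE.
GRANTED_ROLES: Dict[str, Set[str]] = {
    "DATA_SCIENCE": {"USER_ENRICHMENTS"},
    "MARKETING_RESEARCH": {"USER_ENRICHMENTS"},
    "USER_ENRICHMENTS": set(),
}


def effective_privileges(role: str) -> Set[str]:
    """Collect the role's own privileges plus those of every role it inherits."""
    seen: Set[str] = set()
    privileges: Set[str] = set()
    stack = [role]
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # a role reachable by two paths is only counted once
        seen.add(current)
        privileges |= DIRECT_PRIVILEGES.get(current, set())
        stack.extend(GRANTED_ROLES.get(current, set()))
    return privileges


if __name__ == "__main__":
    for priv in sorted(effective_privileges("DATA_SCIENCE")):
        print(priv)
```

Removing the edge from DATA_SCIENCE to USER_ENRICHMENTS, which is what the REVOKE ROLE statement does, drops all of the inherited privileges at once while leaving DATA_SCIENCE’s own grants in place.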

Hierarchy Hell

However, scaling with this tooling requires a careful hand. Hierarchies introduced without well-thought-out planning and strategy can complicate data access flows, and it is fairly common for those flows to collapse under the inevitable chaos and bloat of unstructured hierarchies. Over time, layers of roles accumulate, and removing that complexity becomes a risky task the team is unwilling to take on. Role hierarchies should therefore be introduced with careful planning and a great deal of structure.
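
One practical safeguard is to check which roles inherit from a role before changing its privileges, because every one of them is affected. The minimal Python sketch below assumes the role grants have already been exported into a dictionary; the ANALYTICS_LEADS role and the `inheritors` helper are hypothetical. In Snowflake, SHOW GRANTS OF ROLE lists the roles and users a role has been granted to and could be used to build such a map.

```python
from collections import deque
from typing import Dict, Set

# Hypothetical export of role grants: each role maps to the roles it has been
# granted TO. GRANT ROLE USER_ENRICHMENTS TO ROLE DATA_SCIENCE, for example,
# adds DATA_SCIENCE to the set under USER_ENRICHMENTS.
GRANTED_TO: Dict[str, Set[str]] = {
    "USER_ENRICHMENTS": {"DATA_SCIENCE", "MARKETING_RESEARCH"},
    "DATA_SCIENCE": {"ANALYTICS_LEADS"},
    "MARKETING_RESEARCH": {"ANALYTICS_LEADS"},
    "ANALYTICS_LEADS": set(),
}


def inheritors(role: str) -> Set[str]:
    """Return every role that inherits `role` directly or through other roles."""
    found: Set[str] = set()
    queue = deque(GRANTED_TO.get(role, set()))
    while queue:
        current = queue.popleft()
        if current in found:
            continue  # already reached via another path
        found.add(current)
        queue.extend(GRANTED_TO.get(current, set()))
    return found


if __name__ == "__main__":
    print(sorted(inheritors("USER_ENRICHMENTS")))
```

Even in this small example, a change to USER_ENRICHMENTS reaches ANALYTICS_LEADS, a role that never appears in a grant statement alongside it; in a real account with many layers, that indirect reach is what makes cleanup risky.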

Adding Roles Per User

Another route for data teams is to create a dedicated role for each individual user. This allows for more granularity: each user is granted access to only what they require, reducing the risk of over-privileged environments down the road.

Unfortunately, such a high degree of granularity comes with significant overhead, and the number and complexity of roles can quickly spiral as the consumer base grows.
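
A rough count shows how quickly this overhead grows. The minimal Python sketch below assumes one dedicated role per user and one SELECT grant per table that user needs; the `per_user_statements` helper, the USR_ naming convention, and the user and table names are hypothetical, and the USAGE grants on the database and schema are left out for brevity.

```python
from typing import Dict, List


def per_user_statements(access: Dict[str, List[str]]) -> List[str]:
    """Build SQL giving each user a dedicated role with only the tables they need.

    `access` maps a user name to the fully qualified tables that user requires.
    """
    statements: List[str] = []
    for user, tables in sorted(access.items()):
        role = f"USR_{user.upper()}"  # hypothetical naming convention
        statements.append(f"CREATE ROLE IF NOT EXISTS {role};")
        statements.append(f"GRANT ROLE {role} TO USER {user.upper()};")
        for table in tables:
            statements.append(f"GRANT SELECT ON TABLE {table} TO ROLE {role};")
    return statements


if __name__ == "__main__":
    tables = [f"enrichments.userbase.table{i}" for i in range(1, 6)]
    users = {f"user{n}": tables for n in range(1, 101)}
    # 100 users x (2 role statements + 5 table grants) = 700 statements to maintain
    print(len(per_user_statements(users)))
```

With a shared role, the five table grants would be made once rather than once per user.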

Open Access: Eliminating Data Access Restrictions

With this approach, users are granted broad access as soon as they receive access to the Snowflake data warehouse. While this certainly reduces the data engineering time spent granting users access to objects, of all the methods it carries the most risk of over-privileging users or even breaking compliance. If this sounds familiar, remember how much larger security, compliance, and privacy loom for all of us these days.

Building a Self-service Data Portal

Self-service is one of the most effective ways to scale Snowflake role management; we even dedicated an entire ebook to self-service data access. The idea is to enable (ideally temporary) access to datasets through a well-audited workflow built on a central business process, such as requiring a justification for the access or approval from a manager or data owner.

There are several ways to achieve this, each of which requires writing an application that manages Snowflake roles and users by sending GRANT and REVOKE SQL statements to Snowflake and keeps track of the access granted.
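
The core of such an application can be sketched in a few lines of Python. This is a minimal illustration, not a reference design: it assumes each dataset is exposed through its own Snowflake role (as in the role hierarchy approach above), that every request carries a justification and an approver, and that access expires after a fixed period. The `AccessManager` and `AccessGrant` classes, their methods, the 30-day default, and the user and approver names are hypothetical, and the sketch only returns SQL strings; a real portal would also send them to Snowflake (for example with the snowflake-connector-python package) and persist its records.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class AccessGrant:
    """Hypothetical record of one temporary, approved access grant."""
    user: str
    dataset_role: str
    justification: str
    approved_by: str
    expires_at: datetime


@dataclass
class AccessManager:
    """Tracks temporary grants and produces the matching GRANT/REVOKE SQL."""
    default_duration: timedelta = timedelta(days=30)
    active: List[AccessGrant] = field(default_factory=list)

    def grant(self, user: str, dataset_role: str, justification: str,
              approved_by: Optional[str], now: datetime) -> str:
        # Enforce the business process: no justification or approval, no access.
        if not justification.strip():
            raise ValueError("a justification is required")
        if not approved_by:
            raise ValueError("an approver is required")
        self.active.append(AccessGrant(user, dataset_role, justification,
                                       approved_by, now + self.default_duration))
        return f"GRANT ROLE {dataset_role} TO USER {user};"

    def expire(self, now: datetime) -> List[str]:
        """Drop grants that have reached their expiry and return REVOKE statements."""
        expired = [g for g in self.active if g.expires_at <= now]
        self.active = [g for g in self.active if g.expires_at > now]
        return [f"REVOKE ROLE {g.dataset_role} FROM USER {g.user};" for g in expired]


if __name__ == "__main__":
    manager = AccessManager()
    start = datetime(2024, 6, 1)
    print(manager.grant("ALICE", "USER_ENRICHMENTS", "Churn model features",
                        approved_by="DATA_OWNER_BOB", now=start))
    print(manager.expire(now=start + timedelta(days=31)))
```

One design choice worth noting: revoking the dataset role on expiry assumes the portal is the only thing granting that role to users. If the same role is also granted by other means, the portal would need to track that to avoid revoking access it did not create.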

However, this is another costly process with significant overhead. It requires investing resources in building a bespoke self-service platform for the organization, creating a “data mart” for the organization, or working with a data access solution (like Satori!) that provides it.

Summary

| Method | Pros | Cons |
|---|---|---|
| IAM groups | Minimal Snowflake roles | Dependency on IT; requires data access modeling in the IAM |
| Permissions | Central management | Data stewards can’t assign permissions; data engineering overhead |
| Role hierarchy | No changes to per-role securable object privileges | May lead to “hierarchy hell” (complex role relations) |
| Role per user | Granular | High maintenance |
| Open access | No interruption to data consumers | Security and compliance risks |
| Self-service data portal | Lightweight for data consumers and data engineering | Requires homegrown development and application maintenance |

In summary, Snowflake provides a wide variety of tools for controlling access to data through its built-in RBAC model. Scaling them, however, presents a number of challenges that call for careful planning to avoid adding overhead or risk. Understanding the pros and cons of each approach can help minimize both and help data teams build data access flows that keep innovation high and security risks low.