How to Scale Snowflake Roles Management
Role-based access control (RBAC) is a foundational tenet of good data governance and key to keeping sensitive and proprietary information safe in today’s enterprise environments. As regulatory and security pressure grows, demand for accessible data access controls is reaching critical mass, leading data platform providers to roll out a steady stream of new and improved capabilities that can support growing data governance programs. Snowflake, a leading data platform, has been at the front of this wave with a wide variety of native tools and services. Let’s look at what it has to offer, and explore the pros and cons of Snowflake’s RBAC for enterprise workflows and needs.
Read our post about how Satori helps simplify role management for Snowflake and other data stores.
Using IAM Groups
Snowflake administrators can assign data ownership to IAM or IdP groups (for example, groups in the organization’s Okta), which are converted to Snowflake roles. For example, marketing, or at least a subset of the marketing group, usually has ownership over data enrichment.
At least in theory, this creates an ideal situation where the user’s context is managed by the identity management team (usually in IT), and the access to data is limited according to the groups we already have for the users in our organization.
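As a minimal sketch of what this looks like on the Snowflake side, assume the identity provider has already provisioned a role named MARKETING from the marketing group. The role name, and the reuse of the enrichments.userbase schema that appears later in this post, are assumptions for illustration only. Data engineering would then grant access to that role:

```sql
-- Assumption: MARKETING is the Snowflake role created from the IdP group of the same name.
-- A role needs USAGE on the database and schema before it can query tables in them.
GRANT USAGE ON DATABASE enrichments TO ROLE MARKETING;
GRANT USAGE ON SCHEMA enrichments.userbase TO ROLE MARKETING;
GRANT SELECT ON ALL TABLES IN SCHEMA enrichments.userbase TO ROLE MARKETING;
```

With this setup, who belongs to the role is decided in the identity provider rather than in Snowflake.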
However, this approach does not scale well. Today’s data consumers often require access to data that is owned by IAM groups other than their own. Identity management teams can theoretically add consumers to owner groups when necessary, but this introduces complexity, can lead to unexpected results (such as consumers becoming entitled to other resources that membership in the group enables) and contradicts the baseline principles of identity management.
Data teams may instead try to generate specific ownership groups for the purposes of data access, but that still requires them to correlate each dataset within Snowflake to an IAM group. The only way to achieve this is by data engineering, identity management and data owners miraculously agreeing on how to define identity groups and their respective datasets—a nearly impossible feat when you consider that data is a moving target.
Using Custom Snowflake Permissions
Snowflake’s privilege controls offer a more viable route for scaled RBAC. Having a role is only half of the story: data owners must still grant that role access to the data. Data engineers typically do this by granting privileges to one of the consumer’s Snowflake roles.
This makes it possible to set specific roles per project, or for users of a certain domain, without waiting for the identity team to set up a new group, so it is usually faster than creating a new identity group.
In other words, this can be as quick as data engineering running the following statements:
GRANT SELECT ON enrichments.userbase.table1 TO ROLE DATA_SCIENCE;
GRANT SELECT ON enrichments.userbase.table2 TO ROLE DATA_SCIENCE;
It is key to note that since access must be meted out through roles, this approach means that access will be granted to anyone else grouped under that role, too.
Data engineers must also consider some drawbacks. After granting access, they must keep a close watch in order to remove it as soon as it is no longer required. Many organizations do not do this, leading to rampant over-privileged data access across enterprise environments and data stores later on. It is also important to keep track of data changes (remember that data is often a moving target), since new or changed objects may require additional grants for access to stay accurate.
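The maintenance work described here can be sketched with a few standard Snowflake statements. The object names reuse the example above; whether future grants are appropriate is an assumption to check against your own policies.

```sql
-- Review what the role can currently access.
SHOW GRANTS TO ROLE DATA_SCIENCE;

-- Optionally cover tables created later in the schema, so that new data
-- does not require a fresh grant each time.
GRANT SELECT ON FUTURE TABLES IN SCHEMA enrichments.userbase TO ROLE DATA_SCIENCE;

-- Remove access as soon as it is no longer required.
REVOKE SELECT ON enrichments.userbase.table1 FROM ROLE DATA_SCIENCE;
REVOKE SELECT ON enrichments.userbase.table2 FROM ROLE DATA_SCIENCE;
REVOKE SELECT ON FUTURE TABLES IN SCHEMA enrichments.userbase FROM ROLE DATA_SCIENCE;
```

Note that a future grant has to be revoked separately from grants on existing tables, which is one more item to track.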
Building Role Hierarchy
Snowflake’s Role Hierarchy optimizes data authorization by enabling users to create an abstraction layer where roles can gain access to the privileges of other roles. For example:
- The USER_ENRICHMENTS role has access to the user enrichments dataset, as configured by data engineering.
- MARKETING_RESEARCH and DATA_SCIENCE both inherit from this role.
Only one command is required at the beginning of a project:
GRANT ROLE USER_ENRICHMENTS TO ROLE DATA_SCIENCE;
Revoking the privileges is just as simple:
REVOKE ROLE USER_ENRICHMENTS FROM ROLE DATA_SCIENCE;
Privileges on securable objects (tables, views, functions, etc.) can be configured once on the shared role that other roles inherit from (USER_ENRICHMENTS in this example), which saves considerable engineering time and effort.
However, scaling with this tooling will require a careful hand. In many cases, hierarchies can complicate data access flows when introduced without well thought-out planning and strategy. It is fairly common for these flows to collapse under the inevitable chaos and bloat of “unstructured” hierarchies. This is because, over time, layers of roles tend to form, and removing the complexity becomes a very risky task that the team is unwilling to take on. Therefore, this step must be done with careful planning and a great deal of structure.
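One way to keep a hierarchy inspectable, sketched here under the assumption that the roles from the example above already exist, is to check who inherits a role and what it carries before adding another layer:

```sql
-- List the users and roles that USER_ENRICHMENTS has been granted to.
SHOW GRANTS OF ROLE USER_ENRICHMENTS;

-- List the privileges (and any other roles) granted to USER_ENRICHMENTS itself.
SHOW GRANTS TO ROLE USER_ENRICHMENTS;
```

Running both before each change gives a rough picture of what a GRANT ROLE or REVOKE ROLE will actually affect.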
Adding Roles Per User
Another route for data teams is to create individual roles for individual users, meaning that each user receives their own dedicated role. Doing this allows for more granularity, granting users access to solely what they require and reducing the risk of future overprivileged environments.
Unfortunately, such a high degree of granularity comes with significant overhead, and the roles can quickly spiral in both number and complexity as the consumer base grows.
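A minimal sketch of this pattern for a single user follows. The user JDOE and the role JDOE_ROLE are hypothetical names, and the table is borrowed from the earlier example.

```sql
-- Hypothetical user JDOE and dedicated role JDOE_ROLE, for illustration only.
CREATE ROLE IF NOT EXISTS JDOE_ROLE;
GRANT ROLE JDOE_ROLE TO USER JDOE;

-- Grant only what this user requires.
GRANT USAGE ON DATABASE enrichments TO ROLE JDOE_ROLE;
GRANT USAGE ON SCHEMA enrichments.userbase TO ROLE JDOE_ROLE;
GRANT SELECT ON enrichments.userbase.table1 TO ROLE JDOE_ROLE;
```

Every additional user needs its own copy of these statements, which is where the overhead comes from.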
Open Access: Eliminating Data Access Restrictions
In this data access method, users are granted broad access as soon as they receive access to the Snowflake data warehouse. While this certainly reduces the data engineering time spent on granting users access to objects, of all the methods it carries the most risk of over-privileging users or even breaking compliance. If this sounds all too familiar, remember how security, compliance and privacy loom ever larger for all of us these days.
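To make the risk concrete, here is a sketch of what open access can look like in practice, using the enrichments database from the earlier example as an assumed target. It is shown to illustrate the pattern, not to recommend it.

```sql
-- PUBLIC is automatically granted to every user in the account,
-- so these grants expose every current table in the database to everyone.
GRANT USAGE ON DATABASE enrichments TO ROLE PUBLIC;
GRANT USAGE ON ALL SCHEMAS IN DATABASE enrichments TO ROLE PUBLIC;
GRANT SELECT ON ALL TABLES IN DATABASE enrichments TO ROLE PUBLIC;
```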
Building a Self-service Data Portal
Self-service is one of the best methods for scaling a Snowflake role management strategy. We even dedicated an entire ebook to self-service data access, a method for enabling (ideally temporary) access to datasets. This can take the form of a well-audited workflow based on a central business process (such as requiring a justification for the data access, or approval from a manager or data owner).
There are several ways to achieve this, each of which requires writing an application that manages Snowflake roles and users by sending GRANT and REVOKE SQL queries to Snowflake, and keeps track of the access granted.
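The statements such an application might send can be sketched as follows. This is one possible design, not a reference implementation: the governance.access.access_grants table, its columns, the user JDOE, the approver name and the 30-day expiry are all hypothetical.

```sql
-- Hypothetical audit table maintained by the self-service application.
CREATE TABLE IF NOT EXISTS governance.access.access_grants (
    user_name     STRING,
    role_name     STRING,
    justification STRING,
    approved_by   STRING,
    granted_at    TIMESTAMP_LTZ,
    expires_at    TIMESTAMP_LTZ,
    revoked_at    TIMESTAMP_LTZ
);

-- On approval: grant the role and record the request.
GRANT ROLE USER_ENRICHMENTS TO USER JDOE;
INSERT INTO governance.access.access_grants
    (user_name, role_name, justification, approved_by, granted_at, expires_at)
SELECT 'JDOE', 'USER_ENRICHMENTS', 'Q3 churn analysis', 'DATA_OWNER_1',
       CURRENT_TIMESTAMP(), DATEADD(day, 30, CURRENT_TIMESTAMP());

-- Periodically: find expired grants that still need to be revoked.
SELECT user_name, role_name
FROM governance.access.access_grants
WHERE expires_at < CURRENT_TIMESTAMP() AND revoked_at IS NULL;

-- For each row returned: revoke the role and mark the record.
REVOKE ROLE USER_ENRICHMENTS FROM USER JDOE;
UPDATE governance.access.access_grants
SET revoked_at = CURRENT_TIMESTAMP()
WHERE user_name = 'JDOE' AND role_name = 'USER_ENRICHMENTS' AND revoked_at IS NULL;
```

Keeping the expiry in the audit table is what makes temporary access enforceable: the application only has to run the expiry query on a schedule and act on the result.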
However, this is another costly process with significant overhead. Carrying out a process like this requires resources to go into building out a bespoke self-service platform for the organization, creating a “data mart” for the organization, or working with data access solutions (like Satori!) to enable that.
Minimal Snowflake roles
Dependency on IT.
Requires data access modeling in the IAM.
Data stewards can’t assign permissions.
Data engineering overhead.
No changes to per-role securable object privileges
May lead to “Hierarchy Hell” (complex roles relations)
Role per user
No interruption to data consumers
Security & compliance risks
Self-service data portal
Lightweight for data consumers and data engineering
Requires homegrown development & application maintenance
In summary, Snowflake provides a wide variety of tools for controlling access to data through its built-in RBAC model. Scaling them presents a number of challenges that require careful strategizing to avoid adding overhead or risk. Understanding the pros and cons of each can help minimize both and help data teams build out data access flows that keep innovation high and security risks low.