Upgrading to Unity Catalog? Here’s What You Need to Know

Idan | Marketing Specialist

Earlier this month, Databricks announced that the Standard workspace tier is no longer offered to new customers of Databricks on AWS or Google Cloud. Existing customers have until October 9, 2025, to upgrade to the Premium or Enterprise tier. The upgrade raises Databricks costs, but it also brings a variety of new features, most notably Unity Catalog. For the uninitiated, Unity Catalog is a governance layer for Databricks environments, designed to improve data management. Its capabilities can be grouped into four main areas: data discovery, governance, lineage, and sharing.

Complex environments containing large amounts of sensitive customer data require strong access control to prevent problems like privilege creep and role explosion. Unity Catalog is a powerful way to manage data governance and security inside Databricks, and it integrates with external tools to serve additional use cases and data stores. But powerful as it is, Unity Catalog may not be enough for data teams with larger or more complex data environments. These teams often still spend unreasonable amounts of time manually managing data access and compliance across their data stack. Adopting Unity Catalog also involves a learning curve, especially for smaller teams or those with less Databricks experience.

In this series of posts, we’ll cover Unity Catalog’s capabilities, limitations, and everything to consider if your data team is affected by Databricks’ latest changes. This post focuses on what Unity Catalog provides in the area of data governance, specifically its fine-grained access control features.

Life Before Unity Catalog

Unity Catalog aims to unify all data stored in Databricks environments under a single governance layer. A foundational governance layer makes it easier both to manage data governance and security inside Databricks and to integrate with external tools that serve additional use cases and data stores.

If you’ve worked with data governance in any capacity, you probably know how challenging it is to manage row-, column-, and cell-level security in complex environments. Databricks is no exception, which is why governance was once mostly delegated to external solutions.

Databricks users from the pre-Unity Catalog era might recall dealing with some of these problems:

  • Users, groups, and clusters were linked with workspaces, making it difficult to share data across workspaces
  • Security policies were also managed on the workspace level, meaning that two workspaces containing the same data but different policies had to be managed separately
  • Sharing metadata across workspaces was cumbersome, requiring table redefinitions or external Hive metastores
  • Data access on cloud storage was applied inconsistently, with permissions defined arbitrarily both on object stores such as Amazon S3 and Azure Data Lake Storage, and on Databricks itself
  • Auditing and policy enforcement were challenging due to inconsistent data access

Unity Catalog introduced several features to combat these issues. Chief among them is a “define once, secure everywhere” model, in which policies are administered at the account level and applied across all workspaces, instead of being defined in disparate locations where inconsistencies can leak data. Unity Catalog also brought powerful data lineage, auditing, and discovery capabilities to Databricks environments, which previously required external solutions.
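
To make the “define once, secure everywhere” idea concrete, here is a toy model in plain Python. It does not use any Databricks API: the `Metastore` and `Workspace` classes, the group name, and the table name are all hypothetical, and the sketch only illustrates how a grant stored once at the account level is seen by every workspace attached to it.

```python
from dataclasses import dataclass, field


@dataclass
class Metastore:
    """Hypothetical account-level store of grants: (group, table) -> privileges."""
    grants: dict = field(default_factory=dict)

    def grant(self, group: str, table: str, privilege: str) -> None:
        self.grants.setdefault((group, table), set()).add(privilege)

    def is_allowed(self, group: str, table: str, privilege: str) -> bool:
        return privilege in self.grants.get((group, table), set())


@dataclass
class Workspace:
    """Hypothetical workspace that defers every access decision to its metastore."""
    name: str
    metastore: Metastore

    def can(self, group: str, table: str, privilege: str) -> bool:
        return self.metastore.is_allowed(group, table, privilege)


# One grant, defined once at the account level.
metastore = Metastore()
metastore.grant("analysts", "sales.orders", "SELECT")

# Two workspaces attached to the same metastore share the same policy.
dev = Workspace("dev", metastore)
prod = Workspace("prod", metastore)

print(dev.can("analysts", "sales.orders", "SELECT"))
print(prod.can("analysts", "sales.orders", "SELECT"))
print(prod.can("analysts", "sales.orders", "MODIFY"))
```

Because neither workspace keeps its own copy of the policy, there is nothing to drift out of sync, which is the contrast with the workspace-level setup described in the list above.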

Access Control Features in Databricks Unity Catalog

In June 2023, Databricks announced row- and column-level security features in public preview, as well as tagging for data classification. Unity Catalog’s core access control features are:

  • Role-based access control (RBAC): Before Unity Catalog, RBAC was implemented at the workspace level, which was often frustrating for organizations with multiple workspaces. Now, access control policies are defined once in a metastore through the account portal, and then apply across the workspaces attached to that metastore.
  • Row filtering: Allows admins to implement filters with SQL functions, so that a given user sees only the rows for which the filter function returns true.
  • Column masking: Allows admins to apply SQL functions to mask columns of a table, with the option to use other columns as inputs in the function.

Row- and column-level security in Unity Catalog should not be confused with dynamic views, which are separate, read-only views of a table and carry a different name from the table itself. Filters and masks, by contrast, are applied directly to the table, so queries can keep referring to it by the same name.
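
The semantics of row filters and column masks can be sketched in a few lines of Python. This is a simplified model, not Databricks code: in Unity Catalog the filter and mask are SQL functions attached to the table, whereas here they are ordinary Python functions, and the table contents, group names, and the `query` helper are assumptions made up for illustration.

```python
# Hypothetical sample table.
CUSTOMERS = [
    {"id": 1, "region": "US", "email": "ann@example.com", "consent": True},
    {"id": 2, "region": "US", "email": "bob@example.com", "consent": False},
    {"id": 3, "region": "EU", "email": "eve@example.com", "consent": True},
]


def region_filter(user_groups: set, row: dict) -> bool:
    """Row filter: a row is visible only if this returns True."""
    return "admin" in user_groups or row["region"] == "US"


def email_mask(user_groups: set, email: str, consent: bool) -> str:
    """Column mask: uses another column (consent) as an extra input."""
    if "admin" in user_groups or consent:
        return email
    return "***"


def query(table: list, user_groups: set) -> list:
    """Read the table under its usual name, with filter and mask applied."""
    visible = [row for row in table if region_filter(user_groups, row)]
    return [
        {**row, "email": email_mask(user_groups, row["email"], row["consent"])}
        for row in visible
    ]


analyst_rows = query(CUSTOMERS, {"analysts"})
admin_rows = query(CUSTOMERS, {"admin"})

for row in analyst_rows:
    print(row["id"], row["email"])
```

Both callers query the same `CUSTOMERS` table; only the result differs by group membership. That mirrors the point above: the protection lives on the table itself rather than on a separately named view.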

Access Control Limitations in Databricks Unity Catalog

It’s important to note that Unity Catalog was designed as a foundational data governance layer, not a full access control solution. Data teams with few data stores or modest security and compliance needs may be satisfied with the native access controls provided in Unity Catalog.

But in more complex environments containing additional data stores and sensitive customer data, especially in highly regulated industries like healthcare or finance, Unity Catalog might not be enough. Fortunately, Unity Catalog was built with interoperability in mind, and can be easily integrated with external tools for more advanced data access and governance use cases.

With that, here are some more advanced access control capabilities that are missing from Unity Catalog:

  • Attribute-Based Access Control (ABAC): While RBAC has its uses in an organization, it also has its limitations. Often, roles are not enough to encapsulate the full scope of individual users’ permissions, resulting in either over-privileged access for certain users or role explosion. In this case, it’s helpful to define access groups based on attributes, such as geographical location, for finer granularity.
  • Just-in-time access (JIT): Just-in-time data access limits user access to a data store to a specified amount of time, after which they must re-request access to the data store. This is useful for one-off projects or other cases where someone needs access to data outside of the daily scope of their work.
  • Cross-data store access control: Unity Catalog allows RBAC implementation in Databricks environments across different cloud stores. But what happens when you store data outside of Databricks, for example, in another warehouse? This is where you’ll need to fill in the access control gaps externally.
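
As a rough illustration of the first two items, the sketch below models an attribute-based check and a time-limited grant in plain Python. It is not how Satori, Databricks, or any particular tool implements these features; the `User` and `TimedGrant` classes, the tag names, and the four-hour window are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class User:
    name: str
    attributes: dict  # e.g. {"region": "EU", "department": "finance"}


def abac_allows(user: User, resource_tags: dict) -> bool:
    """ABAC: allow only if the user's attributes match every tag on the resource."""
    return all(user.attributes.get(k) == v for k, v in resource_tags.items())


@dataclass
class TimedGrant:
    """JIT: access that expires on its own and must be requested again."""
    user: str
    resource: str
    expires_at: datetime

    def is_active(self, now: datetime) -> bool:
        return now < self.expires_at


def grant_jit(user: str, resource: str, now: datetime, hours: int) -> TimedGrant:
    return TimedGrant(user, resource, now + timedelta(hours=hours))


# ABAC: access follows the user's attributes, not a dedicated role per region.
dana = User("dana", {"region": "EU", "department": "finance"})
print(abac_allows(dana, {"region": "EU"}))
print(abac_allows(dana, {"region": "US"}))

# JIT: a four-hour grant for a one-off project.
start = datetime(2025, 1, 1, 9, 0)
grant = grant_jit("dana", "billing.invoices", start, hours=4)
print(grant.is_active(start + timedelta(hours=1)))
print(grant.is_active(start + timedelta(hours=5)))
```

With attributes, adding a new region means tagging data and users rather than creating another role, which is how ABAC avoids role explosion. With timed grants, access that is no longer needed lapses without anyone having to remember to revoke it.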

Conclusion

While Unity Catalog represents a significant step forward in data governance for Databricks environments, it doesn’t fully resolve the challenges of manual data access control and compliance. Many data teams still find themselves bogged down by tedious processes, especially in complex environments with multiple data stores and stringent security requirements. In our next post, we’ll explore how Satori can help solve these problems, offering a more efficient and streamlined approach to data security and compliance.

To learn more about how Satori helps you secure your data with Unity Catalog, book a demo with our team.

About the author
Idan is a marketing specialist at Satori, with a focus on social media and digital marketing. Since relocating from Silicon Valley to Tel Aviv in 2021, Idan has honed her marketing skills in various Israeli cybersecurity startups.