Data Security for Databricks: A Primer


Since Databricks was founded in 2013, it’s become an industry leader for big data & analytics, real-time processing, and AI & machine learning workloads. But for a long time, its data governance capabilities were lacking. When Databricks introduced Unity Catalog in 2021, it was a godsend for many data and security teams who now had access to native governance features like row-level security (RLS), role-based access control (RBAC), and data lineage. 

Databricks has been investing heavily in Unity Catalog, with features like attribute-based access control (ABAC), data lake federation, and data classification in private preview. If your organization is using Databricks as a one-stop shop, these updates might be sufficient for your data security needs. But there are a few things you should consider when you’re starting out in your data security journey. 

Do we have visibility into data access and usage?

Databricks Unity Catalog provides audit logs and data lineage tracking, allowing organizations to see which users are querying specific datasets. It also supports some data classification features, such as tagging sensitive data. However, if your organization operates in multiple cloud environments or needs real-time visibility into data access patterns, these features may not be sufficient.

A data security platform like Satori enhances visibility by continuously discovering, classifying, and monitoring data usage across all Databricks workspaces and other data stores (including when using Lakehouse Federation with Snowflake). Satori automatically scans for sensitive data (such as PII, PHI, and financial data), applies dynamic classification, and enables real-time activity monitoring.

Are our security policies consistently enforced across all data access points?

Databricks Unity Catalog allows administrators to define role-based access control (RBAC) policies and apply row-level security (RLS) at the catalog, schema, table, or column level. Unity Catalog also supports attribute-based access control (ABAC) in private preview, enabling organizations to enforce dynamic policies based on user attributes. However, ensuring policy consistency across all data access points—such as Databricks notebooks, BI tools (Tableau, Power BI, Looker), and external applications—remains a challenge.
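To make this more concrete, here is a minimal sketch of what these native Unity Catalog controls can look like when applied from a Databricks notebook. The table name (main.hr.people_v2), its region column, and the analysts and admins groups are assumptions made for the example:

```python
# Illustrative sketch of native Unity Catalog controls, run from a Databricks
# notebook where `spark` is available. The table main.hr.people_v2, its region
# column, and the analysts/admins groups are assumptions for this example.

# Table-level RBAC: grant read access to a group.
spark.sql("GRANT SELECT ON TABLE main.hr.people_v2 TO `analysts`")

# Row-level security: a SQL UDF used as a row filter, so members of `admins`
# see every row while everyone else only sees the EMEA region.
spark.sql("""
    CREATE OR REPLACE FUNCTION main.hr.emea_only(region STRING)
    RETURN IF(is_account_group_member('admins'), TRUE, region = 'EMEA')
""")
spark.sql("ALTER TABLE main.hr.people_v2 SET ROW FILTER main.hr.emea_only ON (region)")
```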

A data security platform like Satori extends and automates policy enforcement beyond Databricks by applying consistent RBAC, ABAC, and RLS policies across all access methods. Additionally, it enables just-in-time (JIT) access, where users receive temporary permissions based on business needs, and self-service access, allowing authorized users to request and obtain approvals for data access without waiting on IT. This approach reduces operational overhead while ensuring that security policies are applied uniformly, regardless of how users interact with the data.

Are we prepared to detect and respond to data security threats in real time?

While Unity Catalog provides basic access controls and logging, it does not offer real-time security threat detection or automated response mechanisms. If your organization needs to detect unauthorized access attempts, anomalous query patterns, or potential data exfiltration as they happen, relying on logs alone may not be enough.

A data security platform like Satori enhances Databricks security by continuously monitoring data access events and applying advanced anomaly detection to identify suspicious behavior – such as users accessing unusually large amounts of sensitive data or logging in from unauthorized locations. It also enforces granular access controls at runtime, dynamically restricting or blocking access if a security risk is detected. By integrating with SIEM and SOAR tools, Satori enables automated incident response, reducing the time to detect and mitigate threats before they lead to data breaches.
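As a simplified illustration of the kind of signal such monitoring looks for (this is not Satori's detection logic), the toy sketch below flags a user whose latest query returned far more sensitive rows than their historical baseline:

```python
# Illustrative only: a toy version of the kind of signal an anomaly detector
# looks for (a user suddenly reading far more sensitive rows than usual).
# This is not Satori's detection logic; the event shape is an assumption.
from statistics import mean

# Each record: (user, number of sensitive rows returned), e.g. from an audit export.
audit_events = [
    ("demo_user_1", 120), ("demo_user_1", 150), ("demo_user_1", 90),
    ("demo_user_2", 200), ("demo_user_2", 250_000),  # sudden spike
]

by_user: dict[str, list[int]] = {}
for user, rows in audit_events:
    by_user.setdefault(user, []).append(rows)

for user, counts in by_user.items():
    *history, latest = counts
    if not history:
        continue  # no baseline to compare against yet
    baseline = mean(history)
    if latest > 10 * baseline:  # simple spike threshold, purely for illustration
        print(f"ALERT: {user} read {latest:,} sensitive rows (baseline ~{baseline:.0f})")
```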

Securing data in Databricks with Satori

Databricks Unity Catalog governs access through metastores, and customers must migrate their existing metadata to Unity Catalog before using it. Once the metadata is migrated, Satori enables data protection via Unity Catalog so that any Databricks user can securely work with their Databricks SQL stores.

The following steps outline how Satori secures data access for Databricks users; the subsequent sections elaborate on each of them.

  • Once a Satori data store is connected, Satori automatically scans, locates, and classifies all sensitive data across all Databricks workspaces and metastores.
  • The Satori administrator or the data steward defines the security policies and access controls in the management console.
  • Users access data either through Satori or from Databricks itself, with the defined data access controls applied.
  • Satori provides audit logs of all user access.

Now, let’s take a look at how this works in practice.


Prerequisite: Connect the data store

In Satori, a dataset is a collection of data store objects, such as tables or schemas, from one or more data stores that you wish to govern as a single unit. In Databricks, these objects are Unity Catalog schemas and tables.

The first step to controlling access is to set up a new Satori data store connection to Databricks. For more details, take a look at our documentation (Azure or AWS).

Once you connect successfully, Satori automatically scans your Unity Catalog to locate and classify sensitive data, such as PII.

Satori is now immediately available to protect these data locations, create access rules, and show audit results for this data store. Let's look at setting up the data access controller.

Data Access Controller

Setting up the data access controller occurs in two simple steps. 

1. Satori data inventory & classification

Satori scans all the data to locate sensitive information and automatically classifies sensitive data based on pre-existing classifiers.

In this example, on the Satori Data Inventory page, Satori automatically scanned a table called people_v2 and classified several different columns. 

Instead of using the predefined classifiers, columns can be manually tagged with alternative classifiers. In this example, the data steward wanted a different classifier for name_last, so they changed the tag to the Satori classifier Person Name.

You can tell which classifications have been manually updated by the icon that appears next to the classifier, as seen on Person Name.
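For intuition only, the sketch below shows a simplified, pattern-based take on column classification. Satori's actual classifiers are more sophisticated; the patterns, match threshold, and sample values here are illustrative assumptions:

```python
# Simplified, pattern-based illustration of column classification (not Satori's
# actual classifiers). It samples values from a column and tags the column when
# most values match a known sensitive-data pattern.
import re

CLASSIFIER_PATTERNS = {
    "Email": re.compile(r"^[\w.+-]+@[\w-]+\.[\w.-]+$"),
    "US SSN": re.compile(r"^\d{3}-\d{2}-\d{4}$"),
    "Phone Number": re.compile(r"^\+?[\d\s().-]{7,15}$"),
}

def classify_column(sample_values, match_ratio=0.8):
    """Return classifier tags where at least `match_ratio` of sampled values match."""
    tags = []
    for tag, pattern in CLASSIFIER_PATTERNS.items():
        matches = sum(bool(pattern.match(str(v))) for v in sample_values)
        if sample_values and matches / len(sample_values) >= match_ratio:
            tags.append(tag)
    return tags

# Example: a sample from a hypothetical people_v2.email column.
print(classify_column(["ada@example.com", "linus@example.org", "grace@example.net"]))
# -> ['Email']
```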

2. Defining security policies

Before granting access to the people_v2 table, the data steward defines the security policies for the datasets and users. This step is typically performed within the Satori management console.

In this case, the data steward defines different masking rules in the Satori management console. 

However, at runtime, the data traffic flows directly between your Databricks instance and your various client tools – not through Satori at all!

How Satori works with Databricks

The data steward applies different security policies and masking requirements for different users to ensure that each user only receives access to the data for which they are authorized.  

In the following example, the data steward enabled access to the people_v2 table. Demo User 1 is given the default security access; based on their characteristics, they receive read-only access.

In contrast, the data steward provided Demo User 2 with both read and write access. 

So now, when users request access to this data, they receive data with the applicable security policies and masking applied.

For example, when Demo User 1 runs a Databricks query, the default policies defined in the previous section are applied, and the relevant PII columns (email, SSN, and phone number) are masked.

To run the query, the user can use the native Databricks UX, or any other client or BI tool to connect to the Databricks Unity Catalog, and the experience is identical.
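For instance, a minimal sketch of such a query using the open-source Databricks SQL connector for Python might look like the following. The hostname, HTTP path, access token, and the main.hr catalog and schema holding people_v2 are placeholders for this example:

```python
# Minimal sketch of Demo User 1 querying people_v2 through the Databricks SQL
# connector for Python (pip install databricks-sql-connector). Hostname, HTTP
# path, token, and the main.hr catalog/schema are placeholders and assumptions.
from databricks import sql

with sql.connect(
    server_hostname="<workspace-host>.cloud.databricks.com",
    http_path="/sql/1.0/warehouses/<warehouse-id>",
    access_token="<demo-user-1-token>",
) as conn, conn.cursor() as cur:
    cur.execute("SELECT name_last, email, ssn, phone FROM main.hr.people_v2 LIMIT 5")
    for row in cur.fetchall():
        # Email, SSN, and phone values come back masked, per the policies
        # defined in Satori and enforced by Databricks.
        print(row)
```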

No matter what client tool is used, the rules defined in Satori continue to apply. This example shows the connection through Tableau, with the same table as above. As we can see, the results are the same:

Because Demo User 1 has read-only access, if they try to write back or edit data, their request is denied. 

Satori blocks the write for Demo User 1 based on the access rules already defined for this data location. These rules are implemented in the Databricks Unity Catalog workspaces: Databricks receives the policy information and enforces the rules.
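Continuing the connector sketch from above, a write attempt by Demo User 1 simply surfaces as a permission error raised by Databricks. The exact exception type and message depend on the client and policy, and the id column is assumed for the example:

```python
# Same connector setup as before, but attempting a write as read-only Demo User 1.
# The exact exception type and message vary; the `id` column is an assumption.
from databricks import sql

with sql.connect(
    server_hostname="<workspace-host>.cloud.databricks.com",
    http_path="/sql/1.0/warehouses/<warehouse-id>",
    access_token="<demo-user-1-token>",
) as conn, conn.cursor() as cur:
    try:
        cur.execute("UPDATE main.hr.people_v2 SET phone = NULL WHERE id = 42")
    except Exception as err:  # typically a permission-denied error from Databricks
        print(f"Write rejected: {err}")
```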

Auditing & compliance

To further monitor who has accessed what data, all queries against Unity Catalog are recorded and written back to Satori’s audit log.

The extensive audit logs provide detailed information about actual data usage so that Databricks users can remain compliance-ready.
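For teams that also want to check Databricks' own audit trail alongside Satori's, Unity Catalog activity is additionally recorded in Databricks system tables where these are enabled. Below is a minimal sketch of querying them from a notebook, assuming the documented system.access.audit schema:

```python
# Complementary to Satori's audit log: Unity Catalog activity is also recorded
# in Databricks' own system tables (where enabled on the account). Run from a
# notebook where `spark` is available; adjust column names if your workspace's
# system.access.audit schema differs.
recent = spark.sql("""
    SELECT event_time, user_identity.email AS user_email, action_name
    FROM system.access.audit
    WHERE event_date >= date_sub(current_date(), 7)
    ORDER BY event_time DESC
    LIMIT 20
""")
recent.show(truncate=False)
```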

Conclusion

Databricks has made significant strides in improving its native security capabilities, but organizations looking for enhanced control, visibility, and automation might need additional layers of protection. Satori’s integration with Databricks provides automated data classification, fine-grained access control, real-time monitoring, and policy enforcement across all data access points. By leveraging Satori, security teams can simplify compliance, minimize risk, and ensure their Databricks environment remains secure without slowing down data-driven operations.

Read more about how Satori brings visibility and control to Databricks here, or book a demo.

About the author

Idan is a marketing specialist at Satori, with a focus on social media and digital marketing. Since relocating from Silicon Valley to Tel Aviv in 2021, Idan has honed her marketing skills in various Israeli cybersecurity startups.
