As organizations become AI-ready and optimize GenAI, they increasingly rely on large volumes of data. Databricks provides its users with flexible, unified, and open analytics for AI solutions. To generate the greatest time-to-value from that data, data consumers need to gain access to it quickly and efficiently.
Data teams are also responsible for ensuring that organizations are compliance-ready. This means that all sensitive data, regardless of location, is secured. One way to secure sensitive data is by implementing Attribute-Based Access Control (ABAC).
Sensitive data is often interspersed throughout the Databricks data lake, so all of it must be identified and access-controlled for organizations to remain secure and compliant.
ABAC is an access control model that grants data access based on attributes rather than predefined roles alone. These attributes can include many characteristics such as user roles, job titles, locations, device types, time of day, and so on. Access to data is then granted, and data masked, according to the relevant attributes.
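To make this concrete, here is a minimal, hypothetical sketch of an ABAC decision in Python; the attribute names and policy structure are illustrative, not Satori's actual implementation.

```python
# Hypothetical ABAC check: every name here is illustrative, not Satori's API.
from dataclasses import dataclass, field

@dataclass
class Policy:
    required_attributes: dict  # attributes a user must hold to match this policy
    allowed_actions: set = field(default_factory=lambda: {"read"})

def is_access_allowed(user_attributes: dict, action: str, policy: Policy) -> bool:
    """Grant access only if the user holds every attribute the policy requires."""
    attributes_match = all(
        user_attributes.get(key) == value
        for key, value in policy.required_attributes.items()
    )
    return attributes_match and action in policy.allowed_actions

# A user in the CA office asks to read, then write, a sensitive dataset.
user = {"role": "analyst", "location": "CA"}
pii_policy = Policy(required_attributes={"location": "CA"}, allowed_actions={"read"})
print(is_access_allowed(user, "read", pii_policy))   # True
print(is_access_allowed(user, "write", pii_policy))  # False
```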
Implementing ABAC effectively ensures that all sensitive data is appropriately masked, but this does not always translate easily to large and growing datasets. Data teams may instead be tasked with manually searching for and locating sensitive information, and managing ABAC permissions across large data lakes such as those on Databricks is difficult and time-consuming.
Scaling ABAC on Databricks
Enforcing fine-grained access control policies is complex but necessary to meet compliance and security requirements. As the volume of data and the number of users increase, evaluating ABAC policies for every access request strains data teams. Not only does this delay time-to-value for data, but it also burdens data teams, who must keep up with the growing number of requests and ensure that attributes are correctly identified.
Managing many attributes, policies, and permissions is difficult, especially at scale, and data teams struggle to design granular ABAC policies that scale securely. Satori provides a way for data teams to scale ABAC on Databricks.
Satori’s ABAC on Databricks
Satori makes it easy to scale ABAC on Databricks and reduces the risk of data breaches or leaks caused by misconfigured users or permissions. The first steps to implement ABAC on Databricks are:
- Define a Databricks dataset as a Satori dataset.
- Predefine the access rules so that the data engineer doesn’t need to manually grant data access.
- Satori identifies the user requesting access and instantly applies the predefined access rules and policies, as sketched below.
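The following sketch illustrates the idea of predefined access rules that are evaluated automatically at request time; the rule table format and function names are hypothetical and only meant to show the flow, not Satori's internals.

```python
from typing import Optional

# Hypothetical rule table mapping a dataset to attribute-based permissions.
ACCESS_RULES = {
    "sales_dataset": [
        {"attribute": ("department", "sales"), "permission": "read_write"},
        {"attribute": ("role", "analyst"), "permission": "read_only"},
    ],
}

def resolve_permission(dataset: str, user_attributes: dict) -> Optional[str]:
    """Return the first matching permission; no manual grant is involved."""
    for rule in ACCESS_RULES.get(dataset, []):
        key, value = rule["attribute"]
        if user_attributes.get(key) == value:
            return rule["permission"]
    return None  # no matching rule: deny by default

print(resolve_permission("sales_dataset", {"role": "analyst"}))  # read_only
print(resolve_permission("sales_dataset", {"role": "intern"}))   # None
```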
Learn more about implementing Satori on Databricks.
Implementing ABAC in Databricks
1. Verify the integration between Databricks and Satori.
In this case, the data steward integrated a Databricks data store on Azure with Satori.
2. There are two options for classifying sensitive data: Satori can automatically scan, locate, and classify sensitive data based on pre-existing classifiers, or a data steward can update the columns manually. A simplified sketch of classifier-based scanning follows.
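As an illustration of classifier-based scanning, the sketch below tags columns whose sampled values match simple regular-expression classifiers; the patterns and the classify_columns function are hypothetical stand-ins for Satori's built-in classifiers.

```python
import re

# Hypothetical classifiers: simplified stand-ins for Satori's built-in ones.
CLASSIFIERS = {
    "EMAIL": re.compile(r"^[\w.+-]+@[\w-]+\.[\w.]+$"),
    "US_SSN": re.compile(r"^\d{3}-\d{2}-\d{4}$"),
}

def classify_columns(sample_rows: list) -> dict:
    """Tag each column with the classifiers that match all sampled values."""
    tags = {}
    columns = sample_rows[0].keys() if sample_rows else []
    for column in columns:
        values = [str(row[column]) for row in sample_rows]
        for label, pattern in CLASSIFIERS.items():
            if values and all(pattern.match(v) for v in values):
                tags.setdefault(column, []).append(label)
    return tags

rows = [
    {"email": "ada@example.com", "ssn": "123-45-6789"},
    {"email": "bob@example.com", "ssn": "987-65-4321"},
]
print(classify_columns(rows))  # {'email': ['EMAIL'], 'ssn': ['US_SSN']}
```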
Read more about how Satori Secures Data Access on Databricks.
3. Before granting access, the data steward defines the security policies for the datasets and users. Because we are exploring ABAC, these security policies are based on user attributes. In this step, the data steward defines the different masking rules in the Satori management console.
The data steward applies different security policies and masking requirements to different users to ensure that each user only receives access to the data for which they are authorized, as in the sketch below.
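Conceptually, a masking policy maps a user attribute to a masking behavior per column tag. The sketch below is a hypothetical representation of such policies; the policy shape and rule names are invented for illustration and will differ from Satori's actual policy format.

```python
# Hypothetical masking policies keyed on user attributes; illustrative only.
MASKING_POLICIES = [
    {"match": {"role": "analyst"}, "column_tag": "EMAIL", "rule": "partial_redact"},
    {"match": {"role": "analyst"}, "column_tag": "US_SSN", "rule": "full_redact"},
    {"match": {"role": "compliance"}, "column_tag": "US_SSN", "rule": "show_in_clear"},
]

def policies_for_user(user_attributes: dict) -> list:
    """Select the masking policies whose match conditions the user satisfies."""
    return [
        p for p in MASKING_POLICIES
        if all(user_attributes.get(k) == v for k, v in p["match"].items())
    ]

print(policies_for_user({"role": "analyst"}))
# -> the two analyst policies: EMAIL partially redacted, US_SSN fully redacted
```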
4. Once the attributes are determined, they are linked to predefined masking rules or row-level security (RLS) policies.
The access rules can be read-only, write-only, or full access. These permissions can be preconfigured so that the catalog is updated automatically.
5. Determine the user's attribute and connect it to the corresponding attribute on the table.
Masking rules are predefined; Satori continuously scans the data and applies them. A simplified sketch of rule application follows.
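Below is a minimal, hypothetical sketch of how predefined masking rules might be applied to classified columns; the rule names and helper functions are illustrative, not Satori's implementation.

```python
# Hypothetical masking rules applied to columns tagged by the classifier step.
def partial_redact(value: str) -> str:
    """Keep the first character and mask the rest."""
    return value[0] + "*" * (len(value) - 1) if value else value

MASKING_RULES = {
    "partial_redact": partial_redact,
    "full_redact": lambda value: "*****",
}

def mask_row(row: dict, column_tags: dict, active_policies: list) -> dict:
    """Apply each active policy's rule to the columns carrying the matching tag."""
    masked = dict(row)
    for policy in active_policies:
        rule = MASKING_RULES[policy["rule"]]
        for column, tags in column_tags.items():
            if policy["column_tag"] in tags:
                masked[column] = rule(masked[column])
    return masked

row = {"email": "ada@example.com", "ssn": "123-45-6789"}
tags = {"email": ["EMAIL"], "ssn": ["US_SSN"]}
policies = [
    {"column_tag": "EMAIL", "rule": "partial_redact"},
    {"column_tag": "US_SSN", "rule": "full_redact"},
]
print(mask_row(row, tags, policies))
# {'email': 'a**************', 'ssn': '*****'}
```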
Scaling ABAC
Satori pulls each user's attributes into our systems; if a certain attribute exists for a user, row-level security (RLS) and masking are automatically applied based on the predefined attributes and security policies.
For example, when a user runs a SELECT on a table, the applicable masking rules are applied based on the user's location, and the user can only view the rows to which they have predefined access.
In this case, the filter rule is set on the "State" column: before the filter is applied, rows for all states are visible; after the filter is applied for this user, only rows where the state is "CA" are shown, as in the sketch below.
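As a rough illustration of this row-level filter in Databricks terms, here is a small PySpark sketch; the sample table, the State values, and the user_attributes dictionary are invented for the example, and the explicit filter stands in for the enforcement Satori applies transparently.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

# Sample table with a "State" column; the rows are made up for the example.
customers = spark.createDataFrame(
    [("Alice", "CA"), ("Bob", "NY"), ("Carol", "CA")],
    ["name", "State"],
)

# Hypothetical user attribute pulled from the identity provider.
user_attributes = {"location": "CA"}

# The RLS filter: the user only sees rows matching their location attribute.
customers.where(F.col("State") == user_attributes["location"]).show()
# Only the Alice and Carol rows (State == "CA") are returned.
```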
Access is determined based on user attributes: records connect the user's attribute to the attribute on the table. The main challenge with scaling ABAC on Databricks is that attributes constantly change, so the attributes must be updated quickly and easily to ensure that the rules update as well.
It is possible to do this manually, but it is time- and resource-intensive. Satori provides a way to quickly and easily update ABAC controls on Databricks and ensure that sensitive data stays secured.
Updating Attributes
1. There are two ways to update an attribute. Either a Satori admin changes the attributes manually in the management console, or the identity provider notifies Satori about changes to user attributes, which are then synced to the management console.
2. The management console then notifies the Data Access Controller (DAC) that attributes were updated.
3. The DAC pulls the attributes and syncs them to the database.
It is necessary to update and synchronize the attributes in the dataset periodically. Because the dataset has already been defined as a Satori dataset, it is easily synchronized, and the new attributes are applied based on the preconfigured updates. A simplified sketch of such a sync follows.
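For intuition, here is a hypothetical sketch of a periodic attribute sync; fetch_idp_attributes and the attribute store are invented for illustration and do not reflect Satori's internals.

```python
# Hypothetical local attribute store, keyed by username.
local_attributes = {"ada": {"role": "analyst", "location": "NY"}}

def fetch_idp_attributes() -> dict:
    """Stand-in for a pull from the identity provider; returns fresh attributes."""
    return {"ada": {"role": "analyst", "location": "CA"}}  # Ada moved to CA

def sync_attributes() -> None:
    """Overwrite stale local attributes so the access rules use fresh values."""
    fresh = fetch_idp_attributes()
    for user, attributes in fresh.items():
        if local_attributes.get(user) != attributes:
            local_attributes[user] = attributes
            print(f"updated attributes for {user}: {attributes}")

# A periodic job would call sync_attributes() on a schedule, e.g. hourly.
sync_attributes()
```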
Conclusion
As organizations become more AI-ready, there is a growing need for secure and efficient data management. Databricks provides a robust platform for AI-driven analytics, but managing sensitive data within its data lakes, particularly at scale, can be difficult. Satori automates ABAC on Databricks, enabling scalable data access management based on user attributes. Ultimately, this reduces the burden on data teams while securing sensitive data.
To learn more about scaling ABAC on Databricks, talk with one of our experts.