In a previous post, we introduced you to our data classification capabilities. As a recap, Satori not only performs automatic data classification, but it also allows you to add your own custom classifications. By adding custom classifications, you can use Satori to locate sensitive data specific to your business. You can use the tags added to the data for reporting, analytics, and security policies.
In this article, we are going to dive a bit deeper into the world of custom classification. We will examine some of the business use-cases that custom data classification can simplify. We mainly have our customers to thank for this information, as many of these use-cases were provided by them.
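We are going to discuss the following use-cases: tagging regional data, localized data identification, tagging data items based on business units, custom classification of sensitive business data, and data catalog integration.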
It is common to take regional properties of data subjects and other data entities into account. For example, we often see customers restrict certain user groups so that they can only access data from a particular geographic region. This restriction is mostly applied for compliance reasons, but sometimes also for contractual reasons. Depending on the exact data architecture, it is usually implemented with row-level security, where a certain column contains the geographic location.
With custom tagging, an organization can identify specific regional locations, even if they are unique to that organization. For example, customer regions may be determined by a column with a specific name (such as rgn_id) or by a value that only makes sense to the data teams (e.g. CT_1000074 is London, and a different CT_ code is Brussels).
By applying custom tags, you can discover where your custom regional information is located within a database. You can either use this information for visibility, or you can further use this to create row-level security on such tables.
As you can see in the image above, a simple RegEx allows us to identify our internal city codes.
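To make the idea concrete, here is a minimal Python sketch of a RegEx-based detector for internal city codes, followed by a simple row filter. It is not Satori's implementation or API. The code format (CT_ followed by seven digits) is an assumption inferred from the CT_1000074 example, and the function names, the 0.8 threshold, and the sample data are all hypothetical.

```python
import re

# Assumption: internal city codes look like "CT_" plus exactly seven digits.
CITY_CODE = re.compile(r"CT_\d{7}")

def looks_like_city_code_column(values, threshold=0.8):
    """Return True if at least `threshold` of the non-empty values are city codes."""
    non_empty = [v for v in values if v]
    if not non_empty:
        return False
    matches = sum(1 for v in non_empty if CITY_CODE.fullmatch(v))
    return matches / len(non_empty) >= threshold

def filter_rows_by_region(rows, column, allowed_codes):
    """Keep only the rows whose region code is in the caller's allowed set."""
    return [row for row in rows if row.get(column) in allowed_codes]

# Hypothetical sample data for illustration only.
sample_values = ["CT_1000074", "CT_1000112", "", "CT_1000074"]
rows = [
    {"rgn_id": "CT_1000074", "amount": 10},
    {"rgn_id": "CT_1000112", "amount": 20},
]

if __name__ == "__main__":
    print(looks_like_city_code_column(sample_values))
    print(filter_rows_by_region(rows, "rgn_id", {"CT_1000074"}))
```

The first function corresponds to discovery (finding the columns that hold regional codes), and the second shows how that knowledge could feed a row-level restriction.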
Localized Data Identification
Out of the box, Satori detects a large number of data types, such as zip codes, social security numbers (SSN), payment cards, and more. However, for customers whose data is highly localized, the built-in detection may not recognize certain data types. To close that gap, you can set custom classification for such localized data.
For example, if you want to identify a specific type of license plate number that follows a certain pattern, you can configure it as a RegEx (Regular Expression) based custom classifier. Another example is when your data contains column names which include clues about the data types, and you would like to set classification based on that information. In this case, whenever Satori detects a column named, for example, “placa” (license plate in Spanish), it will automatically detect the location as one containing license plates.
As you can see in the image above, a localized column name will now be identified as a license plate.
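The two approaches described in this section (matching on column names and matching on value patterns) can be sketched in a few lines of Python. This is an illustration only, not Satori's configuration format. The rule list, the LICENSE_PLATE tag name, and the plate format (three letters followed by three digits, with an optional hyphen) are assumptions chosen for the example.

```python
import re

# Hypothetical rules: column-name patterns mapped to a classification tag.
COLUMN_NAME_RULES = [
    (re.compile(r"(^|_)placas?($|_)", re.IGNORECASE), "LICENSE_PLATE"),
]

# Assumed plate format for illustration: ABC-123 or ABC123.
PLATE_VALUE = re.compile(r"[A-Z]{3}-?\d{3}")

def classify_column(name, sample_values=()):
    """Return a tag based on the column name, falling back to value patterns."""
    for pattern, tag in COLUMN_NAME_RULES:
        if pattern.search(name):
            return tag
    values = [v for v in sample_values if v]
    if values and all(PLATE_VALUE.fullmatch(v) for v in values):
        return "LICENSE_PLATE"
    return None

if __name__ == "__main__":
    print(classify_column("placa"))
    print(classify_column("plate_no", ["ABC-123", "XYZ987"]))
```

Checking the name first keeps the classifier cheap, and the value pattern catches columns whose names give no clue.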
Tagging Data Items Based on Business Units
Let’s assume that a cloud data warehouse holds, among other things, personal information. This data, though personal in nature, may belong to different types of people, including employees, customers, and vendors. In this case, it may make sense to add a custom data classifier that detects contextual information and helps us understand which types of entities appear across our data warehouse.
For example, you can set custom classifiers to identify data like “employee ID,” “vendor ID,” and “customer ID.” These data types may be identified by certain column names or by specific patterns that describe the data.
Following this identification, you may use this information for better governance (e.g. knowing when a dataset that should only contain employee details contains data about customers or vendors). You can also set security policies accordingly that ensure that queries containing such data do not expose sensitive data to unauthorized users.
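As a rough sketch of the governance check mentioned above, the following Python snippet tags columns by name and reports columns whose entity type is not expected in a dataset. The column names, tag names, and helper functions are hypothetical and only illustrate the idea. They are not part of Satori.

```python
# Hypothetical mapping from column names to business-entity tags.
ENTITY_COLUMN_TAGS = {
    "employee_id": "EMPLOYEE_ID",
    "emp_id": "EMPLOYEE_ID",
    "customer_id": "CUSTOMER_ID",
    "vendor_id": "VENDOR_ID",
}

def tag_columns(columns):
    """Map each recognized column to its entity tag (case-insensitive match)."""
    return {
        col: ENTITY_COLUMN_TAGS[col.lower()]
        for col in columns
        if col.lower() in ENTITY_COLUMN_TAGS
    }

def unexpected_entities(columns, allowed_tags):
    """List columns whose entity tag is not allowed in this dataset."""
    tags = tag_columns(columns)
    return sorted(col for col, tag in tags.items() if tag not in allowed_tags)

if __name__ == "__main__":
    hr_columns = ["emp_id", "salary", "Customer_ID"]
    # An HR dataset is expected to contain only employee identifiers.
    print(unexpected_entities(hr_columns, {"EMPLOYEE_ID"}))
```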
Custom Classification of Sensitive Business Data
If all businesses were doing the same things and conforming to the exact same standards, we would have a very boring world and no innovation. In reality, there are usually specific types of data that are present in any given business and are unique to that business. Intuitively, it still makes sense for the organization to monitor and control access to such sensitive data.
For example, a marketing campaign data warehouse may hold serial numbers of coupons, along with the discounts or benefits to which these coupons entitle their holders. The organization may want to redact all occurrences of coupon serial numbers for fear of “foul play.” To do this, they can set up a custom classifier to identify occurrences of coupon numbers and redact them.
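A redaction step of this kind could look roughly like the Python sketch below. The coupon format (CPN- followed by eight uppercase letters or digits) is invented for the example, since the real format is business-specific, and the function name and mask text are likewise hypothetical.

```python
import re

# Assumption: coupon serial numbers look like "CPN-" plus 8 uppercase alphanumerics.
COUPON = re.compile(r"\bCPN-[A-Z0-9]{8}\b")

def redact_coupons(text, mask="[REDACTED]"):
    """Replace every coupon serial number found in the text with a mask."""
    return COUPON.sub(mask, text)

if __name__ == "__main__":
    print(redact_coupons("Code CPN-AB12CD34 gives 20% off"))
```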
Data Catalog Integration
Typically in larger organizations, there is a data catalog which acts as a single source of truth for the data stored throughout the company’s assets. A company can integrate this information so that tags for assets are inherited into Satori, and thus security policies can apply to such data. They can also integrate the platforms so that classifications from Satori will propagate into the data catalog to enrich that “single source of truth.”
However, depending on the organization’s data governance processes, it may be a better option to set custom classification on some of the data that is discovered and tag it as a “candidate” for user verification. For example, whenever a certain location has a pattern containing a certain template, it will be tagged as a candidate for having a credit card number. This information is then propagated to the data catalog and triggers a certain validation process.
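The candidate workflow can be sketched as follows. This is a hedged illustration rather than a description of how Satori or any data catalog implements it. The pattern, the CREDIT_CARD_CANDIDATE tag name, and the location names are assumptions. The Luhn checksum is shown as one possible automated step in the validation process, which in practice may also involve human review.

```python
import re

# Assumed template: 13 to 19 digits, optionally separated by spaces or hyphens.
CARD_TEMPLATE = re.compile(r"(?:\d[ -]?){12,18}\d")

def candidate_tags(samples_by_location):
    """Tag each location that has at least one value matching the template."""
    return {
        location: "CREDIT_CARD_CANDIDATE"
        for location, values in samples_by_location.items()
        if any(CARD_TEMPLATE.fullmatch(v.strip()) for v in values)
    }

def luhn_valid(value):
    """Check the Luhn checksum, a common sanity test for card numbers."""
    digits = [int(ch) for ch in value if ch.isdigit()]
    checksum = 0
    for index, digit in enumerate(reversed(digits)):
        if index % 2 == 1:  # double every second digit from the right
            digit *= 2
            if digit > 9:
                digit -= 9
        checksum += digit
    return len(digits) >= 13 and checksum % 10 == 0

if __name__ == "__main__":
    samples = {
        "orders.card_no": ["4111 1111 1111 1111"],
        "orders.note": ["hello"],
    }
    print(candidate_tags(samples))
    print(luhn_valid("4111 1111 1111 1111"))
```

Keeping the candidate tag separate from the validation step mirrors the process described above: discovery proposes, and the catalog's verification process confirms.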
Conclusion
Custom classification can be used in many different ways to help you better describe your data, and this is by no means a full list of use-cases. Keep in mind that custom classification is a relatively advanced capability, and in most cases organizations start with our out-of-the-box sensitive data discovery. However, this capability extends what you can do with Satori, which enables you to simplify and secure access to data. You can learn more about Satori here, or if you would like to arrange a demo, fill out the form below.
Ben is an experienced tech leader and book author with a background in endpoint security, analytics, and application & data security. Ben has held roles such as CTO of Cynet and Director of Threat Research at Imperva. Ben is the Chief Scientist for Satori, the DataSecOps platform.