Stakeholders, including data consumers, are ultimately people. An annoying work environment reduces productivity and, over time, corrodes work efficiency. This is one of the reasons we should pay special attention to actions that can piss off data users.
I decided to compile this list of annoying things that data owners or data operators (such as data services or data engineering teams) tend to do. At the end, I also suggest what can be done to reduce these annoyances.
As a data owner in the past and present, I’ve done most of these annoying actions, but I do my best to not do them anymore.
Reluctance To Share Data
In many companies, there is a mindset among data owners not to share their data with other teams. This may very well include data that can provide immense value to those other teams and to the company as a whole. However, the data owners are often not evaluated based on whether they share data and are not recognized for providing such value.
Additionally, sharing data with other teams could introduce risks, and in many cases the cost of sharing this data falls on the sharing team. In other words, the data owners have no incentive to share data, and in some cases the opposite is true, as sharing the data could take away resources from other much-needed projects.
"Hiding" Data That May Help Others
The reluctance to share data often transforms datasets into data silos. The team that owns the data keeps its existence secret out of fear that other teams within the organization will request access to it. This fear enlarges the size and number of data silos, increasing data sharing problems within the organization. Such limited-visibility pockets of data may introduce security and compliance risks, as well as operational efficiency risks (for example, several teams spending resources to build similar datasets).
Taking Forever To Provide Access To Data
Let’s assume you’re a data consumer. Despite the above, you found out about a dataset that can provide great value to the company. You even convinced whoever has the authority to grant access that you should have access to the data. But then bureaucracy raises its ugly head and the process becomes so complicated and time-intensive that you give up, or simply become pissed off.
Another very real possibility is that, as the data owner, you know that the dataset you want to enable access to may contain sensitive data. You know the sensitive data exists, but you don’t know exactly what types of sensitive data it contains or where they are located. Now the process of granting access requires mapping that sensitive data, and then either giving only partial access, creating an anonymized duplicate (that someone will also need to maintain), or setting up granular access controls (such as dynamic masking).
Keeping in mind that the data owner’s motivation to share the data may have been low to begin with, it’s easy to see why the data owner or operator slows this process to an annoying trickle.
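To make the mapping and masking steps above more concrete, here is a minimal Python sketch. It assumes a tiny in-memory dataset, and the detection patterns, column names and function names are all hypothetical illustrations, not the way any particular product works. Real sensitive data discovery relies on much richer detection than two regular expressions.

```python
import re

# Hypothetical detection patterns, for illustration only.
PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def map_sensitive_columns(rows):
    """Return {column name: set of sensitive data types found in that column}."""
    found = {}
    for row in rows:
        for column, value in row.items():
            if not isinstance(value, str):
                continue  # this sketch only inspects text values
            for kind, pattern in PATTERNS.items():
                if pattern.search(value):
                    found.setdefault(column, set()).add(kind)
    return found


def mask_rows(rows, sensitive_columns):
    """Return copies of the rows with every sensitive column replaced by a fixed mask."""
    return [
        {col: ("****" if col in sensitive_columns else val) for col, val in row.items()}
        for row in rows
    ]


# Made-up sample data.
rows = [
    {"id": 1, "contact": "ana@example.com", "note": "SSN 123-45-6789 on file"},
    {"id": 2, "contact": "bo@example.com", "note": "no issues"},
]

sensitive = map_sensitive_columns(rows)
masked = mask_rows(rows, sensitive)
```

Note that the masking here is applied at read time to a copy of the rows, so there is no second, anonymized dataset to maintain. That is the main appeal of dynamic masking over an anonymized duplicate.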
Metadata Annoyances
The following are three typical annoyances around metadata (or “data about the data”). Relieving them can help new data users use the data successfully and quickly.
Not Providing Metadata
The first annoyance is pretty straightforward: a data owner doesn’t provide the metadata. In some cases this occurs before sharing, so the data consumer doesn’t get enough information to understand whether the dataset is useful for solving a certain problem.
In other circumstances, the metadata is not provided after sharing, leaving you to understand (sometimes wrongly) that you are on your own. And yes, sometimes the only way to get the metadata is through mediation (“let’s sit together”), which could waste everyone’s time, but may be the only avenue that eventually resolves the issue.
Metadata That is Too Technical
Another common issue is that the metadata is provided, but it’s very technical, down to the technical specifications of each data type. The problem with this level of detail is that you can’t see the big picture and are therefore unable to determine the value of the dataset. In some cases the metadata is the same as the information you would get from the data platform’s information schema.
Metadata That is Too High-Level
At the other end of the spectrum, there are data owners who provide metadata that is at too high a level. In this case, the metadata details are an entire document that you can read, but understanding and applying this information is too difficult.
Unexpected Data Access Processes
Another common phenomenon is that of an unexpected or non-deterministic data access process. Data consumers who request access to data in different ways (such as IT tickets, e-mails or Slack messages) may get different results. For example, with one method access is given immediately, while with another there is an approval workflow that includes the direct manager. With a third method, there is an approval workflow that includes the data owner and the GRC or security teams. In fact, sometimes access requests made through the same channel trigger different processes, resulting in inconsistency.
Vague Ownership
Something that contributes to the unexpected data access processes is vague ownership of datasets. The team providing access (for example: data engineering) does not know who owns the specific data you want access to.
Let's Get Better
Data sharing is very important. According to Gartner (“Data Sharing Is a Business Necessity To Accelerate Digital Business,” 20 May 2021), data officers who successfully implement data sharing initiatives are 1.7 times more likely to increase their business value and ROI. Therefore, we all need to be better at this game. A good way to avoid pissing off your data consumers over data access is for the company to promote a “DataSecOps” mindset, where processes are aimed at making data sharing faster yet more secure. A few examples include:
- Data stakeholders should continuously know where sensitive data is located across their data stores. This removes a large unknown that could delay data sharing, and is also beneficial for reducing data security risk, and solving compliance challenges before they arise.
- Metadata should be kept up to date, and in the most automated way possible, including relevant information (such as the sensitivity of the data in the dataset).
- The organization should encourage data owners to share their data (in a secure way), and give them the tools to do this in a simple and effective way, such as a data mart or a data portal.
- Data access processes and policies should be simple and transparent. They should also follow the same workflow in every case, regardless of the request channel (for example, Slack vs. a support ticket) and the day of the week. A clear policy (such as: data analysts can get access with approval of the data owner, and any PII should be dynamically masked) can be both simple and logical, which removes frustration.
- Having a clear process also helps to reduce the time it takes to get access to data.
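As an illustration of a policy that behaves the same way regardless of the request channel, here is a minimal Python sketch of the example policy above (data analysts get access with the data owner’s approval, and PII is dynamically masked). All the names in it (AccessRequest, Decision, decide, the role and channel strings) are hypothetical, and the fallback for other roles is an assumption added only to make the example complete.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AccessRequest:
    requester_role: str
    dataset: str
    channel: str  # e.g. "slack", "ticket", "email"; recorded but never used to decide


@dataclass(frozen=True)
class Decision:
    approvers: tuple
    mask_pii: bool


def decide(request, dataset_has_pii):
    """Decide the workflow from the role and data sensitivity only, never from the channel."""
    if request.requester_role == "data_analyst":
        return Decision(approvers=("data_owner",), mask_pii=dataset_has_pii)
    # Assumed fallback for roles the example policy does not mention.
    return Decision(approvers=("data_owner", "security"), mask_pii=dataset_has_pii)


via_slack = decide(AccessRequest("data_analyst", "orders", "slack"), dataset_has_pii=True)
via_ticket = decide(AccessRequest("data_analyst", "orders", "ticket"), dataset_has_pii=True)
```

Because the channel never enters the decision, the same request made over Slack or through a ticket produces the same approvers and the same masking requirement.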
How Satori Improves Data Access Bureaucracy
Satori can reduce or eliminate entirely the bureaucracy associated with data access. Here are some examples of how:
- Satori’s continuous sensitive data discovery capability allows the data team and customer to quickly and easily identify the sensitive data.
- With Satori you can apply access and security policies across the different data platforms you’re using, without being limited by their specific capabilities.
- Using Satori, you can set access and security policies in a simple way that does not require writing any database code. Because no SQL needs to be written, the security and data teams can work faster, and the data can be quickly and easily shared with the data customer.
- Satori’s fine-grained access controls can enable the data teams to easily use dynamic masking or row-level security to reduce the security risks associated with sharing data.
To learn more, book a meeting with one of our experts.