We recently published a blog post on data mesh, giving an overview of what’s become one of the buzziest and most controversial topics in the data world.
As a reminder, a data mesh is an approach to data architecture, originally defined in a manifesto by Zhamak Dehghani, that is proposed as a replacement for the centralized data platform that has become standard today. Data mesh was created in response to challenges commonly faced by large organizations, where traditional architectures struggle to accommodate a growing number of data sources and consumers.
Data mesh involves four main tenets:
- Distributed domain driven architecture: Distributed teams own data defined by business domains, instead of centralized teams that own all the company’s data.
- Product thinking: Also known as ‘data as a product’, this tenet treats data sets as products in themselves, built and maintained with consumers in mind.
- Self-serve platform design: A separate team builds interfaces and tools to offload the burden of infrastructure building and management.
- Federated data governance: Global governance standards are applied to all data products, which are otherwise managed independently by data domain teams.
Those familiar with microservices in software development will see quite a few parallels between these two architectural approaches. In this blog, we’ll be looking specifically at data products, which serve a similar function to microservices in software.
Data mesh isn’t necessarily right for every organization, but it gives a helpful framework for understanding data products. Even if you don’t subscribe to the data mesh philosophy, it’s still valuable to apply product principles to your organization’s data.
What is a Data Product?
Data products are one of the more confusing concepts in the data mesh framework because organizations define them differently. What the definitions all have in common is the idea that data ought to be managed as a product, designed to be consumed by internal or external customers.
Some examples of data product definitions used by different organizations:
- Data product as a single unit that encompasses all of an organization’s data, analogous to a software product like a rideshare app.
- Data product as an asset that, because of the data it utilizes, creates a competitive advantage for the company. This could be internal, like a machine learning algorithm in a rideshare app, or external, like a data analytics platform for businesses.
- Data product as a self-contained component that can be used across business domains to solve an analytics problem.
In this blog we use the third definition to discuss data products, as it is the definition most commonly used in data mesh discussions.
Applying Product Thinking to Data
In a data mesh, data is owned by distributed teams based on business domains, rather than by teams defined by technological specializations. This creates a new challenge: how do we enable the free sharing of data and prevent teams from operating in a siloed manner?
Data products were conceived as a solution to this problem. The idea is that the teams owning data are also responsible for packaging and sharing that data in a way that is usable by other domain teams. What separates a data product from an ordinary analytical data set is that it’s fully self-contained: for the analytical use case it solves, it contains all the necessary data, the code needed to collect and process that data, and the infrastructure required to run the code.
A minimum viable data product needs to satisfy a few generally accepted criteria:
- Discoverable: For data products to be fully discoverable, they need to be registered in a central system that can be queried or browsed, such as a data catalog. The onus is on the data domain team to register its products with that system.
- Addressable: Users need to be able to access a data product using a unique address, following a global naming convention.
- Trustworthy: Not only do data products need to work with clean, complete, and accurate data, but they also need to demonstrate their trustworthiness to data consumers. Domain teams must define a service-level objective (SLO) for each product that sets its data integrity targets, taking into account the product’s requirements and potential tradeoffs.
- Self-describing: Consumers should be able to figure out how to access and use the data product on their own, without consulting the domain team for help.
- Interoperable: This is one of the more difficult standards to maintain. Data products need to follow global governance rules that allow them to easily interface with each other.
- Secure: Data products need to maintain strong data security practices, especially when sensitive customer data is involved. A critical aspect here is secure data access control, which is defined centrally and applied based on domain.
In addition to all these, applying product thinking to data means explicitly taking time to define the scope of the data domain. Teams plan data products around the needs of their consumers, accounting for user experience, compliance and security, and ease of integration with other data products.
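As a rough illustration, the criteria listed above can be sketched as a small descriptor plus a checklist function. Everything named here is an assumption made for the example rather than part of any data mesh standard or product: the `DataProduct` fields, the `dp://<domain>/<product>/v<major>` naming convention, the set of globally agreed column types, and the catalog modeled as a plain set of addresses.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Optional, Set

# Hypothetical global naming convention: dp://<domain>/<product>/v<major>
ADDRESS_PATTERN = re.compile(r"^dp://[a-z][a-z0-9-]*/[a-z][a-z0-9-]*/v\d+$")

# Hypothetical set of column types agreed on by the global governance team
GLOBAL_TYPES = {"string", "integer", "float", "boolean", "timestamp"}


@dataclass
class DataProduct:
    address: str                         # unique address of the product
    owner_team: str                      # domain team that owns the product
    description: str                     # human-readable usage notes
    schema: Dict[str, str]               # column name -> type name
    slo: Dict[str, float]                # e.g. {"completeness": 0.99}
    access_policy: Optional[str] = None  # name of a centrally defined policy


def missing_criteria(product: DataProduct, catalog: Set[str]) -> List[str]:
    """Return the criteria this product does not yet satisfy."""
    problems = []
    if product.address not in catalog:
        problems.append("discoverable")
    if not ADDRESS_PATTERN.match(product.address):
        problems.append("addressable")
    if not product.slo:
        problems.append("trustworthy")
    if not product.description or not product.schema:
        problems.append("self-describing")
    if any(t not in GLOBAL_TYPES for t in product.schema.values()):
        problems.append("interoperable")
    if product.access_policy is None:
        problems.append("secure")
    return problems


trips = DataProduct(
    address="dp://rides/trip-summaries/v1",
    owner_team="rides",
    description="Daily trip summaries for analytics",
    schema={"trip_id": "string", "fare": "float"},
    slo={"completeness": 0.99},
    access_policy="pii-restricted",
)
print(missing_criteria(trips, catalog={"dp://rides/trip-summaries/v1"}))
```

Each check is a deliberately crude stand-in. A real "trustworthy" check, for instance, would compare measured data quality against the SLO rather than only confirm that an SLO exists.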
Who are Data Products Useful For?
Data mesh isn’t the right step for everyone – it requires widespread organizational buy-in, data professionals with very broad areas of expertise, and very high data maturity. “Data as a product” is a fundamental component of the data mesh paradigm, but it’s still useful as a standalone concept for businesses with a more traditional centralized data architecture.
Data products can be incredibly useful in organizations where large amounts of data are shared across teams with users who don’t necessarily understand the full context of the data. By packaging data sets in a way that’s user-friendly, discoverable, and accessible to data consumers, teams can make it far easier and faster to generate value from their organizational data.
Bringing it All Together with a Data Security Platform
Governing Your Data Products
The data product framework emphasizes federated governance, where global data governance rules are set by a central team to enable interoperability, or the ability for users to perform operations on multiple data products together. Otherwise, domain teams are granted the autonomy to determine their own governance standards with respect to the unique and changing needs of each product’s data producers and consumers. For example, the global governance team may implement a central inventory, such as a data catalog, to keep their data products and other data assets discoverable. This team defines business semantics and links them to the data catalog or inventory system. Data product owners are responsible for determining data quality, security, and access policies for their products.
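One way to picture this split between global and local governance is as a policy merge in which domain teams set whatever they like, except for keys the central team has fixed. The sketch below is a minimal illustration under that assumption. The rule names (`registered_in_catalog`, `address_scheme`) and the domain settings are hypothetical and not drawn from any specific governance tool.

```python
from typing import Any, Dict

# Hypothetical rules fixed by the global governance team
GLOBAL_RULES: Dict[str, Any] = {
    "registered_in_catalog": True,
    "address_scheme": "dp",
}


def effective_policy(domain_policy: Dict[str, Any]) -> Dict[str, Any]:
    """Combine a domain team's policy with the global rules.

    Domain teams keep full control of their own keys, but may not
    set a globally governed key to a conflicting value.
    """
    overridden = sorted(
        key
        for key, value in domain_policy.items()
        if key in GLOBAL_RULES and value != GLOBAL_RULES[key]
    )
    if overridden:
        raise ValueError(
            "domain policy overrides global rules: " + ", ".join(overridden)
        )
    return {**domain_policy, **GLOBAL_RULES}


# A domain team chooses its own freshness target and masking rule
policy = effective_policy({"freshness_hours": 24, "masked_columns": ["rider_email"]})
print(policy)
```

Rejecting conflicts outright, rather than silently letting the global value win, makes it visible to the domain team that it has stepped outside its area of autonomy.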
For organizations building data products, data security and access management are now governed within the package of each data product, rather than applied across all data products by a central data engineering or DevOps team. Intuitively, this makes sense: data access should be controlled by the people most familiar with the data and its context. In practice, as many data professionals can attest, managing these functions manually is often painful and resource-intensive. This raises the question: how can data access and security be distributed without creating more work for both the global governance team and each domain team?
This is where a Data Security Platform comes in, helping solve the data governance challenges of both local domain teams and the global governance team. When teams have flexible tools for setting and enforcing security policies automatically, they can quickly adapt to ever-changing users and use cases.
Benefits of a Data Security Platform For Your Data Products
Data Security Platforms help organizations reach the full potential of the data product approach with:
- Fine-grained access control: Inevitably, many different users with different use cases will interact with a single data product. Product owners need to be able to manage access to individual components within the product, granting it to users according to their role, other attributes, or the sensitivity of the data.
- Dynamic data masking: Domains dealing with sensitive data can set masking policies based on user attributes, which are applied automatically without complex logic or configurations.
- Self-service data portal: Users can easily request access and data owners can easily grant it, improving the usability of data products.
- Audit, monitoring, and posture management tools: For compliance and auditing needs, centralized governance teams need to keep track of where data is and who has access to it, across all data products.
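To make the first two capabilities in this list more concrete, here is a minimal sketch of attribute-based access control combined with dynamic masking. It is a generic illustration and does not represent the API of any particular Data Security Platform. The sensitivity labels, role names, and the `apply_policy` function are all assumptions made for the example.

```python
from typing import Dict, Set

# Hypothetical sensitivity labels assigned by the data product owner
COLUMN_SENSITIVITY: Dict[str, str] = {
    "trip_id": "public",
    "fare": "internal",
    "rider_email": "pii",
}

# Hypothetical mapping from user role to the labels it may see unmasked
ROLE_CLEARANCE: Dict[str, Set[str]] = {
    "analyst": {"public", "internal"},
    "support": {"public", "internal", "pii"},
}


def mask(value: str) -> str:
    """Replace every character so the original value cannot be read."""
    return "*" * len(value)


def apply_policy(row: Dict[str, str], role: str) -> Dict[str, str]:
    """Return the row as the given role is allowed to see it."""
    if role not in ROLE_CLEARANCE:
        raise PermissionError(f"role {role!r} has no access to this product")
    cleared = ROLE_CLEARANCE[role]
    visible = {}
    for column, value in row.items():
        # Columns without a label are treated as the most sensitive
        level = COLUMN_SENSITIVITY.get(column, "pii")
        visible[column] = value if level in cleared else mask(value)
    return visible


row = {"trip_id": "t-1", "fare": "12.50", "rider_email": "a@b.co"}
print(apply_policy(row, "analyst"))
```

Because masking is decided at read time from the user's attributes, the same underlying data can serve several audiences without maintaining a separate copy for each one.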
With Satori’s Data Security Platform, organizations can get the full value out of their data products, without compromising on security or compliance. Satori helps data teams streamline data access by automating data access controls, security and compliance requirements across their data infrastructure.
To learn more about incorporating secure data access into your data product strategy, book a 30-minute call with one of our experts.