There is no such thing as a one-size-fits-all data governance framework that works for all organizations. However, one idea applies universally, regardless of an organization’s scale or industry: having well-defined roles and ensuring that all stakeholders understand the overlaps and differences between those roles is crucial for the success of any data governance initiative.
Given how important data governance is, we will demystify the confusion surrounding the different roles central to data governance in this post. We will take a look at examples of how these roles may look in practice across varied organizations. And most importantly, we will examine why this information is so essential and why you should care.
First, let’s provide some context:
Most people think of data becoming a valuable resource for organizations (and being widely viewed in that light) as a relatively recent development. After all, the term “big data” was only coined in 2005. So, it is easy to forget that the implicit recognition of the value of data is at least as old as civilization itself, if not older.
As early as 40,000 years ago, ancient tribes maintained “tally sticks” to store and analyze data about food and harvests in order to predict how long their supplies would last. Hammurabi is known to have collected detailed statistics about enemy troop movements and the strength of their forces, often refusing to deploy his troops without having this data.
That said, the sheer scale of the data we work with today dwarfs everything that came before by several orders of magnitude.
Consider an example from the realm of medicine—specifically cancer. One of the most promising breakthroughs today in treating cancer is the ability to map the genomes of cancer cells to identify mutations and determine the most appropriate treatments. As Dr. Heath explains in her brilliant TEDx Talk, the entire genetic profile for a single cancer patient can amount to about one petabyte (a little over one billion megabytes) of data.
To put this number in perspective, a typical photo you take on your phone is about six megabytes in size. To get to one petabyte’s worth of photos, you would need to take roughly 179 million photos, or about 9,800 photos per day for the next 50 years.
And that’s just the data for one patient. Annually, there are an estimated 19.3 million cases of cancer worldwide on which we collect data today.
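The photo comparison above is easy to check with a few lines of arithmetic. The sketch below assumes binary units (one petabyte taken as 2**30 megabytes, which matches the "a little over one billion megabytes" figure) and ignores leap days; the variable names are illustrative only.

```python
# Back-of-the-envelope check of the photo comparison.
# Assumption: binary units, i.e. 1 petabyte = 2**30 megabytes.
MEGABYTES_PER_PETABYTE = 2**30
PHOTO_SIZE_MB = 6       # typical phone photo, per the text
YEARS = 50
DAYS_PER_YEAR = 365     # leap days ignored for simplicity

photos_needed = MEGABYTES_PER_PETABYTE / PHOTO_SIZE_MB
photos_per_day = photos_needed / (YEARS * DAYS_PER_YEAR)

print(f"Photos needed: {photos_needed:,.0f}")
print(f"Photos per day for {YEARS} years: {photos_per_day:,.0f}")
```

Run as written, this gives just under 179 million photos and a little over 9,800 photos per day, close to the figures quoted above.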
While data of this magnitude can be astonishing, data is only useful to the extent that it can be managed effectively: professionals must be able to store, access, and analyze it as necessary, and do so in a manner that takes full advantage of the high processing speeds possible with modern technology.
In other words, the method used to manage the data should not limit the data’s potential, which, in this example, presents literal life-or-death stakes.
Is Your Organization’s Data an Asset or a Liability?
Data may play a very different role in your organization than in the example above, but regardless of its specific purpose, most business executives now agree that data is among their organization’s most valuable resources and that it is important, if not essential, to business success. Even so, it remains relatively common for organizations to operate without good Data Governance practices.
Organizations often have a lot of data, but it is not well documented or standardized, so they lack knowledge about the information they have. Or, even when they do know, they encounter barriers to finding or accessing the appropriate data when they need it (which is worse than not having the data in the first place because you pay the cost for data collection and storage but do not reap the benefits of it). Further, when organizations can find their data, they are often not entirely sure whether it is reliable enough to use.
We have all laughed at anecdotes where an 85-year-old retiree receives promotional flyers inviting them to explore the back-to-college collection because the sender could not get their data right. However, things are not as amusing when the wrong person gets a traffic ticket or a summons to appear in court because of biased data. They are definitely not funny when they involve data exposure, especially of a private nature.
For organizations in the modern regulatory environment, poor data governance can transform data from an asset into a severe liability, exposing the business to crippling privacy penalties.
The moral of this story is that data can only be valuable when we know how to use it, manage it properly, and give it the respect it deserves.
As an organization, adhering to this standard involves having a comprehensive and well-established protocol in place on how to manage data and, equally importantly, having a team of people who understand their specific roles and responsibilities in implementing these practices.
If you have researched data governance implementation in the past, you have surely already come across many roles, ranging from the mundane-sounding managers and librarians to the exotic “ambassadors” and “champions.”
The three most important roles that any organization needs to understand in the context of data governance are the Data Owner, the Data Steward, and the Data Custodian.
Note that these Data Governance roles seldom represent distinct, exclusive job titles. In most cases, you are not going to be hiring a person into a new position. Rather, your existing team members will take on various data governance responsibilities, and these terms describe those different sets of responsibilities.
Here is a quick overview of each role before we examine what they look like in practice across organizations of various sizes.
What Is a Data Owner?
A Data Owner is the person accountable for the classification, protection, use, and quality of one or more data sets within an organization. This responsibility involves activities including, but not limited to, ensuring that:
The organization’s Data Glossary is comprehensive and agreed upon by all stakeholders
A system is in place for auditing and reporting data quality
An escalation matrix is in place for data quality issues
Actions are taken to resolve data quality issues within a defined timeframe
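As a rough illustration of the last two points, an escalation matrix can be as simple as a lookup table that pairs each severity level with a resolution window and an escalation contact. The sketch below is only one possible shape: the severity levels, time windows, contact names, and the `escalation_target` function are all assumptions made for this example, not part of any standard.

```python
from datetime import datetime, timedelta
from typing import Optional

# Hypothetical escalation matrix: severity -> (resolution window, escalation contact).
# All values are illustrative assumptions, not recommendations.
ESCALATION_MATRIX = {
    "low": (timedelta(days=10), "team lead"),
    "medium": (timedelta(days=5), "department head"),
    "high": (timedelta(days=1), "Data Owner"),
}


def escalation_target(severity: str, opened_at: datetime, now: datetime) -> Optional[str]:
    """Return the escalation contact if the issue is overdue, otherwise None."""
    window, contact = ESCALATION_MATRIX[severity]
    return contact if now - opened_at > window else None


opened = datetime(2024, 1, 1, 9, 0)
checked = datetime(2024, 1, 3, 9, 0)  # two days after the issue was opened
print(escalation_target("high", opened, checked))  # overdue: the window was 1 day
print(escalation_target("low", opened, checked))   # still within the 10-day window
```

Keeping the windows in data rather than in code means the timeframes can be changed without touching the check itself.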
Most Data Governance experts maintain the view that there should only be one Data Owner for a given data set. In cases where multiple stakeholders are concerned with the same set of data, it is important to designate one individual who will assume the Data Owner role, and then they may consult and collaborate with other stakeholders as closely as necessary.
To fulfill the obligations listed above, a Data Owner needs:
The authority to make any changes required in terms of workflows, practices, and infrastructure to ensure data quality
The resources to initiate actions for ensuring data quality, such as data cleansing and data audits
In practice, this means that the Data Owner role has to be assigned to someone relatively senior, typically in upper management. Without adequate authority and access to resources, a Data Owner will be ineffective at fulfilling their role, and this shortcoming cascades down the entire Data Governance chain, defeating the whole initiative.
However, senior managers do not necessarily understand the finer technical details of a data set or its management. They are also almost always constrained for time, meaning that they cannot realistically implement all of the processes required for a Data Governance framework to be effective.
That’s where Data Stewards come in.
What Is a Data Steward?
A Data Steward is a subject expert with a thorough understanding of a particular data set. The Data Steward is responsible for ensuring the classification, protection, use, and quality of that data, in line with the Data Governance standards set by the Data Owner.
To understand what a Data Steward is, remember that “subject expert” does not necessarily mean someone from an IT background. Depending on the nature of an organization’s data and business, a subject expert might have experience in business, operations, IT, or a project-specific function.
Typically, the Data Owner appoints a Data Steward. Depending on the scale of an organization and its data, one or more Data Stewards may be appointed to assist the Data Owner in implementing the organization’s Data Governance policies.
What Is the Role of a Data Steward?
Data Stewards play a crucial part in ensuring that the data in their care is of high quality and is fit for use by all data stakeholders in the organization who are concerned with that set of data. Some organizations also describe this role as a “Data Quality Steward.”
A good Data Steward must have the ability to see beyond silos and implement rules and processes for the data under their care. Although they do not own the data, they must thoroughly understand how that data needs to be documented, stored, and protected.
As David Plotkin explains in his book, Data Stewardship: An Actionable Guide to Effective Data Management and Data Governance, there are four distinct types of Data Stewards:
Business Data Stewards
Operational Data Stewards
Technical Data Stewards
Project Data Stewards
As we mentioned earlier, each of these types reflects the individual’s functional background in the organization. When an organization requires different types of Data Stewards, or multiple Data Stewards of the same type, for a common set of data, they must often work together to ensure effective Data Governance.
In many cases, a Data Steward may not necessarily have the expertise to manage the data’s storage, retrieval, and formatting. This brings us to our next role: the Data Custodian.
What Is a Data Custodian?
A Data Custodian is responsible for implementing and maintaining security controls for a given data set in order to meet the requirements specified by the Data Owner in the Data Governance Framework.
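One way to picture how the three roles fit together is a small record per data set that names one Data Owner and any number of Data Stewards and Data Custodians. The following is a minimal sketch under that assumption; `DataSetRoles`, `RoleRegistry`, and the sample names are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class DataSetRoles:
    """Governance roles for one data set (hypothetical structure)."""
    data_set: str
    owner: str  # exactly one accountable Data Owner
    stewards: list[str] = field(default_factory=list)
    custodians: list[str] = field(default_factory=list)


class RoleRegistry:
    """Keeps one role record per data set and rejects a second owner."""

    def __init__(self) -> None:
        self._records: dict[str, DataSetRoles] = {}

    def assign_owner(self, data_set: str, owner: str) -> DataSetRoles:
        existing = self._records.get(data_set)
        if existing is not None and existing.owner != owner:
            raise ValueError(f"{data_set!r} already has a Data Owner: {existing.owner}")
        return self._records.setdefault(data_set, DataSetRoles(data_set, owner))

    def add_steward(self, data_set: str, steward: str) -> None:
        # Raises KeyError if no Data Owner has been assigned yet.
        self._records[data_set].stewards.append(steward)

    def add_custodian(self, data_set: str, custodian: str) -> None:
        # Raises KeyError if no Data Owner has been assigned yet.
        self._records[data_set].custodians.append(custodian)

    def roles_for(self, data_set: str) -> DataSetRoles:
        return self._records[data_set]


registry = RoleRegistry()
registry.assign_owner("customer_data", "Head of Sales")
registry.add_steward("customer_data", "Marketing analyst")
registry.add_custodian("customer_data", "IT administrator")
print(registry.roles_for("customer_data"))
```

The registry refuses a second owner for the same data set and requires an owner to exist before stewards or custodians are added, mirroring the idea that the Data Owner appoints the other roles.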
The Differences Between Data Governance Roles
Role titles are useful because they allow individuals both within and outside an organization to quickly get a sense of the role’s responsibilities. Unfortunately, because data can be quite abstract, there is a lot of confusion surrounding the titles of the different roles associated with Data Governance.
Let’s uncomplicate it.
Data Owner vs. Data Steward
Given that Data Stewards are appointed to assist a Data Owner in implementing the Data Governance policies, there is a fair bit of overlap between their profile descriptions.
So What Is the Difference Between a Data Owner and a Data Steward?
A Data Owner is accountable for Data Governance outcomes, whereas a Data Steward is responsible for the Data Governance tasks required to achieve those outcomes. In other words, the Data Owner role is results-focused, while the Data Steward role is task-focused.
For instance, a Data Owner might be accountable for data excellence metrics, such as audit findings and quality scores. They may also be accountable for business metrics that reflect the impact of Data Governance on strategic goals, such as the effect that the quality of customer data has on the success of a direct mail campaign.
By contrast, a Data Steward might be responsible for ensuring that all items on a Data Governance checklist are implemented and that problems in implementation are prevented and/or resolved in a timely manner.
Does Your Organization Need Both Data Owners and Data Stewards?
Whether your organization needs both roles depends on the scale and scope of your Data Governance program. Large organizations most likely need both roles, while, in smaller businesses, the Data Owner and Data Steward can be one and the same person.
Data Owner vs. Data Custodian
A lot of people confuse Data Custodians with Data Owners. This misconception probably arises because Data Custodians are often the ones physically or directly handling the storage and security of a data set. But just because data is stored on a device controlled by someone does not make them the Data Owner.
The data may be in their drawer, but that doesn’t make it theirs.
A good way to think about this is in terms of money in a bank. When you deposit your money in a bank, the fact that it is stored in the bank’s vault does not make the bank the owner of that money!
So What Is the Difference Between a Data Owner and a Data Custodian?
A Data Owner is an individual, usually in a senior business role, who is accountable for the classification, protection, use, and quality of one or more sets of data. A Data Custodian is typically someone in an IT role who is responsible for maintaining the storage and security infrastructure for one or more data sets in a manner that meets the requirements of the organization’s Data Governance policy.
In small organizations where the roles of Data Owner and Data Steward may be held by a single individual, the Data Owner is likely to directly delegate day-to-day tasks (e.g. backups) to Data Custodians.
Real-World Examples of Data Steward Roles
To understand how Data Stewardship plays out in practice, let’s look at a couple of real-world examples of these roles in different organizations.
Data Stewardship in a Retail Chain
A high-end retail chain lets customers participate in a sweepstakes by dropping their business cards in the contest boxes located in each store. By providing their personal data and participating in the contest, customers consent to receive the chain’s promotional marketing emails.
Starting from the bottom up, in this scenario:
A back-office employee collects and manually records each customer’s data in the company’s database. This individual is not a Data Owner, Steward, or Custodian, but rather they are simply a Data Creator.
The customer data is stored on a cloud server, and an IT administrator is the Data Custodian who must ensure the data is secure and accessible only to authorized personnel.
A person on the digital marketing team is responsible for cleaning and validating the data set before using it in email marketing campaigns. They are appointed the Data Steward, responsible for ensuring the quality of email marketing data through systematic formatting, cleaning, and enriching procedures as specified by the Data Governance policy.
The Head of Sales is accountable for sales targets and is very invested in the success of marketing campaigns. They are designated the Data Owner for this data set because they are in a senior position with insight into the organization’s goals and have the authority and resources to make decisions to improve data quality and security (e.g. by investing in technology to automate data capture and digitization or by enforcing authentication safeguards to control access to the data).
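To make the Data Steward's cleaning step in this scenario more concrete, here is a minimal sketch of normalizing, validating, and de-duplicating manually entered email addresses. The `clean_emails` function, the sample addresses, and the deliberately simple pattern are assumptions for illustration; a real Data Governance policy would define its own rules.

```python
import re

# Deliberately simple pattern for illustration; not a full RFC 5322 validator.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def clean_emails(raw_emails):
    """Normalize, validate, and de-duplicate addresses, preserving order."""
    seen = set()
    cleaned = []
    for raw in raw_emails:
        email = raw.strip().lower()
        if not EMAIL_PATTERN.match(email):
            continue  # a real process might route these for manual review
        if email not in seen:
            seen.add(email)
            cleaned.append(email)
    return cleaned


sample = [" Ana@Example.com", "ana@example.com ", "not-an-email", "bo@shop.example"]
print(clean_emails(sample))  # prints ['ana@example.com', 'bo@shop.example']
```

Invalid entries are silently skipped here to keep the sketch short; in practice they would more likely be logged and sent back to the Data Creator for correction.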
Data Stewardship in a Manufacturing Business
In a contract manufacturing company, the Production Manager is designated as the Data Owner for all production data. In turn, the Data Owner appoints several Data Stewards as follows:
Production Shift Supervisors are Data Stewards for material usage, cycle time, and part output data
Maintenance Engineers are Data Stewards for machine performance, availability, breakdown, and time-to-repair data
Production Planners are Data Stewards for utilization and efficiency data
The Quality Lead is the Data Steward for defect and rejection data
Each of these Data Stewards is responsible for the quality of the data in their care, including its capture, storage, security, and availability for concerned stakeholders.
It’s important to note that this structure will not necessarily work for all manufacturing companies. Even when companies are competitors engaged in the same activities, their business goals and internal processes are likely to be quite different, which may require a significantly different map of Data Governance roles.
Also, in this example, the Production Data Stewards, Planning Data Stewards, and Maintenance Data Stewards all need access to data that is generated by the same set of machinery.
However, this data is captured and stored on a local server, which is operated and managed by the organization’s IT department. An individual in that department is appointed as Data Custodian.
Does It Really Matter What They Are Called?
In some organizations, people still find themselves confused between role titles, despite having clear definitions in place for each Data Governance role and its respective responsibilities. There may even be resistance within an organization to some titles.
In such cases, it may be more productive to change the role title to whatever people find less confusing and more acceptable. Ultimately, it doesn't really matter what each person on the Data Governance team is called — as long as there is clarity across the organization on what needs to be done and who is supposed to do it.
Data on its own does not solve problems or add value; effective management and application of data does.
Unsystematic approaches to managing data can quickly turn data into a liability for an organization, rather than an asset.
Properly leveraging data as an asset and implementing measures that benefit the enterprise requires support, buy-in, and involvement at the executive level.
To fulfill their job functions well, many employees who use a data set in an organization are dependent on others further upstream to process the data correctly, which cannot be ensured without well-established Data Governance practices.
A key requirement for effective Data Governance is to implement a system with transparent roles and responsibilities and clear definitions about:
Who is allowed or obliged to take which actions
What specific data sets they are allowed or obliged to act on
When (i.e., in which specific situations) they are allowed or obliged to take such actions, and
What methods they are allowed to use
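The four questions above (who, what, when, and how) map naturally onto a simple rule table. The sketch below shows one hypothetical way to express such rules and to check an action against them; the `GovernanceRule` fields, the sample rules, and the function names are assumptions for illustration, not a prescribed format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GovernanceRule:
    """One row of a hypothetical responsibility matrix."""
    role: str        # who may or must act
    data_set: str    # what data the rule covers
    situation: str   # when the rule applies
    method: str      # how the action may be carried out
    obliged: bool = False  # True if the action is required, not just allowed


RULES = [
    GovernanceRule("Data Steward", "email_marketing", "before_campaign", "deduplicate", obliged=True),
    GovernanceRule("Data Custodian", "email_marketing", "nightly", "backup", obliged=True),
    GovernanceRule("Data Steward", "email_marketing", "on_request", "export_sample"),
]


def find_rule(role, data_set, situation, method, rules=RULES):
    """Return the matching rule, or None if the action is not covered."""
    for rule in rules:
        if (rule.role, rule.data_set, rule.situation, rule.method) == (
            role, data_set, situation, method
        ):
            return rule
    return None


def is_allowed(role, data_set, situation, method) -> bool:
    """An action is allowed only if some rule explicitly covers it."""
    return find_rule(role, data_set, situation, method) is not None


print(is_allowed("Data Steward", "email_marketing", "before_campaign", "deduplicate"))
print(is_allowed("Data Custodian", "email_marketing", "before_campaign", "deduplicate"))
```

The check is deny-by-default: anything not listed in the table is treated as not allowed, which keeps gaps in the rules visible instead of silently permitting actions.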
While data can be a resource shared by several stakeholders, accountability for Data Governance is never shared: it is solely the Data Owner’s responsibility. Data Stewards may have some overlap in responsibilities, but these need to be defined with clear matrices for escalation in the event of problems.
High data accuracy and strong data management are a team effort. Managing data with an inclusive approach and distributing responsibilities across traditional boundaries allows for superior data quality.
Better data quality presents opportunities for improved analytics and increased business exploration.
How Satori Helps Data Owners, Data Stewards, & Data Custodians
Satori enables the “data masters” of an organization to grant access to data without requiring any help from IT or Data Engineering teams. In addition, they can each tag and describe their data sets, even when those data sets are scattered across several data platforms. Further, Satori provides continuous sensitive data discovery so that these professionals know exactly when new sensitive data is introduced. Finally, Satori lets them create security policies, including fine-grained security policies, without the need for implementation by data engineers.
Ben is an experienced tech leader and book author with a background in endpoint security, analytics, and application & data security. Ben has held roles such as CTO of Cynet and Director of Threat Research at Imperva. Ben is the Chief Scientist for Satori, the DataSecOps platform.