Data Governance for Dummies: Your Questions, Answered
By Jim Barker
Published on February 17, 2023
Attendance was high, as was the number of excellent questions. In this blog, I’ll address some of the questions we did not have time to answer live, pulling from both Dr. Reichental’s book and my own experience as a data governance leader for 30+ years. Enjoy!
Can you have proper data management without establishing a formal data governance program?
Yes, it is possible to have data management without data governance. Reichental describes data governance as the overarching layer that empowers people to manage data well; as such, it is focused on roles & responsibilities, policies, definitions, metrics, and the lifecycle of the data. In this way, data governance is the business or process side. You could visualize data governance and data management as two sides of the same coin; while one side specifies the business details, the other implements the control.
While it is possible to implement just the technical side, you will miss many aspects that lead to real success with data. For this reason, your data program will benefit from bringing data governance and data management together.
Our organization is struggling with the guardrails between the data governance and management concepts. What frameworks and operating models have you seen work well?
The firms that get data governance and management “right” bring people together and leverage a set of capabilities: (1) Agile; (2) Six Sigma; (3) data science; and (4) project management tools.
Establishing a solid vision and mission is key. The overall program should set a 2-year vision, mission, and goals, and then focus on execution, measuring progress along the way. The data governance team should define the policies and the needed controls & reports, then work with the technical staff (who are more data management-centric) to build these out. Working together with a ‘one-team’ mindset and a shared focus on results is how you better align data governance and data management.
Assessing roles and responsibilities is also critical. Many shops find that recognizing staff as business data stewards (business resources) and technical data stewards (IT resources) will make all the difference as you build out that partnership.
Be sure to celebrate your shared successes as you work together, focusing less on guardrails and more on common accomplishments.
What are your thoughts on policy ownership? Should data governance own everything? Which policies are critical to be owned by data governance versus, say, info sec (as an example)?
When building out policies, the first thing to do is define how you are going to collect data, and specify ownership. Don’t get too wrapped up in what department owns what policy to start. Consider governance as the collector of policies. Then consider each specific policy and ask, with each team’s remit, “Who should own this policy?” Collect your answers and store them in a centrally accessible place. One big mistake firms often make is storing policies across many tools.
A smart data governance team will start by defining its approach to collection and compliance, then define the policy on policies. After that, assign the writing of each policy, and potentially add an oversight board to approve policies and store them centrally.
How do you get executives to understand the value of data governance?
Seeing is believing! I recommend two things. First, document your successes with good data, and how they happened. Share stories of data in good times and in bad (pictures help!). Your ability to share a narrative with specifics will help to promote the idea of governance as a business enabler; it will also help you align teams around governance to move forward.
Second, think like an executive. What do they care about? Time-to-value, the ROI of good data use, sales growth, and cost reductions are great examples to use to build confidence in your governance program. Executives like pictures, especially those that can be tied back to bottom-line results. In one shop, we built out one story for each function and used those stories to gain support and propel the idea of data governance forward.
IT, at times, may seem to think that they drive data governance. And for good reason: many data governance job postings seek skills like Python and other programming skills. Yet data governance roles demand much more than programming. A governance leader must align data and business teams, and recruiters would be wise to seek this skillset.
No two organizations look the same! So to say that data governance must be owned by IT, or by the business, to be successful is a false narrative. Please be wary of it. Discuss the business problem and the IT capabilities to establish a vision and mission and determine ownership. Some organizations choose to place data governance inside IT or the business; others have it in its own team to cross organizational and functional boundaries. In the governance realm, the “business versus IT versus shadow IT” debate has gone on for years, and hasn’t really helped anyone.
It is vital that all parties understand that data governance is a shared responsibility; it can’t really be all one way or the other. I suggest building out a RACI framework that assigns core activities across these key roles: (1) Data Owner; (2) Business Data Steward; (3) Technical (IT) Data Steward; (4) Enterprise Data Steward; (5) Data Engineer; and (6) Data Consumer.
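A RACI matrix like the one suggested above can be kept as a small data structure and checked for gaps. The sketch below is illustrative only: the six role names come from the list above, but the activities, the assignments, and the helper `raci_problems` are hypothetical. It also assumes the common RACI convention of exactly one Accountable role per activity.

```python
# Hypothetical RACI matrix: activity -> {role: "R" | "A" | "C" | "I"}.
# Role names come from the text; activities and assignments are made up.
ROLES = [
    "Data Owner",
    "Business Data Steward",
    "Technical Data Steward",
    "Enterprise Data Steward",
    "Data Engineer",
    "Data Consumer",
]

raci = {
    "Define business terms": {
        "Data Owner": "A",
        "Business Data Steward": "R",
        "Enterprise Data Steward": "C",
        "Data Consumer": "I",
    },
    "Implement data quality checks": {
        "Data Owner": "I",
        "Business Data Steward": "C",
        "Technical Data Steward": "A",
        "Data Engineer": "R",
    },
}


def raci_problems(matrix):
    """Return a list of human-readable problems found in the matrix."""
    problems = []
    for activity, assignments in matrix.items():
        unknown = sorted(set(assignments) - set(ROLES))
        if unknown:
            problems.append(f"{activity}: unknown roles {unknown}")
        # Assumed convention: exactly one Accountable role per activity.
        accountable = [r for r, code in assignments.items() if code == "A"]
        if len(accountable) != 1:
            problems.append(f"{activity}: needs exactly one Accountable role")
        if "R" not in assignments.values():
            problems.append(f"{activity}: no Responsible role")
    return problems


print(raci_problems(raci))
```

A check like this could run whenever the matrix changes, so that no core activity is left without a clear owner.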
Communication is essential. It is vital that IT and business leaders can have grown-up conversations about pros and cons and use them to finalize the vision, mission, and organizational structure that will propel your program.
So much focus is placed on protecting and getting insights from data. However, there is less discussion on the need to delete the data when the legal retention policy expires. Why is that?
The idea of data retirement is often overlooked (and this may be connected to the lack of definitions for the data lifecycle). Why? Perhaps it is because everyone is so busy chasing that next data set, or there is fear of deleting data which might later be needed for something. Yet that fear is shortsighted.
For this question, it’s helpful to align on the definition of four key terms:
(1) Data Retention Time – the amount of time that data is retained for. This should be based in policy, defined centrally, codified, and executed;
(2) Data Destruction – the deletion of data in a permanent manner;
(3) Archiving – the removal of data from operational and analytical systems while keeping a copy in data archival stores;
(4) Data Retirement – the policy that is put in place and defines the lifecycle of the data and articulates data retention time, data destruction details, and archiving specifics.
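The four terms above can be tied together in a few lines of code. This is a minimal sketch under stated assumptions: the `RetirementPolicy` record, its field names, and the day thresholds are hypothetical and are not drawn from any regulation or from the book.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class RetirementPolicy:
    """Hypothetical data retirement policy for one dataset."""
    dataset: str
    archive_after_days: int   # move out of operational systems at this age
    destroy_after_days: int   # data retention time: delete permanently at this age


def retirement_action(policy, created, today):
    """Return 'retain', 'archive', or 'destroy' for a record created on `created`."""
    age = (today - created).days
    if age >= policy.destroy_after_days:
        return "destroy"
    if age >= policy.archive_after_days:
        return "archive"
    return "retain"


# Illustrative thresholds only: archive after 1 year, destroy after 7 years.
policy = RetirementPolicy("customer_orders", archive_after_days=365, destroy_after_days=7 * 365)
today = date(2023, 2, 17)
for created in (date(2022, 9, 1), date(2020, 1, 1), date(2015, 1, 1)):
    print(created, retirement_action(policy, created, today))
```

Keeping the thresholds in one policy record, rather than scattered across jobs, is one way to make the policy "defined centrally, codified, and executed."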
It’s important that firms have a defined data policy that articulates data retirement specifics, and that the policy is physically implemented in a consistent manner. This is legally required. If data is not retired, retained, and managed in a manner aligned with these policies, a range of legal challenges can follow.
Today, data retirement is a top priority, as firms establish rules on what data to ingest, use, and retire. The benefits often drive priority, so unless cost structures change, it will be an ongoing challenge to retire, archive, and delete data as we would wish.
In terms of identifying which data should be owned and governed – where do we start? Some data seems more analytical, while other is operational (external facing).
We recommend identifying the data sources and tables that are candidates for governance, establishing the governance owner & data quality details, and saving those details in the catalog. Through this process, the identification, classification, and categorization can be completed. This includes defining critical data elements (CDEs), establishing whether the data is considered PII (Personally Identifiable Information), and determining its breadth of usage. This, in turn, entails defining whether the data is private, internal with limitations, broadly used internally, or able to be shared externally.
One of the first things we would recommend is to specify how you are going to classify and categorize data, so that this activity can move forward.
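To make the classification scheme described in this answer concrete, here is a minimal sketch of a catalog record. The `CatalogEntry` fields, the sample entries, and the single consistency rule are assumptions for illustration; they do not reflect any particular catalog product's schema. Only the four sharing levels come from the text.

```python
from dataclasses import dataclass
from enum import Enum


class Sharing(Enum):
    # The four breadth-of-usage levels named in the text.
    PRIVATE = "private"
    INTERNAL_LIMITED = "internal with limitations"
    INTERNAL_BROAD = "broadly used internally"
    EXTERNAL = "can be shared externally"


@dataclass
class CatalogEntry:
    """Hypothetical catalog record for one column."""
    table: str
    column: str
    owner: str
    is_cde: bool      # critical data element
    is_pii: bool      # personally identifiable information
    sharing: Sharing


def violations(entries):
    """Flag entries whose classification looks inconsistent (illustrative rule)."""
    return [
        f"{e.table}.{e.column}: PII marked as externally shareable"
        for e in entries
        if e.is_pii and e.sharing is Sharing.EXTERNAL
    ]


entries = [
    CatalogEntry("customers", "email", "Sales Ops", True, True, Sharing.INTERNAL_LIMITED),
    CatalogEntry("customers", "ssn", "Sales Ops", True, True, Sharing.EXTERNAL),
    CatalogEntry("products", "list_price", "Finance", True, False, Sharing.EXTERNAL),
]
print(violations(entries))
```

Agreeing on a small, fixed set of classification values up front is what lets checks like this be automated later.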
We’re planning data governance that’s primarily focused on compliance, data privacy, and protection. What approach should we take?
One way to look at data governance is as offensive or defensive. Offensive data governance focuses on documenting data correctly, keeping it clean and trusted, and driving consumption of data for the most business benefit.
A defensive data governance approach focuses on the needs of compliance. In a situation where a firm needs to be solely focused on compliance, we would recommend focusing on policies and standards, then developing queries or processes to verify these are being followed, and building out capabilities to show the provenance of data. If you focus on compliance, you should have the technical infrastructure in place to simplify audits, document data, and provide a basis for success.
One thing to consider: if you introduce data governance as defensive and compliance focused, at some point you should be prepared to flip the script and move into an offensive posture. That is, if you establish defensive governance well, you will have all the pieces in place to become more aggressive on the offensive side of governance. This is because you will know what data you can trust, and you will have processes to create and upkeep data, as well as curated metadata to exploit data’s full capabilities.
Again: If you build out your defensive strategy first, you will later have the ability to democratize data, promote its use, and maximize the return on your data investment.
Many organizations are made up of disparate companies; how can such large entities migrate changes in data definitions across the organization?
Many large organizations are composed of many affiliated businesses, different operating models, and vastly different data definitions, policies, standards, and data access requirements. Due to this, glossaries and policies can be very different. Most firms in that situation still need to get some key terms defined in a common way. They may leverage a business-unit glossary and an additional “Enterprise” glossary to drive very tangible benefits. These firms may also benefit from a domain structure. This structure sorts data by category, defining differences, allowing for overlaps, and providing flexibility in discovery.
So, when it comes to definitions… “It Depends!” Any large business with multiple strategic business units needs a way to both (1) run complex, independent businesses and (2) provide consolidated reporting across common dimensions, with common definitions, that rolls up to enterprise-level metrics. By creating a separate glossary for each independent business, you can continue to allow for that independence and variation where it makes sense.
That said, a single enterprise definition provides commonality in corporate reporting. This is a very good thing. Some software packages will do this with a glossary of terms, others with a domain structure, and some may even have a fully segregated data model, which may be a little tougher to work with depending on the architecture. The key is to use the software you have in a manner that allows segregation of terms & definitions but also allows you to bring back enterprise-wide definitions.
Can you differentiate between governance of raw data and enhanced data (information)? Where do you govern?
It is not uncommon, particularly with data lakes, to have different data stores and degrees of transformation. This is the idea of having data at a raw, semi-transformed, and consumption-ready level. So, establishing a framework to store data by its source is a great place to start. The idea is to have a schema for each source system and each level at which the data exists, with metadata definitions in each schema that tie to its level of abstraction.
Here’s an example. Data from an ERP or MDM system could come in as raw, with technical names that match the source, and have some limited data definitions but no data quality aspects. That data could then pass through a DQ data pipeline layer, where it is transformed and consolidated to prepare it for consumption; this may mean adding metadata and replacing old column names with new ones. In the final consumption layer, the data fields could be tagged for governance, PII specifics, and advanced classification and categorization. The data would also be tagged for data trust, with labels that permit data quality rules to be executed.
So what’s the outcome of data governance at the consumption level? Data analysts would have all that they need to build new reports, and take action on business results without having to know what the data definitions are at the source. This governance flow does a few things:
(1) It provides the necessary metadata to make the data usable;
(2) It defines the data that will be governed and limits how many places the same data needs to be governed and cleansed;
(3) It provides the details that resonate with data analysts and reduces the time to value of the new data product.
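The raw-to-consumption flow described above can be sketched as a simple field mapping. Everything here is hypothetical: the cryptic source column names, the `FIELD_MAP` structure, and the choice to drop unmapped columns are assumptions for illustration, not a description of any specific pipeline tool.

```python
# Hypothetical mapping from raw (source-named) columns to consumption-layer fields.
FIELD_MAP = {
    "CUST_NO": {"name": "customer_id", "definition": "Unique customer number", "pii": False},
    "CNTCT_NM": {"name": "contact_name", "definition": "Primary contact person", "pii": True},
}


def to_consumption(raw_row):
    """Rename mapped raw columns; unmapped columns are dropped as ungoverned."""
    return {
        meta["name"]: raw_row[col]
        for col, meta in FIELD_MAP.items()
        if col in raw_row
    }


def pii_fields():
    """List consumption-layer fields tagged as PII."""
    return sorted(meta["name"] for meta in FIELD_MAP.values() if meta["pii"])


raw = {"CUST_NO": "0000100234", "CNTCT_NM": "Jane Doe", "SYS_FLG": "X"}
print(to_consumption(raw))
print(pii_fields())
```

With the mapping in one place, analysts work only with the friendly names and definitions, and the same data does not have to be governed again at every layer.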
Do testing companies use data governance tools? If so, to what capacity?
Yes, testing companies use data governance tools. If you really dig into what testing companies do, it is all about executing QA and software tests and saving data. You need to have those tests defined, the data defined, and documented policies regarding how you are executing those tests to ensure validity of each test, reliability of how you are testing, and dependability of execution. To that end, it is critical that you define not only the data collected but, more specifically, how you are meeting the expectations of the testing client or team.
What is the right size for a governance committee to ensure efficiency and create good data definitions? Do you believe these committees should be created at the individual stewardship level?
The size and make-up of a data governance council or committee fluctuates by organization. It is a good idea to have multiple levels, but your organizational context will largely define the right organizational structure. We normally think about three levels of councils: (1) Tactical; (2) Steering; and (3) Executive.
The most successful, data-driven organizations are those where executive leaders expect progress in data, want updates on the successes with data, and provide input. The benefits are also enhanced in organizations that share experiences across business units, functions, and geographies. The biggest tip we share: always include someone from the CISO or the data security office in tactical, steering, and executive councils.
A tactical council will bring data stewards together to share challenges, successes, and future planning to build consensus. Having the leaders of the tactical governance council convene and provide steering guidance on a monthly basis can be very helpful. It brings together a level of oversight and guidance and helps to build support for the data stewardship activities. We recommend that the steering group meet monthly, the tactical council weekly, and the executive council quarterly.
Data quality: How does data governance contribute to data quality? I understand lineage is a good use case. Are there other use cases where data governance tools help to improve data quality?
This is a great question, and I’m going to address it along with this question: Are there any differences in ensuring data quality and data governance in organizations that use data as a service model?
Data quality is one of the key pillars of data governance. In my work as a governance consultant, we would tackle quality through the following process.
First, we’d focus on data quality or “business fit”, look at governed fields in a critical manner, and build out a set of rules or checks to identify which fields are missing that “business fit”. We would then categorize each of the rules based on its data quality dimension, report out the summary of rules that fail by data quality dimension, and continue to monitor and focus on reporting and improvement. This action automates the review of data quality across criteria such as uniqueness, conformance, or completeness. We’d then use the results of this roll-up to show the data quality trend, as a function of the benefits of the overall data governance program.
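As a rough illustration of the rule-and-dimension roll-up described above, the following sketch tags each check with a data quality dimension and counts failures per dimension. The sample rows, the three rules, and all function names are made up for this example.

```python
from collections import Counter

# Sample rows; in practice these would come from a governed table.
rows = [
    {"id": 1, "email": "ann@example.com", "country": "US"},
    {"id": 2, "email": None, "country": "US"},
    {"id": 2, "email": "bob@example", "country": "usa"},
]


def missing_email(rows):
    """Count rows with no email value."""
    return sum(1 for r in rows if not r["email"])


def duplicate_id(rows):
    """Count rows whose id appears more than once."""
    counts = Counter(r["id"] for r in rows)
    return sum(n for n in counts.values() if n > 1)


def bad_country(rows):
    """Count rows whose country is not a 2-letter uppercase code."""
    return sum(1 for r in rows if not (len(r["country"]) == 2 and r["country"].isupper()))


# Each rule: (name, data quality dimension, function returning the failing-row count).
RULES = [
    ("email is populated", "completeness", missing_email),
    ("id is unique", "uniqueness", duplicate_id),
    ("country is a 2-letter uppercase code", "conformance", bad_country),
]


def failures_by_dimension(rows):
    """Sum failing-row counts across all rules, grouped by dimension."""
    summary = Counter()
    for _name, dimension, check in RULES:
        summary[dimension] += check(rows)
    return dict(summary)


print(failures_by_dimension(rows))
```

Running the same summary on a schedule and storing the results would give the trend line mentioned above.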
Additionally, we recommend data leaders use profiling to continue to review the format of the stored data. The ability to see data patterns helps to expedite data cleanup, as stewards can review the aggregate functions of key columns; this enables them to spot patterns and understand data at scale.
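One simple form of the profiling described above is to reduce each value to its "shape" and count the shapes. This is a sketch using only the Python standard library; the sample phone values and the helper names `pattern` and `profile` are hypothetical.

```python
import re
from collections import Counter


def pattern(value):
    """Map a value to a shape: letters become A, digits become 9, other characters are kept."""
    shape = re.sub(r"[A-Za-z]", "A", value)
    return re.sub(r"[0-9]", "9", shape)


def profile(values):
    """Count how often each shape occurs, most common first."""
    return Counter(pattern(v) for v in values).most_common()


# The last value contains the letter O instead of a zero.
phones = ["555-0101", "555-0102", "5550103", "555-01O4"]
print(profile(phones))
```

Rare shapes at the bottom of the list are often the records a steward should look at first.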
Last, build out capabilities within your data governance program to get people to identify data they can and can’t trust. Get everyone from data stewards to data consumers to speak up and say something when the data is wrong. Don’t tolerate the generalized moans about “Our data quality is bad!” What’s bad? Why? You must force folks to give specifics so the data stewards can do something about those challenges!
While lineage might be viewed as a byproduct of data quality, it really shows the impact once you determine what is good data quality. The ultimate goal is to have critical data elements that can be trusted, and when they can’t, everyone knows it and we fix it as a team.
How do you measure the impact of a DG/DM program?
Many would see measuring the impact of a Data Governance and Data Management program as a daunting challenge. We would disagree!
When you’re just getting started, you can measure the overall progress of your program with metrics like:
What % of tables have a data steward assigned?
What % of tables have a definition?
Does the top 5% of data used in the organization have data owners, stewards, titles, definitions, and classifications?
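The first two coverage metrics listed above are simple percentages over a catalog export. A minimal sketch, assuming a hypothetical export with one dictionary per table (the field names and sample tables are invented):

```python
# Hypothetical catalog export: one dict per table.
tables = [
    {"name": "customers", "steward": "ann", "definition": "One row per customer"},
    {"name": "orders", "steward": "bob", "definition": ""},
    {"name": "tmp_load", "steward": None, "definition": ""},
    {"name": "products", "steward": "ann", "definition": "Sellable items"},
]


def pct(tables, field):
    """Percentage of tables where `field` is filled in (non-empty)."""
    if not tables:
        return 0.0
    filled = sum(1 for t in tables if t.get(field))
    return round(100.0 * filled / len(tables), 1)


print("steward coverage %:", pct(tables, "steward"))
print("definition coverage %:", pct(tables, "definition"))
```

Tracking these numbers over time, rather than as a one-off snapshot, is what turns them into a progress story.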
As you progress, and govern more of your organization’s most utilized data, you can start to track ROI through metrics like:
How much faster can analysts find data?
How much faster can users understand data?
How much faster can users query and analyze data?
Demonstrating your progress here will go a long way to building executive confidence in your program.
Security is another lens through which you can measure efficacy. Demonstrate the number of security requests, security changes, and threat intrusion attempts. Highlight the metadata you’ve stored on data pipelines, executed queries, etc., and summarize the number of questions asked, tasks completed, and needs addressed by your program. These reports (combined with updates to the data governance roadmap, and your progress narrative) tell a great story. Tell that transformation story, and celebrate the successes to promote what is being done.
In Conclusion
Thank you for your interest in Data Governance, and the “Data Governance for Dummies” webinar with the author! This book is a great introduction to data governance for many, as it offers practical, actionable advice for first-time governance leaders and helps you to launch your program.
Data governance is not child’s play, but it provides an incredible business benefit to companies that focus on execution and make it work. My final recommendation is to take the time to establish a framework for data governance. This foundation will empower you to be successful with governance, putting in place the right councils to bring people together, and communicating the value to generate growing support.
Data governance is a people-centered set of activities. It takes a community! At the end of the day, take the time to engage with the right people, work together and celebrate the success you’ve worked so hard to earn.
Curious to hear from the author? Watch the Data Governance for Dummies webinar on demand with Dr. Jonathan Reichental today!