Several weeks ago (prior to the Omicron wave), I got to attend my first conference in roughly two years: Dataversity’s Data Quality and Information Quality Conference.
Ryan Doupe, Chief Data Officer of American Fidelity, led a thought-provoking session that resonated with me. In Ryan’s “9-Step Process for Better Data Quality,” he discussed the processes for generating data that business leaders consider trustworthy.
Quality policies for data and analytics set expectations about the “fitness for purpose” of artifacts across various dimensions.
– Gartner, “Data and Analytics Governance Requires a Comprehensive Range of Policy Types”1
Most in this space are familiar with data governance rules that enforce compliance. But what about governance rules that enforce quality?
After listening to Ryan, I would contend that, in contrast to purely defensive forms of data governance, data quality rules are not focused on role-based access.
Instead, data quality rules promote awareness and trust. They alert all users to data issues and guide data stewards as they remediate data at its source.
Data Governance for Data Quality: Doupe’s 9-Step Framework
Ryan recommends a nine-step framework for managing data quality with data governance. He also points out where data tools can be applied to help with these process steps.
Step 1: Determine list of CDEs (Critical Data Elements)
This step sets the scope of your data quality program: which items should fall under the control of a data governance program focused on data quality. Start by identifying the enterprise’s critical data elements, which are typically known to be vital to the organization’s success. These CDEs define what is in scope for the data quality program.
Step 2: Data Definitions
In this step, you create a glossary for the CDEs. Each critical data element is described so that there are no inconsistencies between users or data stakeholders. With this accomplished, you move to step 3.
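To make the glossary idea concrete, here is a minimal sketch of a CDE glossary that refuses to hold two different definitions for the same element. The class and field names (`GlossaryEntry`, `CDEGlossary`, `steward`) and the sample entry are hypothetical illustrations, not part of Ryan’s framework or any Alation API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GlossaryEntry:
    """One critical data element (CDE) and its agreed business definition."""
    name: str
    definition: str
    steward: str  # hypothetical field: who owns the definition


class CDEGlossary:
    """A minimal in-memory glossary that allows exactly one definition per CDE."""

    def __init__(self) -> None:
        self._entries: dict[str, GlossaryEntry] = {}

    @staticmethod
    def _key(name: str) -> str:
        # Normalize so "Customer ID" and "customer id" refer to the same CDE.
        return name.strip().lower()

    def add(self, entry: GlossaryEntry) -> None:
        existing = self._entries.get(self._key(entry.name))
        if existing is not None and existing.definition != entry.definition:
            # Two stakeholders describing the same CDE differently is the
            # kind of inconsistency this step is meant to prevent.
            raise ValueError(
                f"Conflicting definitions for {entry.name!r}: "
                f"{existing.definition!r} vs {entry.definition!r}"
            )
        self._entries[self._key(entry.name)] = entry

    def lookup(self, name: str) -> GlossaryEntry:
        return self._entries[self._key(name)]


glossary = CDEGlossary()
glossary.add(GlossaryEntry(
    "Customer ID",
    "Unique identifier assigned to a customer at account creation",
    "Customer Data Steward",
))
print(glossary.lookup("customer id").steward)  # Customer Data Steward
```

Raising on a conflicting definition, rather than silently overwriting it, forces the disagreement back to the stakeholders, which is where this step says it should be resolved.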
Step 3: Business Impacts
What’s the business impact of critical data elements being trustworthy… or not? In this step, you connect data integrity to business results in shared definitions. This work enables business stewards to prioritize data remediation efforts.
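One simple way to turn business impact into a remediation order is to weight each CDE’s agreed impact rating by how much of its data currently fails checks. The sketch below assumes a hypothetical 1 to 5 impact scale and a failure rate between 0 and 1; the scoring formula is an illustrative choice, not one prescribed by Ryan’s framework.

```python
from dataclasses import dataclass


@dataclass
class CDEImpact:
    name: str
    business_impact: int  # hypothetical rating: 1 (low) to 5 (critical), agreed with the business
    failure_rate: float   # fraction of records currently failing quality checks, 0.0 to 1.0


def remediation_priority(items: list[CDEImpact]) -> list[tuple[str, float]]:
    """Rank CDEs by a simple risk score: business impact times failure rate."""
    scored = [(item.name, item.business_impact * item.failure_rate) for item in items]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


ranking = remediation_priority([
    CDEImpact("customer_id", business_impact=5, failure_rate=0.02),
    CDEImpact("email", business_impact=2, failure_rate=0.30),
    CDEImpact("birth_date", business_impact=4, failure_rate=0.10),
])
print([name for name, _ in ranking])  # ['email', 'birth_date', 'customer_id']
```

Note that a highly critical element with clean data can rank below a less critical one with poor data; whether that trade-off is right is exactly the conversation business stewards need to have.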
Step 4: Data Sources
This step is about cataloging data sources and discovering which of them contain the specified critical data elements.
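As a rough illustration of discovery, the sketch below scans a toy catalog (source, then table, then column names) and reports where each CDE appears. Matching on column names is a simplifying assumption; real catalogs usually rely on tags, classifications, or glossary links instead. The catalog contents and function name are hypothetical.

```python
def find_sources_with_cdes(
    catalog: dict[str, dict[str, list[str]]], cdes: set[str]
) -> dict[str, set[str]]:
    """Return, for each CDE, the set of 'source.table' locations whose columns include it."""
    hits: dict[str, set[str]] = {cde.lower(): set() for cde in cdes}
    for source, tables in catalog.items():
        for table, columns in tables.items():
            for column in columns:
                key = column.lower()  # naive name match; an assumption for this sketch
                if key in hits:
                    hits[key].add(f"{source}.{table}")
    return hits


catalog = {
    "crm": {"customers": ["customer_id", "email", "created_at"]},
    "billing": {"invoices": ["invoice_id", "customer_id", "amount"]},
    "marketing": {"leads": ["lead_id", "email"]},
}
locations = find_sources_with_cdes(catalog, {"customer_id", "email"})
print(sorted(locations["email"]))  # ['crm.customers', 'marketing.leads']
```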
Step 5: Data Profiling
With data cataloged, the data sources that contain CDEs are then profiled by collecting statistics about their contents. For example, how many records exist? What are the minimum and maximum values for each data element? How frequently does each value occur? What patterns does the data follow?
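The profiling questions above map onto a handful of column statistics. This standard-library sketch computes record and null counts, min/max, value frequencies, and value “shapes” (letters become `A`, digits become `9`). It assumes the non-null values in a column are mutually comparable; the function names and sample ZIP codes are hypothetical.

```python
import re
from collections import Counter


def to_pattern(value: str) -> str:
    """Reduce a value to its shape: letters -> 'A', digits -> '9', other chars kept."""
    return re.sub(r"[0-9]", "9", re.sub(r"[A-Za-z]", "A", value))


def profile_column(values: list) -> dict:
    """Collect basic profiling statistics for one column."""
    non_null = [v for v in values if v is not None]
    return {
        "record_count": len(values),
        "null_count": len(values) - len(non_null),
        "distinct_count": len(set(non_null)),
        "min": min(non_null) if non_null else None,
        "max": max(non_null) if non_null else None,
        "top_values": Counter(non_null).most_common(3),
        "top_patterns": Counter(to_pattern(str(v)) for v in non_null).most_common(3),
    }


zip_codes = ["74105", "74105", "7410", None, "74136-2201"]
profile = profile_column(zip_codes)
print(profile["top_patterns"][0])  # ('99999', 2)
```

The pattern counts are often the most useful output: a minority pattern such as `9999` in a ZIP code column is an immediate candidate for a data quality rule.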
Step 6: Data Quality Rules
With profiling complete, you can use a data quality tool to create rules that support data quality. AI and machine learning can create and enforce such rules automatically, or people can use the statistics gathered during profiling to write the rules manually.
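The manual path can be sketched as a few small rule constructors whose thresholds come straight from profiling results. This does not show AI or ML rule generation, and the rule types, names, and sample records are hypothetical, not the API of any particular data quality tool.

```python
import re
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class QualityRule:
    name: str
    column: str
    check: Callable[[Any], bool]  # returns True when the value passes


def not_null(column: str) -> QualityRule:
    return QualityRule(f"{column} is not null", column, lambda v: v is not None)


def matches(column: str, pattern: str) -> QualityRule:
    compiled = re.compile(pattern)
    return QualityRule(
        f"{column} matches {pattern}",
        column,
        lambda v: v is not None and compiled.fullmatch(str(v)) is not None,
    )


def in_range(column: str, low: float, high: float) -> QualityRule:
    return QualityRule(
        f"{column} between {low} and {high}",
        column,
        lambda v: v is not None and low <= v <= high,
    )


def evaluate(rules: list[QualityRule], records: list[dict]) -> dict[str, list[int]]:
    """Return, for each rule, the indexes of records that fail it."""
    return {
        rule.name: [i for i, rec in enumerate(records) if not rule.check(rec.get(rule.column))]
        for rule in rules
    }


# Rules informed by profiling: ZIP codes are 5 digits with an optional +4 suffix.
rules = [not_null("zip"), matches("zip", r"\d{5}(-\d{4})?"), in_range("age", 0, 120)]
records = [{"zip": "74105", "age": 42}, {"zip": "7410", "age": 37}, {"zip": None, "age": 130}]
failures = evaluate(rules, records)
for name, failed in failures.items():
    print(name, failed)
```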
Step 7: Data Quality Metrics
With data quality rules completed and firing, you can collect data quality metrics. These metrics inform users of suspect data and alert data stewards to data needing remediation. In Alation, these metrics are added directly into the data catalog, so users discovering data learn about any issues in real time.
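Rule results roll up naturally into two kinds of metric: an overall score for users and a list of failing rules for stewards. The sketch below is one hypothetical way to compute them; the 95% alert threshold is an arbitrary assumption, and this is not how Alation or its partners calculate their scores.

```python
from dataclasses import dataclass


@dataclass
class RuleResult:
    rule: str
    records_checked: int
    records_failed: int

    @property
    def pass_rate(self) -> float:
        if self.records_checked == 0:
            return 1.0  # nothing was checked; treat as passing
        return 1 - self.records_failed / self.records_checked


def quality_metrics(results: list[RuleResult], alert_threshold: float = 0.95) -> dict:
    """Summarize rule results into an overall score plus the rules a steward should review."""
    total_checked = sum(r.records_checked for r in results)
    total_failed = sum(r.records_failed for r in results)
    overall = 1 - total_failed / total_checked if total_checked else 1.0
    return {
        "overall_score": round(overall, 4),
        "needs_remediation": [r.rule for r in results if r.pass_rate < alert_threshold],
    }


metrics = quality_metrics([
    RuleResult("zip is not null", 1000, 5),
    RuleResult("zip format", 1000, 80),
    RuleResult("age in range", 1000, 0),
])
print(metrics)  # {'overall_score': 0.9717, 'needs_remediation': ['zip format']}
```

The overall score can look healthy (above 97% here) while a single rule is well below threshold, which is why the sketch reports both.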
Step 8: Determine Authoritative Sources
Determining authoritative data sources is a key output of a data quality program. In this step, quality metrics are used to gauge whether each data source is of sufficient quality. This allows users to quickly find the most trustworthy data for analysis. They can then collaborate around that data via shared conversations and queries hosted in the data catalog.
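Given per-source quality scores for each CDE, choosing an authoritative source can be as simple as taking the best-scoring source that clears a minimum bar. The sketch assumes hypothetical scores and a 0.95 minimum; a `None` result signals that no source is trustworthy yet.

```python
from typing import Optional


def choose_authoritative(
    scores: dict[str, dict[str, float]], minimum: float = 0.95
) -> dict[str, Optional[str]]:
    """For each CDE, pick the highest-scoring source, if any source clears the minimum."""
    chosen: dict[str, Optional[str]] = {}
    for cde, by_source in scores.items():
        eligible = {source: score for source, score in by_source.items() if score >= minimum}
        chosen[cde] = max(eligible, key=eligible.get) if eligible else None
    return chosen


scores = {
    "customer_id": {"crm.customers": 0.99, "billing.invoices": 0.97},
    "email": {"crm.customers": 0.91, "marketing.leads": 0.88},
}
print(choose_authoritative(scores))  # {'customer_id': 'crm.customers', 'email': None}
```

A CDE with no eligible source is a natural hand-off to the remediation work in step 9.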
Step 9: Data Quality Remediation Plans
Finally, for data with systemic issues, it is important to find the root cause and determine how to fix the problem at the source. This can involve data cleansing or training of data entry personnel.
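One way to separate systemic issues from one-off glitches is to look for rules that keep failing run after run. The sketch below flags rules that failed in each of the most recent consecutive runs; the three-run cutoff and the sample history are assumptions chosen for illustration, not part of Ryan’s process.

```python
from collections import defaultdict


def systemic_issues(run_history: list[dict[str, int]], min_consecutive: int = 3) -> list[str]:
    """Return rules whose failures span at least `min_consecutive` of the latest runs in a row.

    run_history is ordered oldest to newest; each entry maps rule name -> failed record count.
    """
    streak: dict[str, int] = defaultdict(int)
    for run in run_history:
        # Copy the key sets first so the dict can be updated while iterating.
        for rule in set(streak) | set(run):
            streak[rule] = streak[rule] + 1 if run.get(rule, 0) > 0 else 0
    return sorted(rule for rule, count in streak.items() if count >= min_consecutive)


history = [
    {"zip format": 80, "age in range": 3},
    {"zip format": 75, "age in range": 0},
    {"zip format": 91},
]
print(systemic_issues(history))  # ['zip format']
```

Here the recurring ZIP code failures suggest an upstream cause worth a remediation plan, while the single age failure may only need cleansing.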
How to Deliver Data Quality via Governance in the Data Catalog
Alation’s system for data governance syncs up nicely with Ryan’s nine-step framework for improving data quality with data governance. Let’s take a look at how Ryan’s system may be leveraged to bring data quality-focused governance into the data catalog.
Establish Governance Framework by determining CDEs, data definitions, and business impacts (steps 1-3, above).
Populate Data Catalog, Empower Data Stewards & Curate Assets by working together to catalog data sources while connecting data assets to business goals. This addresses the need Ryan highlights to profile data: data from the selected sources is profiled, and the results are used to create data quality rules (steps 4, 5, and 6).
Apply Policies & Controls to establish data quality metrics to guide user behavior (step 7).
Drive Community Collaboration to use trusted data and integrate human wisdom. Your community can now find and use the best data with help from quality scores. They can also add their own tribal knowledge to the data catalog, further establishing the most authoritative data sources (step 8).
Monitor and Measure with data quality remediation plans. These plans help you find recurring data issues, which will influence how you adapt your data governance framework. They also inform, within the data catalog, how you clean data and reeducate personnel at the data source (step 9).
Implementing Data Quality with Alation
Data governance and data quality are massive undertakings that must work as one. An integrated tech stack, with the right blend of tools, is a key asset for addressing these challenges together. A robust ecosystem of tools enables companies to address a wider range of problems.
Partnerships are a key piece of that ecosystem. They enable organizations to be more strategic and relevant to customers. For this reason, Alation has partnered with leading data quality vendors, BigEye and Soda.
These integrations let us provide a whole product. Here, Alation adds a world-class data governance app and data catalog to our partners’ data quality tools to deliver an integrated data observability platform. Together with BigEye and Soda, we solve customer problems in data quality with an integrated data governance process. In other words, we provide each of the capabilities described above in Alation’s data governance model, but in this case for data quality.
Ryan Doupe has presented a useful guide for establishing data quality with data governance. In this blog, I’ve demonstrated how you might leverage that framework with a data catalog. With our partners BigEye and Soda, Alation provides an end-to-end solution for data quality.