Data Catalog First, Data Quality Second: Here’s Why

By Michael Meyer

Published on May 22, 2024

Data quality and data intelligence platforms (data catalogs) are essential for organizations to become AI-ready. Data quality and observability have always been crucial for people to trust their data to make critical business decisions. Now, the stakes have been raised with the promise of AI to help drive the automation of business processes and decision-making.

For these reasons, data leaders ask: Should we launch a data quality tool or a data catalog first? There are several key areas to consider that can help guide the decision of which data management solution to implement first.

What Is Data Quality? What Is Data Observability?

Data quality is the degree to which data meets a company’s accuracy, validity, completeness, and consistency expectations. Data quality tasks ensure data is fit for operational activities, analytics, and decision-making in a manner that increases trust. Data observability monitors high-level data aspects like freshness, volume changes, abnormal values (anomalies), structure changes, and quality.

What's the difference? Consider an example using pools. In the picture below, a data observability tool would indicate that one pool has a higher algae level. A data quality tool would state that if the algae level is greater than 5%, the water is of poor quality and must be treated before people can swim in it. Whereas data observability flags an issue, data quality defines what counts as an issue and how it must be resolved.

A swimming pool with blue water next to a swimming pool with green water.
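
The distinction can be sketched in a few lines of code. This is a minimal illustration built on the pool analogy above, not any vendor's API; the anomaly heuristic and the 5% threshold are assumptions for the example.

```python
# Observability vs. quality, illustrated with the pool analogy.
# Purely a sketch: the 1.5x-mean anomaly heuristic and 5% threshold
# are invented for this example, not a product's actual logic.

pools = [
    {"name": "pool_a", "algae_level": 0.01},  # 1% algae
    {"name": "pool_b", "algae_level": 0.08},  # 8% algae
]

def observe(pools):
    """Observability: flag statistical outliers without judging them."""
    mean = sum(p["algae_level"] for p in pools) / len(pools)
    return [p["name"] for p in pools if p["algae_level"] > 1.5 * mean]

def quality_check(pool, threshold=0.05):
    """Quality rule: a business-defined threshold plus a remediation."""
    if pool["algae_level"] > threshold:
        return f"{pool['name']}: poor quality, treat before swimming"
    return f"{pool['name']}: fit for use"
```

Note that `observe` only says "this pool looks different"; it is `quality_check` that encodes the business judgment of what is acceptable and what to do about it.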

What Is a Data Catalog?

A data catalog is a metadata repository of information sources across the enterprise, including data sets, business intelligence reports, visualizations, and conversations. It facilitates a deep understanding of data origin, context, and lineage that helps business and technical users find and understand data more quickly. Data catalogs increasingly address a broad range of data intelligence solutions, including self-service analytics, data governance with privacy, and cloud modernization.

Data catalogs give employees an understanding of where their data resides, how policy information is communicated, and how to use data appropriately. In doing so, data catalogs also surface rich insights into data quality characteristics. Data quality, in turn, enables people to make decisions based on trusted data, improving the company's ability to thrive.

Implementing a data catalog first will enable a more successful implementation of data quality. The reason why can be broken down into these critical areas:

  • Identify and understand the data 

  • Establish a strong foundation for quality initiatives

  • Determine data quality ownership

  • Provide transparency on data quality

A data catalog also offers deep insights into the most referenced and leveraged data. By understanding the most-used 5% of data, leaders can prioritize these critical assets for quality monitoring, saving time and resources.

Furthermore, implementing a data catalog demands that leaders prioritize data, align on a data governance framework, and identify subject matter experts – tasks that create a robust foundation for a future data quality initiative.

Identify and Understand the Data

One cannot improve what one cannot see. Before starting a data quality implementation, organizations need to understand their data landscape. A data catalog provides visibility across all data assets, helping teams understand the data inventory and prioritize the most critical assets for data quality.

After identifying the inventory of assets, the next step is identifying the key people who know the data, starting with the top users. The Alation Data Intelligence Platform analyzes query logs to identify the top users. This is a good starting point for determining owners, stewards, and subject matter experts of key data assets.

Screenshot of Alation Top Users
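
The idea behind mining query logs for top users can be sketched simply. Alation's actual analysis is its own; the log schema, table names, and user names below are invented for illustration.

```python
# Hypothetical sketch of identifying a table's heaviest queriers from a
# query log. The schema and data here are assumptions for illustration.
from collections import Counter

query_log = [
    {"user": "ana",  "table": "sales.customers"},
    {"user": "ben",  "table": "sales.customers"},
    {"user": "ana",  "table": "sales.customers"},
    {"user": "cara", "table": "sales.orders"},
]

def top_users(log, table, n=3):
    """Return the n heaviest queriers of a table: candidate stewards."""
    counts = Counter(e["user"] for e in log if e["table"] == table)
    return [user for user, _ in counts.most_common(n)]
```

Running `top_users(query_log, "sales.customers")` surfaces the people who touch that table most, which is a reasonable starting shortlist for owners and subject matter experts.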

Finding a stakeholder willing to take ownership of a domain is a crucial first step. Experts recommend seeking this stakeholder by domain rather than by asset. For example, suppose you find the owner of a key customer table: does that individual's expertise extend to the entire customer domain? This person will be responsible for addressing customer data quality issues.

This significant identification phase is typically the purview of the Data Governance Council if your company has one. They can guide catalog implementers on ownership and provide additional details around curation and governance policies.

If a company implements data quality first, it forges ahead without this valuable expertise. With incomplete knowledge, it will spend more time determining whether it is implementing the correct business rules for the data, and it runs the risk of targeting the wrong data assets.

In one past project, two days were wasted because a data asset had been deprecated without anyone's knowledge. The data quality rules kept flagging issues, and the data steward tried to work with business users to resolve them through the CRM application. The perplexing outcome was that the business users could not find the problems. After several emails, the steward determined that the data quality checks needed to point at a new table. All of that lost time could have been avoided with a data catalog, like Alation, that supports deprecation flags; the flag directs consumers to where the new table resides.

Screenshot showing Alation Trust Flags in action

Establish a Strong Foundation for Quality Initiatives

Today, the need for data quality is widely acknowledged. Yet "quality" is a spectrum, leading some to ask: how good is "good enough"? This is where contention can build, as upper management may struggle to understand data quality's value (and ROI). Discussions about customer satisfaction, company reputation, and revenue impact can be met with denial. Data quality as a discipline may also be too broad to grasp without knowledge of the data or visibility into critical assets.

So, how can data leaders introduce business leaders to the breadth and depth of a data quality initiative? A data catalog offers a robust introduction to the DQ landscape. It provides a valuable starting point for showing that essential DQ initiative planning has occurred. A catalog grants decision-makers a comprehensive view of all data assets. With a catalog, executives can see that critical data elements are being identified, profiles are being made available, and lineage exists to help trace the root causes of data quality issues. Having all this information in a data catalog makes it easier to discuss data quality.

In addition, from an organizational data governance perspective, you need to map data quality rules to governance policies before you can enforce them. A data catalog supports this rollout, smoothing the process of implementing a data quality/observability solution. Without these pieces in place first, companies will struggle to get long-term value from data quality.

Determine Data Quality Ownership

Another important decision to make is: Who will own data quality? Will it be IT, the business, or a combination of both?

When determining ownership, leadership must consider who knows and understands (1) the data and (2) the business rules about that data. For many organizations, the subject matter expert could be a business analyst, product manager, or another business role. These same people may already be identified as stewards of the data in a data catalog. Stewards often have both the knowledge and the willingness to ensure the quality of the data they are responsible for.

However, it is essential to realize that these data-asset experts may need help translating the business rules that govern their assets into technical checks. Stewards may have limited technical skills and need a data quality solution that lets them apply business rules without knowing SQL or other programming languages. They will also rely on a collaborative environment with IT teams to analyze root causes and resolve the sources of data quality issues.
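
A no-SQL rule experience usually means stewards maintain rules declaratively while the tool handles execution. The rule schema below is invented for illustration; it is not Alation's or any specific product's format.

```python
# Sketch of a declarative, no-SQL rule format a steward might maintain,
# with a tiny engine that applies it. The schema is an assumption made
# for this example, not a real product API.

rules = [
    {"column": "email", "check": "not_null"},
    {"column": "age",   "check": "range", "min": 0, "max": 120},
]

def evaluate(row, rules):
    """Apply each declarative rule to a row; return failure messages."""
    failures = []
    for r in rules:
        value = row.get(r["column"])
        if r["check"] == "not_null" and value is None:
            failures.append(f"{r['column']} is null")
        elif (r["check"] == "range" and value is not None
              and not (r["min"] <= value <= r["max"])):
            failures.append(f"{r['column']}={value} out of range")
    return failures
```

The steward edits the `rules` list in business terms; IT owns the engine that runs it, which mirrors the steward/IT collaboration described above.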

By the same token, if an IT team is going to own the data quality solution, it must collaborate with the stewards to ensure all business rules about the data are accounted for. The group must also talk with the governance team about the implementation and align with the organization's data quality policies. The last consideration is creating a communication plan that keeps the company's data users informed of data quality initiatives.

If data quality is implemented before a data catalog, it will be challenging to document and communicate the data quality rules transparently across the organization. Those with access to the data quality solution can see and understand its positive impact, but the broader population of data users will have no idea it is happening.

Provide Transparency on Data Quality

A data quality initiative can be overwhelming as leaders try to balance rule definition with execution and communication. The critical message to data users is that the process will continuously improve the data quality for business objectives like AI and data-driven decision making.

Data quality must surface to data users at the point of consumption to enable transparency and trust in the data. Alation’s Open Data Quality Framework makes it possible to understand the health of the data before using it to empower good decision-making. Transparency must also extend to self-service tools for data analysis.

Take, for example, a business user who opens a business intelligence dashboard. They could be checking sales numbers and updating business plans based on those numbers. But what if last night's ETL process corrupted that data? How would the user know? The Alation Data Intelligence Platform extends quality information to self-service applications, displaying explicit warnings when data quality issues occur. This automated, proactive communication can save substantial money by preventing bad decisions.
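
One simple form such a warning can take is a freshness check on the table feeding the dashboard. This is a generic sketch of the concept, not Alation's implementation; the 24-hour window and function names are assumptions.

```python
# Hedged sketch of a freshness warning a platform might surface before
# a dashboard loads. The threshold and names are assumptions.
from datetime import datetime, timedelta, timezone

def freshness_warning(last_loaded, max_age_hours=24):
    """Warn if the feeding ETL run is older than the allowed window."""
    age = datetime.now(timezone.utc) - last_loaded
    if age > timedelta(hours=max_age_hours):
        hours = int(age.total_seconds() // 3600)
        return f"Warning: source data is {hours} hours old"
    return None  # data is within the freshness window
```

If the nightly ETL failed, `last_loaded` would fall outside the window and the user sees the warning before acting on stale numbers.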

If you implement data quality first, will your solution provide this transparency, including visibility in self-service applications? From a value perspective, this should be a critical consideration when choosing whether to implement a data catalog or a quality solution first.

Screenshot of demo customer data source

Conclusion

For Chief Data Officers and other data managers, maximizing data’s potential begins with visibility and comprehension. Only upon this foundation can the nuances of quality be effectively tackled, ensuring data meets organizational standards and drives decision-making processes forward. In essence, a data catalog is not just a preliminary step but a strategic asset in the quest to harness the true power of your data.

While the impulse to address data quality first is understandable, implementing a data catalog should take precedence. A data catalog establishes the foundation for a more effective data quality initiative by fostering a thorough understanding and organization of data assets. From enhancing data literacy to ensuring quality for all business processes, including AI, a data catalog sets the stage for a culture of superior data quality.

Schedule a personalized demo today to learn more about how Alation can help you accelerate your successful data quality implementation.
