Data Catalog: Part of the Solution – or Part of the Problem?

By Mitesh Shah

Published on December 13, 2022


What Is a Data Catalog? Who Does It Benefit? How?

Ask a data catalog user these questions, and they will likely use a simple analogy: “A data catalog is like Google meets Amazon for enterprise data.” People come to the data catalog to find trusted data, understand it, and use it wisely. As with Google, use cases and users abound. Today a modern catalog hosts a wide range of users (like business leaders, data scientists and engineers) and supports an even wider set of use cases (like data governance, self-service, and cloud migration).

As a knowledge-sharing platform akin to Wikipedia, a catalog captures the wisdom of many types of people (and their use cases) in one shared spot. This broad-lens visibility into the spectrum of data use cases has been a hallmark of data catalogs since their inception. And with good reason: when bright people can learn from and leverage the work of other bright folks in previously siloed business units, exciting innovations can take place. (See how European energy enterprise Vattenfall is supporting collaboration across its many operations, or how global insurer Munich Re combined weather with wind-farm data to launch a new service product.)

Yet lately, a few analysts have started publishing evaluations of data catalogs for specific use cases. Vendors, too, have launched data catalogs focused on single use cases. But are such evaluations a good thing?

In some ways, yes. These evaluations validate a growing belief that data catalogs can be used for different purposes; they are much more than applications for data search & discovery. To be sure, data catalogs first emerged to help analysts find trusted data more quickly, basically serving as search engines for enterprise data.

Yet they’ve come a long way, incorporating new features to power more use cases. Such vendor evaluations shed light on use-case performance and demonstrate that the market has embraced data catalogs.

These evaluations also empower buyers with useful research. If they’re seeking a data catalog for a specific use case, such evaluations offer clear guidance.

The bad news is that such “data catalogs for use case” evaluations inadvertently reframe data catalogs as tools for a single use case – more corkscrew than Swiss army knife. Such content implies that you need as many catalogs as you have use cases, and some pundits now say so outright: N catalogs for N use cases. So feckless buyers may resort to buying separate data catalogs for use cases like…

  • Data governance

  • Self-service

  • Cloud migration

  • DataOps

How Does This Impact the Buyer’s Journey?

No one catalog supports every use case perfectly, so you’ll see marked differences in what your future users want versus what each catalog you consider can deliver.

For example, one buyer may seek a catalog that scores 6 for governance, 10 for self-service, 4 for cloud data migration, and 2 for DataOps (let’s call this a {6, 10, 4, 2} profile), while another may prefer a {0, 6, 6, 8} profile. Knowing which catalogs are strong on which use cases is a good thing for your research – and you should absolutely consider use-case-specific evaluations to inform your group’s choice.
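To make the profile idea concrete, here is a minimal sketch of how a buyer might compare a needed profile against what each candidate offers. Everything in it is an assumption made for illustration: the catalog names and their scores are invented, the `shortfall` and `rank_catalogs` helpers are hypothetical, and weighting all four use cases equally is a simplification rather than a recommended method.

```python
# Order of scores in every profile: governance, self-service,
# cloud migration, DataOps (each on a 0-10 scale, as in the text).

def shortfall(needed, offered):
    """Sum of the points by which `offered` falls short of `needed`.

    Surplus strength in one use case does not cancel a gap in another.
    """
    return sum(max(n - o, 0) for n, o in zip(needed, offered))


def rank_catalogs(needed, catalogs):
    """Return catalog names ordered from smallest to largest shortfall."""
    return sorted(catalogs, key=lambda name: shortfall(needed, catalogs[name]))


# The buyer profile used as the example in the text.
buyer = (6, 10, 4, 2)

# Hypothetical catalogs, invented purely for illustration.
catalogs = {
    "Catalog A": (7, 9, 5, 3),    # broad, reasonably strong everywhere
    "Catalog B": (10, 0, 0, 0),   # governance-only niche product
    "Catalog C": (0, 6, 6, 8),    # no governance support at all
}

for name in rank_catalogs(buyer, catalogs):
    print(name, shortfall(buyer, catalogs[name]))
```

In this sketch the broad Catalog A ranks ahead of the governance-only Catalog B for this buyer, because a perfect score on one use case does nothing to close the gaps on the other three. A comparison like this is a useful way to shortlist a single catalog.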

Yet it becomes a bad thing when you take the next logical misstep and conclude not that you should identify the catalog use-case profile you need, but that you should buy four different catalogs: effectively a {10, 0, 0, 0}, a {0, 10, 0, 0}, and so forth.

So why would someone want to encourage that? If your catalog implementer is taking a big-picture view of your current and potential future needs around data… they wouldn’t.

However, if you’re a niche data catalog vendor, purpose-built for a single use case – or an analyst who likes them – you may encourage future buyers to say, “Forget the other use cases, we’re buying a catalog solution for our problem alone, and we’ll let the other teams buy for themselves.” This is the breaking point, when lo and behold: You’ve transformed your enterprise data catalog from being part of the solution… to being part of the problem.

The Power of One Platform

There are many merits to supporting multiple use cases within a single data catalog. Chief among them is the ability to align the business with the data to support data-driven decision-making. Offering a single, shared place for people to leverage data grounds all data activities in key business context. Users must understand how their work with data impacts the larger business – and a data catalog provides that common ground.

But Why Is Just One Data Catalog the Best Approach?

1. One Catalog Builds Mutual Understanding

First, one catalog builds mutual understanding. In enterprise data landscapes, it’s not uncommon for a term like “revenue” or “profit” to have a different meaning (and value!) depending on whom you ask. One data catalog fixes such key terms with one meaning, so all users can see what “revenue” means through both a data lens and a business lens. This mutual understanding builds trust between colleagues.

2. One Catalog Eases and Fuels Collaboration

It also eases and fuels collaboration. One data catalog supports a broader organizational ability to collaborate (and innovate) across user types, use cases, and business units. As Andreas Kohlmaier, Head of Data Engineering at Munich Re, has learned, one catalog means that “People can now collaborate much more efficiently with each other across the different parts of the company across the globe.”

3. One Catalog Operationalizes Data Governance More Effectively

Finally, one catalog can operationalize data governance more effectively. Today many data catalogs have embraced active governance, which leverages data intelligence. Active governance learns from user behavior, captured in metadata. Machine learning then automates key governance activities, such as annotations that flag private data and educate users on how to use it compliantly. A single data catalog that supports active metadata management will capture both explicit and implicit wisdom about data and feed it back into that one platform, to the greater benefit of the organization as a whole.

Conclusion

At its best, a data catalog is a platform that supports collaboration – across job types, teams, and use cases. The metadata that springs from these activities, in turn, informs a self-improving catalog. Casting a wide metadata net is important. Bringing people together, across use cases, is important. Innovation and creativity need an environment where ideas can cross-pollinate, where people of different backgrounds can work together and learn from one another. This is the value of one catalog for many use cases.

It’s also emblematic of the risk of a catalog for one use case. Not only do such products create data silos – they perpetuate a broken social system that excludes key stakeholders. Business and non-technical users must be included; technical catalogs exclude them by design. These niche catalogs separate people into data haves and have-nots, and create yet more silos. With time, the functionality of niche catalogs will be enveloped by catalog market leaders… who can help many people achieve many things, faster, more efficiently, and with the support of their entire data community.

Curious to learn more about data catalogs? Download the O’Reilly ebook, Implementing a Modern Data Catalog.
