The Modern Data Catalog: 'Mythbusting' the Top Four Assumptions
By Paige Bartley
Published on July 21, 2020
Many are familiar with a certain TV show where the premise of each episode was to take several widely held beliefs or pieces of lore, and then put them to the test in a lighthearted and (arguably) scientific manner. At the end of the day, was the belief confirmed, busted or plausible?
Today in the data catalog market, it seems we might need some 'mythbusting' of our own. Metadata management has gained traction as organizations look to extract more value from their data assets. Organizational IT ecosystems have generally become more complex and distributed, so it makes sense that aggregating metadata across data repositories and sources would help promote a common understanding of diverse informational resources. In many cases, the data catalog promises to provide the navigation layer and user experience for that metadata, helping stakeholders and workers quickly find the right data for a given use case.
The problem is that this promising vision has created a gold rush of sorts in the metadata management market. Enterprise technology providers want to offer a data catalog, and many organizations want to implement one. The result is a perfect climate for cloudy messaging and positioning that is rife with potential misconceptions.
We are going to walk through some of those myths. In the same spirit as the show, the goal is to use evidence, in this case research data, to bust or confirm these commonly held notions about data catalogs.
Myth #1: A data catalog is either for governance or analytics, not both
Data governance is the foundation that provides reliable data for downstream analysis. Historically, however, catalog providers have leaned in one direction or the other in their positioning. Why? It all comes down to perception. Traditionally, data governance was seen as a reactive initiative, undertaken to meet legal or regulatory requirements, while analytics was seen as proactive. Today, we see a bridge between those two views. In 451 Research's survey work with enterprise practitioners, 72% of participants 'completely' or 'mostly' agree that data governance is an enabler of business value within their organization, rather than a cost center. Value comes from supporting proactive initiatives, not just from meeting reactive 'checkbox' requirements. A good catalog will support both needs.
Myth #2: Data analysts waste huge amounts of time just looking for data
Enterprise technology vendors, seeking to sell solutions, are willing to throw out seemingly preposterous stats when it comes to worker productivity. The assumption goes that data analysts and data scientists spend so much time looking for data that they simply don't have time to do the jobs they were hired for. How do you test a myth like this? You ask the organizations themselves, and 451 Research did. According to survey respondents, data analysts spend a mean of 48% of their time just finding and preparing data for analysis, and data scientists fare nearly identically. That is nearly half of their time spent on nonanalytical tasks.
Myth #3: A data catalog alone will eliminate the problems associated with data silos
Organizations today have more data silos than ever. With the adoption of SaaS apps and the cloud ecosystem, data silos continually reinvent themselves, and fully centralizing the storage and management of data is no longer realistic for most organizations. Based on 451 Research's survey work, 33% of organizations with more than 1,000 employees report having more than 50 distinct departmental data silos. Organizational silos, such as communication barriers and cultural fiefdoms, can exist as well. If an organization has multiple data catalogs (and many do), each connected to a slightly different set of data sources, the catalogs simply become a new abstraction layer over the silos themselves. Even a comprehensive enterprise-wide data catalog may suffer from differing rates of adoption across business units. Silos can't be solved with technology alone; cultural buy-in is also required.
Myth #4: There can be a single enterprise catalog to ‘rule them all’
Speaking of silos, a catalog is only as useful as the data sources it connects to. In theory, if a data catalog can connect to everything, it can provide a single-pane-of-glass view into the informational assets of an organization, for both analytical and governance use cases. Platform-agnostic, enterprise-wide catalogs aim to deliver on this promise. One might think that every organization would aspire to this one-stop-shop approach, but the reality is more mixed. In 451 Research's survey work, 24% of respondents reported that their organization planned to adopt an enterprise-wide catalog within the next two years, yet data-lake-specific catalogs were also a common aspiration. Whichever approach an organization takes, the catalog of choice needs to connect to a large share of its data to provide the best experience.
While we may have been a bit short on explosions and special effects in this mythbusting exercise, the hope is that some common beliefs were challenged, even if not fully proven or disproven. No two organizations are identical in their IT environments or needs. Choosing and using the right catalog requires a thorough understanding of both the catalog's capabilities and the organization's dynamics. And that is a fact.