Data Catalogs: A Category of Their Own

By Stephanie McReynolds

Published on February 20, 2020


Data catalogs are here to stay. This week, two independent analyst reports validated what we’ve known for years – data catalogs are critical for self-service analytics.

After investing in self-service analytic tooling, organizations are now turning their attention to linking infrastructure and tooling to data-driven decisions. This requires technology (AI, machine learning, log parsing, natural language processing, metadata management), but that technology must be surfaced in a form accessible to business users: the data catalog.

Here’s why your organization should catch the Wave.

The Forrester Wave™: Machine Learning Data Catalogs, Q2 2018

This is Forrester’s inaugural Wave on data catalogs. Analyst Michelle Goetz, a well-known advisor to enterprise architects, chief data officers, and business analysts, has been tracking this market for some time. She’s seen the self-service analytics market evolve from decision systems to business intelligence to data visualization to data science and automated intelligence. Now she sees a need to make data more accessible:

For EA professionals, relying on people and manual processes to provision, manage, and govern data simply does not scale. Enterprises are waking up to this fact and turning to data catalogs.

A New Market Category

What did Michelle Goetz, who has more than 20 years of experience in data analytics, observe that led her to believe that data catalogs are an independent market category?

Her report states that organizations relying on manual processes to provision, manage, and govern data simply can’t scale. “The growth of data is outpacing organization’s ability to get value from it.” Applying machine learning to this problem has produced a new market category: Machine Learning Data Catalogs (MLDCs).

According to the report, MLDCs are becoming increasingly valuable for organizations implementing self-service analytics:

Combining ML with collaboration and activation scales out data understanding and speeds up use. Thus, MLDCs are demonstrating ROI in many cases within four weeks.

The Forrester report uses 29 criteria to evaluate machine learning data catalog providers and identifies “the 12 most significant ones — Alation, Cambridge Semantics, Cloudera, Collibra, Hortonworks, IBM, Infogix, Informatica, Oracle, Reltio, Unifi Software, and Waterline Data.”

We’re pleased not only to be on the list but also to be recognized as a Leader in the category.

Alation Named a Leader in Machine Learning Data Catalogs

“Clients considering Alation will find a strong MLDC to help understand and make sense of the vast sources that exist on-premises, in the cloud, and across their legacy and modern systems.”

The Alation Data Catalog received the highest possible scores in 15 criteria, including stewardship processes, team collaboration, policy management, and overall satisfaction with the tool. The report recognizes Alation not only as a leader in the category but also as its innovator. According to the report, “Alation started the MLDC trend.”

How Did the MLDC Trend Start?

It was about three years ago that we realized that the drivers of this revolution in data access, the sponsors of these catalogs, wouldn’t be the technologists in the organization. The information revolutionaries would be the analysts, data scientists and information stewards who needed to find a more productive way to sort through the data deluge they faced on a daily basis. It is gratifying to see a market emerge around that vision.

As innovators and leaders in this part of the market, we know that true data catalogs are built on machine learning and are designed for business users, the consumers of data. But as the category gains greater recognition, more companies are building data catalog solutions. Some of the companies included are repositioning master data management inventories built for IT. Other vendors are doubling down on data governance solutions, repositioning glossaries and repositories of governance rules. Even some of our key partners, Cloudera and Hortonworks, are included in the report. All of this attention, however, is a positive sign of a maturing and vibrant market category.

What’s Next: Continued Leadership in this Market

“Category creation goes beyond innovation, in that the new category shares roots with its original product class but delivers such exponentially better benefits, experience, and economics that the new category graduates from its original product class. Another telltale sign of category creation is that it comes with a distinctive business model and profit model.”

How do we intend to continue to lead the market in its next stage? By doubling down on our distinctive business and profit model. That model has been developed by actively listening to our customers.

A heartfelt thank you to the 100+ customer organizations who engage with us on a daily basis to define this market. They share wins, educate us on challenges, provide feedback and define the use cases that drive our definition of the boundaries of the data cataloging category. Michelle Goetz had an opportunity to speak with many of them. And we encourage you to check out their stories as you consider how providing greater accessibility with a data catalog can drive value for your organization.
