Hundreds of data sources. Hundreds (even thousands) of data consumers. Millions of datasets. Gone are the days when static metadata repositories and manual curation were enough (if they ever really were). To keep up with the rapid influx of data, the many disparate data environments, and the rise in self-service analytics users, enterprises need an enterprise data catalog to drive the business forward with data, and ensure compliant, accurate data use.
In a recent report*, Gartner analysts Ehtisham Zaidi and Guido De Simoni call augmented data catalogs an “enterprise must-have” for data and analytics leaders. In the report, they write, “Demand for data catalogs is soaring as organizations continue to struggle with finding, inventorying and analyzing vastly distributed and diverse data assets. Data and analytics leaders must investigate and adopt ML-augmented data catalogs as part of their overall data management solutions strategy.”
But perhaps you don’t need to be convinced. You have identified the need within your own organization, seen the recommendations from industry experts like Ehtisham and Guido, and already decided to implement an enterprise data catalog. The next question might be: should you build an enterprise data catalog in-house or buy one from a market leader? As you attempt to address the universal “build vs. buy” dilemma, there are six key considerations to keep in mind:
- What are the opportunity costs of using internal resources to build a data catalog?
- Can you wait two to three years for development to finish?
- Is the data catalog just for IT, or for the business as well? Do you have the resources to build the UI/UX?
- Does your development team have the bandwidth to support or maintain the data catalog? Are there resources to train and drive adoption of the data catalog?
- Do you have the machine learning expertise to capture technical, operational, business and social metadata?
- Who will own maintenance and support of the data catalog?
Luckily, you are not the first to tackle these questions. Two of the most data-savvy technology companies in the world, eBay and LinkedIn, each undertook an initiative to build an enterprise data catalog. Their journeys provide important lessons for any organization considering building a data catalog.
When eBay started its data catalog initiative, it was processing 100 petabytes of data, generating 50 terabytes of new data, and running more than seven million queries—all in a day! More than 300 data analysts and 5,000 business users were accessing eBay’s analytics platform directly and through more than 10,000 reports in Tableau and 5,000 in MicroStrategy. At the time LinkedIn embarked on its data catalog journey, it had 50 thousand datasets, 15 petabytes of storage (across Teradata, Hadoop, and other data sources), 14 thousand comments, and 35 million job executions.
Although the two companies took different approaches to cataloging, the outcome was the same: both iterated multiple times on internally built data catalogs before ultimately buying one. To find out more about their data cataloging journeys, download our exclusive white paper, The Enterprise Data Catalog: Build or Buy?
The white paper delves into the features and capabilities that make for a robust data catalog, beyond merely inventorying data and metadata. Today’s enterprise data catalogs provide a unified view of the data, harness machine learning to capture behavioral and usage patterns, crowdsource endorsements, leverage data recommendation engines, and much more.
*Augmented Data Catalogs: Now an Enterprise Must-Have for Data and Analytics Leaders by Ehtisham Zaidi and Guido De Simoni, 12 September 2019