Shopping for Data

By Stephanie McReynolds

Published on February 20, 2020

As big data matures, the way you think about it may have to shift too. It’s no longer enough to build the data warehouse. Dave Wells, an analyst with the Eckerson Group, suggests that realizing the promise of the data warehouse requires a paradigm shift in the way we think about data, along with a change in how we access and use it. Self-service analytics environments are giving rise to the data marketplace. Unlike the data warehouses of the past, an enterprise data marketplace (EDM) isn’t just a place where IT stores, processes, and manages information. Instead, it gives anyone in your company who works with data the ability to easily access, search, discover, share opinions about, and collaborate on that data.

In many ways this approach, while new to the enterprise, is similar to what we have seen for the last few years in consumer software like Amazon, Etsy, eBay, and Uber (especially the reviews). In fact, Wells has identified four characteristics of digital marketplaces that should be present in any EDM:

  • Categorization organizes the marketplace to simplify browsing (either by data asset type or topic)

  • Curation allows active management of the data sets that are available in the EDM: it selects and qualifies data sets, describes each one, and manages all metadata

  • Cataloging exposes data sets to data shoppers, including descriptions and metadata, and provides a view into the inventory of curated data sets

  • Crowdsourcing is the equivalent of a social network for data, which allows data shoppers to actively participate in cataloging, curating, and categorizing data. Through this ongoing feedback loop, the quality of the data in the marketplace improves continuously
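To make the four characteristics concrete, here is a minimal sketch in Python of how they might fit together in a tiny in-memory marketplace. It is purely illustrative: the class names (`DataSetEntry`, `Marketplace`), the star-rating scheme, and the rule that only curated data sets are visible to shoppers are all assumptions made for this example, not something Wells or any particular product prescribes.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from statistics import mean
from typing import Optional


@dataclass
class DataSetEntry:
    """One catalog entry (hypothetical structure for illustration)."""
    name: str
    category: str          # categorization: topic or asset type
    description: str       # cataloging: what shoppers see
    curated: bool = False  # curation: has a curator qualified it?
    ratings: list[int] = field(default_factory=list)  # crowdsourcing

    @property
    def average_rating(self) -> Optional[float]:
        # No ratings yet means there is no average to report.
        return mean(self.ratings) if self.ratings else None


class Marketplace:
    """A toy enterprise data marketplace held in memory."""

    def __init__(self) -> None:
        self._entries: dict[str, DataSetEntry] = {}

    def publish(self, entry: DataSetEntry) -> None:
        self._entries[entry.name] = entry

    def get(self, name: str) -> DataSetEntry:
        return self._entries[name]

    def curate(self, name: str) -> None:
        # Assumption: curation is a simple approve flag.
        self._entries[name].curated = True

    def rate(self, name: str, stars: int) -> None:
        if not 1 <= stars <= 5:
            raise ValueError("stars must be between 1 and 5")
        self._entries[name].ratings.append(stars)

    def browse(self, category: str) -> list[str]:
        # Shoppers only see curated data sets in the chosen category.
        return sorted(
            e.name for e in self._entries.values()
            if e.curated and e.category == category
        )


if __name__ == "__main__":
    market = Marketplace()
    market.publish(DataSetEntry("orders", "sales", "Daily order totals"))
    market.publish(DataSetEntry("leads", "sales", "Raw CRM leads"))
    market.curate("orders")
    market.rate("orders", 4)
    market.rate("orders", 5)
    print(market.browse("sales"))
    print(market.get("orders").average_rating)
```

A real EDM would sit on top of a data catalog and persistent metadata store rather than a dictionary, but the same division of labor applies: categories drive browsing, curation gates what is visible, descriptions make up the catalog, and shopper feedback accumulates on each entry.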

A data marketplace solves many of the problems that a data warehouse only begins to address: it provides visibility into data sets no matter where they are physically stored, it replaces tribal knowledge and word-of-mouth with verified information, and it shortens the cycle from analysis to insight.

Data As A Service

Data warehouse engineering alone can’t keep pace with the proliferation of new data sources continually coming online. Focus has shifted from data we are creating ourselves to data that is coming from different sources (internal, open, and commercial).

Wells proposes that we start to think about data as a service that is made accessible to data consumers through a “data storefront”. These storefronts inventory and manage data assets, and users can shop to find the best-fit assets for their needs. Some solutions go as far as providing a shopping cart for the data (just as you’d find on any commerce site). “Imagine the power and impact of a curated marketplace that serves a community of analysts ‘shopping for data,’” he says. The enterprise data marketplace is (as the name suggests) purely a marketplace for enterprise data, but one that takes inspiration from consumer software. Imagine if finding and viewing a data set of interest were as easy as finding and listening to music with Pandora.

In the future, the recommendation systems within the EDM will increasingly resemble those in consumer software like Amazon or Spotify, making proactive, context-sensitive recommendations on which data sets or queries to use. Social interactions such as “following” other shoppers (as on Pinterest or Instagram) may also be built into the EDM. Even something like gamification may emerge as a way to fully engage data shoppers as a community.

Behind the scenes, “backroom services” will power the storefront, performing such tasks as data acquisition, data preparation, data curation and cataloging, and tracking.

Building the EDM

As a cornerstone of your data architecture, the EDM is a serious undertaking, whether you build it on existing technologies or deploy a single tool that includes all of the functions needed to implement one. That could mean combining a data catalog such as Alation with a Hadoop data lake, or it could mean finding a single tool that does everything. Here’s how the Eckerson Group breaks it down:

Data Lake Management diagram
