Deriving Insight From AI

Deriving behavior-based insight from context is critical to understanding your data

What is driving the adoption of artificial intelligence (AI) today?

The volume of data that companies deal with every day is outpacing their ability to manage it. These days, it seems like almost every application used to run your company is generating data. Everything from Internet of Things (IoT) sensors and social and mobile computing to increased use of customer relationship management (CRM) tools means more data that you need to find, understand, and learn to trust. Big data also cuts across industries: pharmaceuticals and healthcare, science and academia, e-commerce, and manufacturing are just a few of the industries struggling to keep up with it.

Why is AI so popular today?

More computational power, the digitization of society, and more sophisticated algorithms are driving the adoption of AI. It seems a day doesn’t go by without news of another intelligent bot, and “intelligent assistants” such as Siri, Alexa, Google Assistant, and Cortana (among several others) have found their way from the research lab into the everyday lives of consumers. We’ve also seen more investment in AI research and development, not just in academia but also in software companies, from startups to large established enterprises. As a result, finding the right answer at the right time has become much easier.

Smart data catalogs

A data catalog is an application that gives users reliable insight into the accuracy, quality, and usefulness of their data. It captures technical metadata, and it can also surface the behavioral context of your data by automatically crawling your data sources and performing a deep parse of the query logs. When the catalog parses the queries that users run during their normal flow of work, meaningful patterns emerge and insights are surfaced through:

  • Usage of data (top users, popular columns, tables, and schemas)
  • Data profiles (insight into numeric distribution of data)
  • Interactive data lineage (a visual map of where the data came from, where it’s going, and how it’s being moved)
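
As a rough illustration of the first item, here is a minimal Python sketch of how usage statistics might be derived from a query log. The log entries, the table names, and the `summarize_usage` helper are all hypothetical, and the regular expression is a deliberately naive stand-in for a real SQL parser.

```python
import re
from collections import Counter

# Hypothetical query-log entries: (user, SQL text). Real logs vary by database.
QUERY_LOG = [
    ("ana", "SELECT id, total FROM sales.orders WHERE total > 100"),
    ("ben", "SELECT o.id FROM sales.orders o JOIN sales.customers c ON o.cust_id = c.id"),
    ("ana", "SELECT name FROM sales.customers"),
    ("ana", "SELECT id FROM sales.orders"),
]

# Naive pattern: the identifier that follows FROM or JOIN.
# A production catalog would use a proper SQL parser instead.
TABLE_RE = re.compile(r"\b(?:FROM|JOIN)\s+([A-Za-z_][\w.]*)", re.IGNORECASE)


def summarize_usage(log):
    """Count how many queries reference each table and how many each user ran."""
    table_counts = Counter()
    user_counts = Counter()
    for user, sql in log:
        user_counts[user] += 1
        # Count each table once per query, even if it appears more than once.
        for table in set(TABLE_RE.findall(sql)):
            table_counts[table.lower()] += 1
    return table_counts, user_counts


table_counts, user_counts = summarize_usage(QUERY_LOG)
print(table_counts.most_common())  # most popular tables first
print(user_counts.most_common())   # most active users first
```

The same counting approach could be extended to columns and schemas, which is roughly what "popular columns, tables, and schemas" implies.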

A “smart” data catalog can indicate the popularity, recency, or provenance (lineage) of a particular data asset. By doing this heavy lifting, the catalog reduces the time it takes a data scientist to derive an insight. With a data catalog in place, a data scientist can easily discover, trust, and use the available data, spending less time finding and preparing the data and more time running machine learning algorithms.

AI and data catalogs

Of all the AI techniques that could be used, machine learning is the most useful for making data catalogs smarter. A data catalog can use machine learning and natural language processing algorithms to learn the lexicon of your business and standardize references by recommending user-friendly data labels automatically and consistently.

Feeding the algorithm

The machine learning algorithm is fed by wiki pages, spreadsheets, or data modeling documentation. From these sources, a smart data catalog can be trained to apply business labels to technical metadata. This is just one example of how AI can power a smart data catalog.
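
To make the idea of applying business labels to technical metadata concrete, here is a small Python sketch. It is not a trained model: it swaps in simple string similarity from the standard library as a stand-in for machine learning. The glossary, the abbreviation map, and the `suggest_label` function are all assumptions made for illustration.

```python
import re
from difflib import SequenceMatcher

# Hypothetical glossary, e.g. harvested from wiki pages or modeling documents.
GLOSSARY = ["Customer Identifier", "Order Total", "Shipping Address"]

# Hypothetical abbreviation map; a trained model would learn these instead.
ABBREVIATIONS = {
    "cust": "customer", "id": "identifier", "addr": "address",
    "ord": "order", "tot": "total", "ship": "shipping",
}


def expand(column_name):
    """Split a technical name into tokens and expand known abbreviations."""
    tokens = re.split(r"[_\W]+", column_name.lower())
    return " ".join(ABBREVIATIONS.get(t, t) for t in tokens if t)


def suggest_label(column_name, threshold=0.6):
    """Return the closest glossary label, or None if nothing is similar enough."""
    expanded = expand(column_name)
    best_label, best_score = None, 0.0
    for label in GLOSSARY:
        score = SequenceMatcher(None, expanded, label.lower()).ratio()
        if score > best_score:
            best_label, best_score = label, score
    return best_label if best_score >= threshold else None


for column in ["cust_id", "ord_tot", "ship_addr", "xyz"]:
    print(column, "->", suggest_label(column))
```

The threshold keeps the catalog from recommending a label when nothing in the glossary is a plausible match, which matters if the suggestions are meant to be trusted.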

A smart data catalog can learn from the way data is being used, make proactive suggestions as to which data to trust, and even take a specific action, such as deprecating a table, based on this information. Deriving behavior-based insight from context is critical to understanding your data.
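
As one possible illustration of acting on usage information, the Python sketch below flags tables as deprecation candidates when they are both rarely queried and not queried recently. The usage statistics, the thresholds, and the `suggest_deprecations` function are hypothetical; a real catalog would choose its own signals and would likely ask a data steward to confirm before deprecating anything.

```python
from datetime import date, timedelta


def suggest_deprecations(usage, today, max_idle_days=180, min_queries=5):
    """Return tables that look stale: rarely queried and not queried recently."""
    cutoff = today - timedelta(days=max_idle_days)
    return sorted(
        table
        for table, stats in usage.items()
        if stats["last_queried"] < cutoff and stats["query_count"] < min_queries
    )


# Hypothetical usage statistics, such as a catalog might derive from query logs.
USAGE = {
    "sales.orders": {"last_queried": date(2024, 5, 30), "query_count": 1200},
    "sales.orders_2015_backup": {"last_queried": date(2022, 1, 10), "query_count": 2},
    "tmp.load_test": {"last_queried": date(2023, 2, 1), "query_count": 40},
}

candidates = suggest_deprecations(USAGE, today=date(2024, 6, 1))
print(candidates)
```

Requiring both conditions is a conservative choice: a table that was heavily used in the past but has gone quiet is left alone, since its history suggests someone may still depend on it.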