
New connections lower the barrier to trusted data by extending the enterprise data catalog to file systems

REDWOOD CITY, Calif. – September 21, 2017 – Alation Inc., the collaborative data company, today announced deeper support for cataloging data lakes deployed both on-premises and in the cloud, including data lakes built on Amazon Simple Storage Service (S3) and the Hadoop Distributed File System (HDFS). The Alation Data Catalog is the first data catalog to give data consumers a complete view into the modern data pipeline. Alation 5.0 delivers governed data within the data lake with greater transparency, helping self-service analytics users achieve a rapid return on investment through insight discovery.

“For many organizations, data lakes have become a standard, low-cost way to store data in the cloud and on-premises. But, this technological innovation managed to push the cost of organizing data from the data creator to the data consumer. Wading through data lake complexity makes successful self-service analytics nearly impossible,” said Venky Ganti, co-founder and chief technology officer, Alation. “With a data catalog, analysts can drive thoughtful insights from their data lake, immediately.”

According to Gartner, “Despite the variety of vendors, deployment environments and geographic expansion, it is still challenging to get Hadoop-based projects beyond the pilot phase.”*

Alation addresses the complexity of working with data lakes with an enterprise data catalog designed to curate and share context throughout the enterprise. Alation's new native integrations give a transparent view into the full context of data stored in the data lake: with file systems, including Amazon S3 and HDFS; with data processing engines such as Spark, Presto and Impala; and with leading data management products, such as the open source project Kylo and Trifacta Enterprise Wrangler. This visibility into both the path the data traverses within the data lake and the transformations made along the way consolidates technical and business context into an easily digestible format for more accurate, trusted insights.

“Alation’s ability to augment technical metadata with business context is a great complement to Cloudera Navigator,” said Philippe Marinier, vice president of Business Development, Cloudera. “Together, Cloudera Navigator and the Alation Data Catalog enable a complete range of users to trust data in the data lake and use that knowledge to more quickly make data-driven decisions.”

“Data governance and discovery in the data lake is key to aligning technical and business users of Hadoop. Think Big Analytics open sourced Kylo to help organizations achieve this goal, and has now leveraged Alation’s open APIs to deliver an integrated data management experience across both platforms,” said Mike Merritt-Holmes, VP Strategy, Think Big Analytics, A Teradata Company. “This integration allows users to discover data using Alation directly from Kylo and vice versa, as well as feed full metadata and lineage from Kylo to Alation, where business context is married with technical understanding through curation and collaboration. This enables all enterprise users – from data consumers to information stewards and technical managers – to have a single point of reference in the data lake.”

“Making a business decision from an insight requires understanding the context of how data has been cleaned, structured and joined together,” said Wei Zheng, vice president of products, Trifacta. “By integrating with Alation, joint customers of Trifacta and Alation can understand in detail not only the lineage of their data, but also the detailed wrangling that drove that algorithm or calculation. This increases both the accuracy and speed of applicability of insights to actual business decisions.”

By traversing the storage locations and lineage of data at every point of processing, Alation becomes a single source of reference for data knowledge, no matter where the data is physically stored: in a file or a database, on-premises or in the cloud. Alation users get quick access to the critical relationships between raw data, transformed data and business context. This capability consolidates technical and business data knowledge in one place and enables a more comprehensive, proactive approach to delivering trusted, governed data insights.

“We have a hybrid data environment at Chegg. Most of the data that analysts access is stored in the cloud in Amazon S3, but it is not unusual for an analyst to require data from both S3 and HDFS to find a trusted insight,” said Matthew Sullivant, Manager of Data Governance, Chegg. “Alation is the solution that our analysts use to find the best, most accurate data for their analysis, no matter where it lies.”


For more information, please visit

*Gartner, Market Guide for Hadoop Distributions, 01 February 2017

Gartner Disclaimer

Gartner does not endorse any vendor, product or service depicted in its research publications, and does not advise technology users to select only those vendors with the highest ratings or other designation. Gartner research publications consist of the opinions of Gartner’s research organization and should not be construed as statements of fact. Gartner disclaims all warranties, expressed or implied, with respect to this research, including any warranties of merchantability or fitness for a particular purpose.

About Paxata

Paxata is the pioneer in empowering all business consumers to intelligently transform raw data into ready information, instantly, with an enterprise-grade, self-service, scalable, intelligent platform. Our Adaptive Information Platform weaves data into an information fabric from any source, any cloud, or any enterprise to create trusted information. With Paxata, business consumers use clicks, not code, to achieve results in minutes, not months. Companies around the globe rely on Paxata to get smart about information at the speed of thought. Be an Information Inspired Business. Paxata is headquartered in Redwood City, California, with offices in New York, Ohio, Washington DC, and Singapore. Follow Paxata on Twitter, LinkedIn, or YouTube.