Alation and Manta: Automating Advanced Data Lineage
By Peter Wang
Published on August 20, 2020
As data grows in complexity, data lineage becomes critical to understanding, securing, and properly using data to drive effective business decisions. There are three long-standing challenges with data lineage. First, traditional approaches to data lineage require a significant amount of manual effort to document and maintain. Second, incomplete lineage means data engineers are unable to identify which people and which data are affected by their changes. Third, knowledge about data lineage is usually limited to a handful of data engineers, but understanding data lineage is just as important (if not more) for analysts, data scientists, and business users.
Today, Alation and Manta announced a partnership that addresses those challenges by automating lineage extraction, integrating Manta’s enhanced lineage into the Alation data catalog, and maintaining an effective, agile enterprise data environment.
Addressing the Top-Three Data Lineage Use Cases with Alation and Manta
Mapping lineage at enterprise scale requires automation. In Alation, all data assets, along with the core lineage within each data source, are automatically captured. Manta brings additional, deep, cross-source lineage into the data catalog, effectively mapping out the entire end-to-end lineage automatically. Not only is lineage mapped across different sources, but the history of that lineage is also captured.
For data lineage to be impactful, it must be available to everyone making decisions with data. Through the integration of Alation and Manta, business users, data analysts, data engineers, and data scientists can perform detailed impact analysis across all data sources to better understand the scope and potential impact of changes. Alation’s data quality propagation, combined with Manta’s enhanced lineage mapping at the column level, alerts users to upstream data issues across data sources in real time. And all of that rich information is available within Alation for the entire data lifecycle.
Compliance and Data Security
Data lineage helps businesses secure their most valuable data assets, providing the ability to rapidly identify all upstream and downstream impacts of a particular data asset. This enables enterprises to more effectively protect against data breaches and address the requirements of new data regulations like GDPR and CCPA.
The Alation Data Catalog enables data stewards to automatically identify data assets that are valuable or sensitive. When this information is combined with Manta’s cross-source lineage mapping, data stewards can identify the enterprise’s most critical data, where it originated, and how it has been transformed. With this information, data stewards can establish solutions, maintain data availability, meet governance requirements, and enhance security.
Impact Analysis for Change Management
Proper impact analysis is critical to maintaining an agile enterprise data environment. With a wide array of databases, data warehouses, data processes, and BI tools, the modern data lifecycle relies on the seamless flow of data across a variety of sources and stakeholders. Any change, whether a simple change of column type or a more complex change such as a data migration, can negatively affect business-critical functions if the original impact analysis was incomplete or not communicated effectively.
By cataloging different data sources and using Manta for detailed lineage mapping across those sources in Alation, data stewards can easily identify the impact of any particular change. For example, if a data table is going to be deprecated, Alation can propagate that deprecation flag to all impacted data assets and automatically alert all the key stakeholders. Alternatively, when a data engineer considers changing a column type, they can go to Alation’s data catalog and immediately see what that change will impact and which stakeholders need to be notified. This type of impact analysis is especially useful during a cloud migration, where it’s critical to identify which data assets should or should not be migrated and to inform the relevant data users of the pending migration. With this comprehensive visibility of data lineage within Alation, enterprises have one interface for streamlining the change management process.
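To make the idea of impact analysis concrete, here is a minimal sketch in Python. It is not Alation's or Manta's API; the asset names, the owner mapping, and both helper functions are hypothetical. It only illustrates the underlying technique: treat lineage as a directed graph and walk it downstream from the asset being changed.

```python
from collections import deque

# Hypothetical lineage edges: each asset maps to the assets that read from it.
DOWNSTREAM = {
    "warehouse.orders": ["warehouse.orders_daily", "etl.revenue_job"],
    "warehouse.orders_daily": ["bi.sales_dashboard"],
    "etl.revenue_job": ["warehouse.revenue"],
    "warehouse.revenue": ["bi.sales_dashboard", "bi.finance_report"],
}

# Hypothetical owners to notify for each asset.
OWNERS = {
    "warehouse.orders_daily": "dana",
    "etl.revenue_job": "eli",
    "warehouse.revenue": "eli",
    "bi.sales_dashboard": "fay",
    "bi.finance_report": "gus",
}


def impacted_assets(start, downstream):
    """Breadth-first walk returning every asset downstream of `start`."""
    seen = {start}
    order = []
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for child in downstream.get(node, []):
            if child not in seen:  # an asset may be reachable by several paths
                seen.add(child)
                order.append(child)
                queue.append(child)
    return order


def stakeholders_to_notify(assets, owners):
    """Deduplicated, sorted owners of the impacted assets."""
    return sorted({owners[a] for a in assets if a in owners})


# Example: which assets and people are affected if warehouse.orders is deprecated?
impacted = impacted_assets("warehouse.orders", DOWNSTREAM)
notify = stakeholders_to_notify(impacted, OWNERS)
print(impacted)
print(notify)
```

The same walk would serve for propagating a deprecation flag: every asset in `impacted` would receive the flag, and every name in `notify` would receive an alert.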
Analytical Accuracy and Productivity
Self-service analytics has long been the vehicle for data democratization. However, two of the biggest challenges to self-service analytics are finding data and ensuring data quality.
Alation is a powerful platform for data search & discovery, providing one place to search or query all data assets across the enterprise. Combining this powerful data search & discovery with rich cross-source lineage from Manta enables analysts to instantly discover relevant data along with context on its origin, usage, and transformation. For example, a data analyst finds an existing BI dashboard but needs more in-depth information on how and why it was created before deciding how to use it. Instead of spending valuable time tracking down whoever created the dashboard and asking them for more details about the source data, the analyst can get a holistic view of all the data assets in the upstream lineage instantly within Alation.
Aside from finding the data, analysts also need to make sure the data is accurate. By viewing the lineage mapping for a data asset, analysts can gauge its trustworthiness by seeing how the asset was created. Further, Alation automatically propagates warnings and deprecations, ensuring that one inaccurate or out-of-date asset doesn’t become the rotten apple that spoils even more assets and causes even more issues.
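The upstream direction can be sketched the same way. The Python example below is illustrative only and does not use any Alation or Manta interface; the asset names, the flag values, and the helper functions are all assumptions. It shows how a dashboard could inherit warnings from anything in its upstream lineage.

```python
# Hypothetical lineage: each asset maps to the assets it is built from.
UPSTREAM = {
    "bi.sales_dashboard": ["warehouse.revenue", "warehouse.orders_daily"],
    "warehouse.revenue": ["warehouse.orders"],
    "warehouse.orders_daily": ["warehouse.orders"],
    "warehouse.orders": ["crm.raw_orders"],
}

# Hypothetical quality flags set by data stewards on individual assets.
FLAGS = {
    "crm.raw_orders": "deprecated",
    "warehouse.orders_daily": "warning: stale",
}


def upstream_assets(asset, upstream):
    """Depth-first walk returning the set of all ancestors of `asset`."""
    found = set()
    stack = [asset]
    while stack:
        node = stack.pop()
        for parent in upstream.get(node, []):
            if parent not in found:  # skip ancestors reached by another path
                found.add(parent)
                stack.append(parent)
    return found


def inherited_flags(asset, upstream, flags):
    """Flags on any ancestor, keyed by the ancestor that carries them."""
    ancestors = sorted(upstream_assets(asset, upstream))
    return {a: flags[a] for a in ancestors if a in flags}


# Example: what should an analyst know before trusting the sales dashboard?
report = inherited_flags("bi.sales_dashboard", UPSTREAM, FLAGS)
print(report)
```

An empty result would mean that no asset in the upstream lineage carries a flag, which is one signal, though not proof, that the dashboard is safe to reuse.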
Data Lineage for Everyone
By capturing lineage in Alation, enterprises can make data lineage accessible to a much broader set of users, including analysts, business users, and data scientists. Combined with Manta, that core lineage information is enriched with deep, cross-source lineage, effectively mapping out end-to-end data lineage automatically. As a result, data lineage becomes something that everyone who works with data can use to improve their analysis, empowering more people to move the business forward with data-driven decision-making.