Alation 2023.1: Easing Self-Service for the Modern Data Stack with Databricks and dbt Labs
By Daniel Petzold
Published on April 4, 2023
With the launch of our latest product release, 2023.1, Alation delivers extended connectivity for Unity Catalog from Databricks, the lakehouse company, and new connectivity for dbt Cloud by dbt Labs, the pioneer in analytics engineering. Now, joint users will get an enhanced view into cloud and data transformations, with valuable context to guide smarter usage.
At the heart of this release is the need to empower people with the right information at the right time. Integrating helpful metadata into user workflows gives all people, from data scientists to analysts, the context they need to use data more effectively.
The Benefits and Challenges of the Modern Data Stack
Why are such integrations needed? The modern data stack (MDS) has some incredible benefits, namely robust, fast, and scalable services in the cloud. However, the race to the cloud has also created challenges for data users everywhere, including:
Cloud migration is expensive, migrating sensitive data is risky, and navigating between on-prem sources is often confusing for users.
Building data pipelines is challenging, and complex requirements (as well as the separation of many sources) lead to a lack of trust.
In short, we are in the middle of a cloud complexity crisis, with a mess of data, tools, and platforms spreading without context, connection, or relation to each other. The people navigating these increasingly chaotic landscapes need a single place to find, understand, and use data with total confidence.
Expanded Integration with Databricks Unity Catalog
Unity Catalog is Databricks’ governance and admin layer for all lakehouse data and AI assets, including files, tables, ML models, and dashboards.
Now in Alation 2023.1, with the expanded connector to Databricks Unity Catalog, joint customers can sample and profile lakehouse data in Alation, as well as compose queries on lakehouse data from Alation.
Sampling previews a small slice of data, its structure, and how it can be used. Users can sample lakehouse data to learn popular uses.
Profiling delivers a bird’s-eye view of the statistics of the data, such as minimum, maximum, median, and null values. This empowers users to judge data’s quality and fitness for purpose quickly.
Compose, Alation’s intelligent SQL editor, enables users to browse data sources as they query, with real-time warnings for untrusted or ungoverned data. It also lowers the barrier to entry for nontechnical users with features like autocomplete, which generates suggestions as users write queries. Now, users can operationalize this powerful tool across their entire lakehouse.
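To make the profiling statistics mentioned above concrete, here is a minimal Python sketch of what a column profile might compute. It is an illustration only, not Alation's implementation: the `profile_column` function and the `sample` values are hypothetical, and `None` stands in for a SQL NULL.

```python
from statistics import median


def profile_column(values):
    """Summarize one column: row count, null count, min, max, and median."""
    # Nulls are excluded before computing the statistics.
    non_null = [v for v in values if v is not None]
    return {
        "count": len(values),
        "nulls": len(values) - len(non_null),
        "min": min(non_null) if non_null else None,
        "max": max(non_null) if non_null else None,
        "median": median(non_null) if non_null else None,
    }


# Hypothetical sample of an order-amount column; None represents NULL.
sample = [120.0, 75.5, None, 300.0, 42.0, None, 98.25]
profile = profile_column(sample)
print(profile)
```

A high null count or an implausible minimum or maximum in a summary like this is the kind of signal that helps a user judge whether a data set is fit for purpose before querying it.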
Before a data user leverages any data set, they need to be able to learn about it. How was it used in the past? What was its original purpose? Who knows it best? For data scientists and engineers, answering these questions enables them to build predictive models with improved accuracy. With this integration, such teams can quickly search across many sources to find the answers. Connecting Unity Catalog to Alation empowers data leaders in organizations to:
Scale data access with self-service and visibility to ensure lakehouse adoption
Discover and migrate high-value data faster, optimizing costs
Give data engineers and data scientists real-time context into the systems they use for both the MDS and on-prem – accelerating productivity
Govern and catalog metadata for a Databricks account across multiple workspaces (instead of a single workspace)
The organizations using Databricks today compute huge volumes of data across hybrid systems. Making sense of that data is enormously challenging because these resources are often divorced from their context and full history. But when that context is centrally available, people can more quickly find what they need. They can also avoid building redundant pipelines. This expanded connector to Databricks Unity Catalog does just that, delivering to joint customers a comprehensive view of all cloud data.
New Connectivity for dbt
Modern data engineers confront complex, challenging data environments and need to empower data users for self-service. To build effective data pipelines, they need context (or metadata) on every source. Context empowers them to quickly find and understand data within Alation, anticipate potential breaking changes to upstream resources, and communicate transformations more effectively.
With Alation 2023.1, our new connector for dbt Core and dbt Cloud delivers that context. This connector extracts dbt table- and column-level descriptions to flesh out curation on data transformations, directly within Alation. This helps users of the data catalog make better decisions on their use of data, based on information provided by DataOps on potential changes and updates to that data within dbt.
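For readers curious about where table- and column-level descriptions live in dbt, the sketch below reads them from a small JSON document shaped like dbt's `manifest.json` artifact. This is an assumption-laden illustration, not a description of how the Alation connector works internally: the `shop` project, the model, its columns, and the `extract_descriptions` helper are all hypothetical.

```python
import json

# Hypothetical, trimmed-down excerpt shaped like dbt's manifest.json artifact.
MANIFEST_JSON = """
{
  "nodes": {
    "model.shop.orders": {
      "resource_type": "model",
      "name": "orders",
      "description": "One row per customer order.",
      "columns": {
        "order_id": {"name": "order_id", "description": "Primary key."},
        "amount_usd": {"name": "amount_usd", "description": "Order total in USD."}
      }
    }
  }
}
"""


def extract_descriptions(manifest):
    """Map each model name to its description and per-column descriptions."""
    result = {}
    for node in manifest.get("nodes", {}).values():
        # Skip tests, seeds, and other non-model resources.
        if node.get("resource_type") != "model":
            continue
        result[node["name"]] = {
            "description": node.get("description", ""),
            "columns": {
                col_name: col.get("description", "")
                for col_name, col in node.get("columns", {}).items()
            },
        }
    return result


descriptions = extract_descriptions(json.loads(MANIFEST_JSON))
for table, info in descriptions.items():
    print(f"{table}: {info['description']}")
    for col, text in info["columns"].items():
        print(f"  {col}: {text}")
```

Descriptions like these are written by the people who maintain the dbt project, which is why surfacing them in the catalog gives downstream users context straight from the source.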
A critical part of this release is expanded lineage for Snowflake, Redshift, and PostgreSQL, showing dbt transformation code (SQL & Jinja) in the Data Flow Object within Alation’s lineage graph. This provides a view for end-to-end lineage in Alation, for context from source to destination. This not only helps data engineers and analysts better understand data for building pipelines – it empowers business users to grasp changes made to underlying sources downstream, such as a currency or time conversion. In addition, support for Google BigQuery and Databricks within the dbt connector is soon to be released.
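As a rough illustration of what end-to-end lineage means, the following Python sketch walks upstream from one dbt model to everything it depends on. It assumes node records shaped like the `depends_on` and `raw_code` fields of dbt's `manifest.json`; the `shop` project, its models, and the `upstream` helper are hypothetical, and this is not Alation's lineage implementation.

```python
from collections import deque

# Hypothetical nodes shaped like entries in dbt's manifest.json.
NODES = {
    "model.shop.orders_usd": {
        "depends_on": {"nodes": ["model.shop.stg_orders", "model.shop.fx_rates"]},
        "raw_code": "select o.order_id, o.amount * r.rate as amount_usd "
                    "from {{ ref('stg_orders') }} o "
                    "join {{ ref('fx_rates') }} r using (currency)",
    },
    "model.shop.stg_orders": {
        "depends_on": {"nodes": ["source.shop.raw.orders"]},
        "raw_code": "select * from {{ source('raw', 'orders') }}",
    },
    "model.shop.fx_rates": {
        "depends_on": {"nodes": ["source.shop.raw.fx_rates"]},
        "raw_code": "select * from {{ source('raw', 'fx_rates') }}",
    },
}


def upstream(node_id, nodes):
    """Return every ancestor of node_id in breadth-first order, without duplicates."""
    seen = []
    queue = deque(nodes.get(node_id, {}).get("depends_on", {}).get("nodes", []))
    while queue:
        current = queue.popleft()
        if current in seen:
            continue
        seen.append(current)
        # Sources have no entry in NODES, so they contribute no further parents.
        queue.extend(nodes.get(current, {}).get("depends_on", {}).get("nodes", []))
    return seen


lineage = upstream("model.shop.orders_usd", NODES)
for ancestor in lineage:
    print(ancestor)
```

In this made-up example, the `raw_code` of `orders_usd` shows the currency conversion itself, which is the kind of transformation detail a business user would want to see alongside the lineage path.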
Now, users can quickly survey and see all pertinent metadata on dbt pipelines in Alation. Joint users can search, curate, tag, and flag lineage easily, accelerating productivity and accuracy. In this way, a business user can understand if anomalies they may be seeing in the data are related to data issues or actual business problems, and take action early on.
Alation’s Approach to the Modern Data Stack
This release, which enhances data connectivity, aligns with our previous release of Alation Anywhere for the MDS, which enables people using Tableau to understand and have confidence in the data they are exploring there. Such connectivity delivers the context people need to ask (and answer) vital questions like, “Do I report an issue to data engineering?” or “Is this a legitimate business issue that I need to make my team aware of?”
Now with this new 2023.1 release, Alation is expanding connectivity to cloud and transformation metadata in two of the most popular tools in the MDS. Metadata from Databricks and dbt sources gives BI analysts useful insights and context from Alation, without their having to leave Tableau or disrupt their flow of work. This update empowers users to move forward even faster, and perhaps more importantly, to slow down when necessary to ask questions and avoid costly mistakes.
Innovating in the Modern Data Stack
This announcement is just the latest step in our journey to connect to every source in the modern data stack. As the cloud becomes more important, and the modern data stack more complex, working with our partners becomes critical.
Our commitment is to empower people to use data more often. Visibility, data discovery, and governance across all systems, sources, and end users are critical to this end. But providing an open API isn’t enough. Our Open Connector Framework (OCF) focuses on providing the best performance possible while providing an open SDK and guardrails for exponential third-party development.
We’re thrilled to expand connectivity to best-of-breed partners like Databricks and dbt, enabling us to deliver superior experiences to modern data users today.
Stay tuned for more exciting updates soon!