Apache Airflow Integration (Beta)

Applies to: Alation Cloud Service instances of Alation

Important

This guide uses Apache Airflow as a validated integration example. OpenLineage works with a variety of systems, not just Airflow. If you’re using a different orchestration platform or system, refer to OpenLineage Integration for general prerequisites or to Direct API Integration for OpenLineage Events for instructions on integrating directly through the API.

Overview

This guide explains how to configure Apache Airflow to send OpenLineage events to Alation.

The integration lets Alation consume OpenLineage events from your Airflow environment and display cross-source lineage in the catalog, allowing users to trace data movement across pipelines and downstream systems.

Key Behaviors

  • Lineage is formed only from successful OpenLineage events. Failed or incomplete events don’t create lineage links.

  • Each event must include both input datasets (sources) and output datasets (targets); Alation relies on this to “stitch” lineage together.

  • Resulting lineage appears on the Lineage tab of relevant objects in the catalog and can participate in Impact Analysis. The Lineage diagram displays additional details:

    • Airflow indicators show jobs originating from Airflow

    • Dataflow details include metadata from the Airflow DAG:

      • Job name

      • Namespace

      • Event type

      • Event completion time

    [Image: example lineage chart for an Airflow job (OpenLineage_Airflow_ExampleChart.png)]
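The first two behaviors can be read as a simple pre-flight check on an event. The sketch below is illustrative only and is not Alation's ingestion logic: it assumes that a "successful" event is an OpenLineage run event whose `eventType` is `COMPLETE`, and the function name `can_form_lineage` and the sample dataset names are hypothetical.

```python
from typing import Any, Mapping


def can_form_lineage(event: Mapping[str, Any]) -> bool:
    """Return True if the event looks like it could produce a lineage link.

    Assumption: "successful" means eventType == "COMPLETE".
    """
    is_successful = event.get("eventType") == "COMPLETE"
    has_inputs = bool(event.get("inputs"))    # at least one source dataset
    has_outputs = bool(event.get("outputs"))  # at least one target dataset
    return is_successful and has_inputs and has_outputs


# Hypothetical sample events, trimmed to the fields the check reads.
complete_event = {
    "eventType": "COMPLETE",
    "inputs": [{"namespace": "postgres://db.example.com:5432", "name": "shop.public.orders"}],
    "outputs": [{"namespace": "snowflake://example-account", "name": "ANALYTICS.PUBLIC.ORDERS"}],
}
failed_event = {**complete_event, "eventType": "FAIL"}
no_input_event = {**complete_event, "inputs": []}

for label, evt in [("complete", complete_event),
                   ("failed", failed_event),
                   ("no inputs", no_input_event)]:
    print(f"{label}: {can_form_lineage(evt)}")
```

A check like this can help explain why a task that ran in Airflow has no corresponding lineage in the catalog, for example when the operator emitted no input datasets.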

Supported Airflow Environments

Supported Airflow Versions

Alation supports any Airflow distribution that can install and run the official OpenLineage provider. Consult your distribution’s compatibility matrix to confirm that the provider is supported on your Airflow version.
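One way to confirm that the provider is present in a given Airflow environment is to look up its installed version from Python. This is a minimal sketch, assuming the provider is installed from PyPI under the distribution name `apache-airflow-providers-openlineage`; the helper name `installed_version` is hypothetical.

```python
from importlib import metadata
from typing import Optional

# PyPI distribution name of the Airflow OpenLineage provider (assumed install source).
OPENLINEAGE_PROVIDER = "apache-airflow-providers-openlineage"


def installed_version(distribution: str = OPENLINEAGE_PROVIDER) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version()
    if version is None:
        print(f"{OPENLINEAGE_PROVIDER} is not installed in this environment")
    else:
        print(f"{OPENLINEAGE_PROVIDER} {version} is installed")
```

Run it with the same Python interpreter that runs your Airflow scheduler and workers, since the provider must be available wherever tasks execute.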

Supported Operators

Alation supports all Airflow operators that are compatible with the Airflow OpenLineage provider.

  • Supported operators: See Supported operators for the list of operators that Alation supports through the OpenLineage provider.

  • Validated operators: Alation has specifically validated lineage resolution with the following commonly used operators:

    • SnowflakeOperator

    • PostgresOperator

    • Redshift operators:

      • PostgresOperator

      • SQLExecuteQueryOperator

    • MySqlOperator

    • CopyFromExternalStageToSnowflakeOperator (S3 to Snowflake)

    • Google BigQuery operators:

      • BigQueryToBigQueryOperator

      • BigQueryInsertJobOperator

Operators not listed in the validated set remain compatible with Alation, provided they are supported by the Apache Airflow OpenLineage provider.
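To see how the operators in your own DAGs relate to the validated set, a small helper can split operator class names into validated and not-yet-validated groups. This is a sketch under stated assumptions: the set below is copied from the list above, the helper name `split_by_validation` is hypothetical, and the check is by class name only, so it ignores which database the operator targets.

```python
from typing import Iterable, List, Tuple

# Operator class names from the validated list in this guide.
VALIDATED_OPERATORS = frozenset({
    "SnowflakeOperator",
    "PostgresOperator",
    "SQLExecuteQueryOperator",
    "MySqlOperator",
    "CopyFromExternalStageToSnowflakeOperator",
    "BigQueryToBigQueryOperator",
    "BigQueryInsertJobOperator",
})


def split_by_validation(operator_names: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Split operator class names into (validated, not validated), both sorted."""
    unique = set(operator_names)
    validated = sorted(name for name in unique if name in VALIDATED_OPERATORS)
    other = sorted(name for name in unique if name not in VALIDATED_OPERATORS)
    return validated, other


# Example: class names collected from a hypothetical DAG.
validated, other = split_by_validation(
    ["SnowflakeOperator", "BashOperator", "BigQueryInsertJobOperator", "SnowflakeOperator"]
)
print("Validated by Alation:", validated)
print("Not in the validated set:", other)
```

Operators in the second group are not necessarily unsupported; they remain compatible as long as the Airflow OpenLineage provider supports them.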

Integration Workflow

  1. Your Airflow deployment emits OpenLineage events during task execution via the OpenLineage provider.

  2. Events are sent over HTTPS to your Alation ingestion endpoint and include:

    • job run context

    • namespace

    • inputs

    • outputs

  3. Alation processes the events and builds lineage links between the sources and targets it identifies in them.
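To make step 2 concrete, the sketch below builds a minimal OpenLineage run event carrying the job run context, namespace, inputs, and outputs, and wraps it in an HTTPS POST request. In a real deployment the OpenLineage provider constructs and sends these events for you. Everything specific here is an assumption for illustration: the endpoint URL, the bearer-token header, the producer URI, the schema version in `schemaURL`, and all dataset and job names are hypothetical, and the request is built but not sent.

```python
import json
import uuid
from datetime import datetime, timezone
from typing import Any, Dict, List, Tuple
from urllib import request

Dataset = Tuple[str, str]  # (namespace, name)


def build_run_event(job_namespace: str, job_name: str,
                    inputs: List[Dataset], outputs: List[Dataset]) -> Dict[str, Any]:
    """Build a minimal OpenLineage run event for a completed task."""
    return {
        "eventType": "COMPLETE",
        "eventTime": datetime.now(timezone.utc).isoformat(),
        "producer": "https://example.com/hypothetical-producer",  # placeholder URI
        # Schema version chosen for illustration only.
        "schemaURL": "https://openlineage.io/spec/2-0-2/OpenLineage.json#/$defs/RunEvent",
        "run": {"runId": str(uuid.uuid4())},                    # job run context
        "job": {"namespace": job_namespace, "name": job_name},  # namespace and job name
        "inputs": [{"namespace": ns, "name": name} for ns, name in inputs],
        "outputs": [{"namespace": ns, "name": name} for ns, name in outputs],
    }


def build_request(endpoint_url: str, api_key: str, event: Dict[str, Any]) -> request.Request:
    """Wrap the event in a POST request. The auth header scheme is an assumption."""
    return request.Request(
        endpoint_url,
        data=json.dumps(event).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


event = build_run_event(
    job_namespace="airflow-prod",
    job_name="orders_dag.load_orders",
    inputs=[("postgres://db.example.com:5432", "shop.public.orders")],
    outputs=[("snowflake://example-account", "ANALYTICS.PUBLIC.ORDERS")],
)
req = build_request(
    "https://alation.example.com/HYPOTHETICAL-INGESTION-PATH",  # not a real endpoint
    "HYPOTHETICAL_API_KEY",
    event,
)
print(req.get_method(), req.full_url)
print(json.dumps(event, indent=2))
# request.urlopen(req) would send the event; it is omitted here on purpose.
```

The input and output entries are what step 3 relies on: each dataset's namespace and name identify a source or target that Alation can link.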