OpenLineage-Airflow Integration Configuration (Beta)

Applies to Alation Cloud Service instances of Alation

Prerequisites

  • Airflow with the OpenLineage provider installed and enabled in your environment (self-managed or managed).

  • Ability to define environment variables or provider configuration for:

    • OpenLineage endpoint URL (your tenant-specific Alation URL)

    • OpenLineage namespace (logical grouping for events)

    • Access token if your endpoint requires token authentication

  • Ensure that your Airflow tasks populate OpenLineage inputs and outputs, for example by using operators with built-in lineage extractors (see the sketch after this list).

  • Confirm outbound HTTPS connectivity from Airflow to Alation: ensure your Alation Cloud Service instance is reachable from your Airflow environment over HTTPS.

    • Open outbound HTTPS connectivity (TCP 443) from Airflow workers and schedulers to your Alation domain. If your environment restricts egress, add your Alation instance host to the egress allow list. Refer to Alation’s IP Addresses for Allow Lists for information on Alation’s IP addresses.

  • Ensure that the data sources you want lineage for are cataloged in Alation and that you have run metadata extraction on them at least once.
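
As referenced above, tasks must populate OpenLineage inputs and outputs. The following is a minimal, hypothetical sketch of such a task (the DAG ID, connection ID, and table names are placeholders, not values required by Alation); SQL-based operators such as SQLExecuteQueryOperator let the OpenLineage provider derive input and output datasets from the SQL statement itself.

    from datetime import datetime

    from airflow import DAG
    from airflow.providers.common.sql.operators.sql import SQLExecuteQueryOperator

    # Hypothetical DAG: the OpenLineage provider parses the SQL statement and
    # reports raw.orders as an input and analytics.daily_orders as an output.
    with DAG(
        dag_id="orders_daily_load",
        start_date=datetime(2024, 1, 1),
        schedule="@daily",
        catchup=False,
    ) as dag:
        load_orders = SQLExecuteQueryOperator(
            task_id="load_daily_orders",
            conn_id="warehouse_db",  # placeholder Airflow connection
            sql="""
                INSERT INTO analytics.daily_orders
                SELECT order_id, customer_id, order_date, amount
                FROM raw.orders
                WHERE order_date = '{{ ds }}'
            """,
        )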

Preflight Checklist

  • Validate the OpenLineage-related environment variables set during configuration. The required variables can vary depending on your Apache Airflow version.

  • Validate the OpenLineage library installation to confirm that the openlineage-python library is installed and to identify its version. This ensures that the library needed to send OpenLineage events to Alation is in place (see the sketch after this list).

  • Record Alation’s base URL. You’ll use it to build the OpenLineage endpoint URL as part of the configuration described below, in the format <base_URL>/api/v1/open_lineage.
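
A minimal sketch of the first two checks, which you could run in a Python shell on an Airflow worker (the package names shown cover both the provider and the standalone client; adjust them to your setup):

    import importlib.metadata
    import os

    # Check the OpenLineage-related environment variables.
    for var in ("AIRFLOW__OPENLINEAGE__NAMESPACE", "AIRFLOW__OPENLINEAGE__TRANSPORT"):
        print(var, "is set" if os.environ.get(var) else "is NOT set")

    # Check which OpenLineage packages are installed and their versions.
    for pkg in (
        "apache-airflow-providers-openlineage",
        "openlineage-python",
        "openlineage-airflow",
    ):
        try:
            print(pkg, importlib.metadata.version(pkg))
        except importlib.metadata.PackageNotFoundError:
            print(pkg, "is not installed")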

Step 1: Create an API Access Token

  1. Sign in to Alation as a Server Admin.

  2. Create an API access token using the steps in Create a Refresh Token via the UI.

    Note

    API access tokens are valid for 24 hours by default.

    You can request a custom expiration period (Configure API Tokens Management). Contact Alation Support to change these settings, as they are applied on the server backend.

    This authentication method is for the Beta release and may change in future releases.

  3. Address security considerations for the integration:

    • Store the Alation API token in your platform’s secrets manager (for example, AWS Secrets Manager for Amazon MWAA or Secret Manager for Cloud Composer). See the sketch after this list for an example of retrieving it at runtime.

    • Scope the token to the minimum required permissions and rotate it regularly.
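
For example, on Amazon MWAA a DAG or plugin could read the token from AWS Secrets Manager at runtime rather than hard-coding it. A minimal sketch, assuming a hypothetical secret name and that the execution role has secretsmanager:GetSecretValue permission:

    import boto3

    # Hypothetical secret name that stores the Alation API token as a plain string.
    SECRET_NAME = "alation/openlineage/api-token"

    client = boto3.client("secretsmanager")
    api_token = client.get_secret_value(SecretId=SECRET_NAME)["SecretString"]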

Step 2: Configure Your Airflow Distribution

Self-Managed Airflow

Choose the instructions that match your Airflow version.
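
If you are unsure which Apache Airflow version your environment runs, you can check it from Python (the airflow version CLI command reports the same value):

    # Prints the installed Apache Airflow version.
    from airflow import __version__ as airflow_version

    print(airflow_version)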

  1. For Apache Airflow version 2.7.0 or higher, install the latest apache-airflow-providers-openlineage package and add apache-airflow-providers-openlineage to the requirements.txt file of your Apache Airflow instance.

  2. In the project’s .env file, specify the variables listed below (a sketch after these steps shows one way to generate the single-line transport value). For more information, refer to the Airflow documentation.

    • AIRFLOW__OPENLINEAGE__NAMESPACE: Namespace for your events.

    • AIRFLOW__OPENLINEAGE__TRANSPORT: Specify the details of where and how to send OpenLineage events in the following JSON string format, substituting <base_URL> with your Alation base URL and <API_token> with the API access token generated in Alation:

      {
        "type": "http",
        "url": "https://<base_URL>.alationcloud.com/",
        "endpoint": "open_lineage_event/",
        "auth": {
          "type": "api_key",
          "api_key": "<API_token>"
        }
      }
      
  3. Deploy these settings to all executors/schedulers that run OpenLineage-enabled operators.

  4. If the host and port configured for sources and targets in Airflow do not match the values in the JDBC URI of the corresponding Alation data sources, you must enter the correct host and port under Additional Datasource Connections for that data source. Find more information in Configure Cross-Source Lineage.
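
Because AIRFLOW__OPENLINEAGE__TRANSPORT must reach Airflow as a single JSON string, it can help to generate the .env entry programmatically. A minimal sketch using the placeholder values from step 2 (depending on how your .env file is parsed, you may also need to wrap the value in single quotes):

    import json

    # Placeholder values from step 2 -- replace with your Alation base URL and API token.
    transport = {
        "type": "http",
        "url": "https://<base_URL>.alationcloud.com/",
        "endpoint": "open_lineage_event/",
        "auth": {"type": "api_key", "api_key": "<API_token>"},
    }

    # Prints a single-line entry suitable for the .env file.
    print("AIRFLOW__OPENLINEAGE__TRANSPORT=" + json.dumps(transport))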

Amazon MWAA

  1. In your Amazon MWAA environment configuration, add these variables under Airflow configuration options or Environment variables:

    • AIRFLOW__OPENLINEAGE__NAMESPACE: Namespace for your events.

    • AIRFLOW__OPENLINEAGE__TRANSPORT: Specify the details of where and how to send OpenLineage events in the following JSON string format, substituting <base_URL> with your Alation base URL and <API_token> with the API access token generated in Alation:

      {
        "type": "http",
        "url": "https://<base_URL>.alationcloud.com/",
        "endpoint": "open_lineage_event/",
        "auth": {
          "type": "api_key",
          "api_key": "<API_token>"
        }
      }
      

    Refer to the MWAA documentation for more information on setting environment variables.

  2. To set these environment variables, deploy a custom plugin to Amazon MWAA. Create an env_var_plugin.py file and add the following Python code to it, substituting the placeholders with real values.

    from airflow.plugins_manager import AirflowPlugin
    import os
    
    os.environ["AIRFLOW__OPENLINEAGE__NAMESPACE"] = "<namespace>"
    os.environ["AIRFLOW__OPENLINEAGE__TRANSPORT"] = '''{
        "type": "http",
        "url": "https://<base_URL>.alationcloud.com/open_lineage_event/",
        "auth": {
            "type": "api_key",
            "api_key": "<API_token>"
        }
    }'''
    os.environ["AIRFLOW__OPENLINEAGE__CONFIG_PATH"] = ""
    os.environ["AIRFLOW__OPENLINEAGE__DISABLED_FOR_OPERATORS"] = ""
    class EnvVarPlugin(AirflowPlugin):
        name = "env_var_plugin"
    

    Values:

    • AIRFLOW__OPENLINEAGE__NAMESPACE: Replace <namespace> with your namespace.

    • AIRFLOW__OPENLINEAGE__TRANSPORT: Specify details of where and how to send OpenLineage events:

      • Replace <base_URL> with the base URL of your Alation instance.

      • Replace <API_token> with the API token generated in Alation.

    • AIRFLOW__OPENLINEAGE__CONFIG_PATH: Set to an empty string so that the apache-airflow-providers-openlineage package reads the OpenLineage configuration from environment variables instead of a config file.

    • AIRFLOW__OPENLINEAGE__DISABLED_FOR_OPERATORS: Set to an empty string so that no operators are excluded and OpenLineage sends events for all operators. Only required for the apache-airflow-providers-openlineage package.

  3. Amazon MWAA allows you to install a plugin through a Zip archive. You can choose one of the following:

    • Use the following command to zip your env_var_plugin.py file:

      zip plugins.zip env_var_plugin.py
      
    • If you already have a plugins.zip file in your environment, add the env_var_plugin.py file to your Zip archive.

  4. Upload the plugins.zip and requirements.txt files to the S3 bucket connected to your Amazon MWAA environment (see the sketch after these steps for an example using the AWS SDK). Amazon MWAA requires your DAGs, plugins, and the requirements.txt file to be in the same S3 bucket, which serves as the source location for your environment.

  5. Specify the paths to the latest versions of the plugins.zip and requirements.txt files in Amazon MWAA:

    1. Open the Environments page in the Amazon MWAA console.

    2. Select an environment and then click Edit.

    3. In the DAG code in the Amazon S3 section, configure the following:

      • For Plugins file - optional, select the plugins.zip file in the S3 bucket connected to your Amazon MWAA environment or choose the latest plugins.zip version from the dropdown list.

      • For Requirements file - optional, select the latest requirements.txt file version from the dropdown list.

    4. Click Next, and then click Update environment to save your configuration.

  6. Redeploy the environment.

  7. If the host and port configured for sources and targets in Airflow do not match the values in the JDBC URI of the corresponding Alation data sources, you must enter the correct host and port under Additional Datasource Connections for that data source. Find more information in Configure Cross-Source Lineage.
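
As an alternative to uploading through the console in step 4, you can upload the files with the AWS SDK. A minimal sketch, assuming a hypothetical bucket name and credentials that can write to your MWAA source bucket:

    import boto3

    # Hypothetical bucket -- use the S3 bucket configured as your MWAA source location.
    BUCKET = "my-mwaa-source-bucket"

    s3 = boto3.client("s3")
    s3.upload_file("plugins.zip", BUCKET, "plugins.zip")
    s3.upload_file("requirements.txt", BUCKET, "requirements.txt")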

Google Cloud Composer

  1. Set the variables in your Cloud Composer environment configuration so that they are visible to the Airflow runtime pods. Refer to the Cloud Composer documentation for more information on setting environment variables.

  2. To configure Google Cloud Composer for the integration, open your Google Cloud console and navigate to the Environments page.

  3. From the list of environments, click the name of your environment and configure the following.

    Override the Airflow configuration options:

    1. On the environment details page, click the Airflow configuration overrides tab and then click Edit.

    2. In the Airflow configuration overrides form, click the Add Airflow configuration override button to specify the first set of values.

    3. For Section 1, enter openlineage.

    4. For Key 1, enter namespace.

    5. For Value 1, enter the namespace for your events (the same value you would use for AIRFLOW__OPENLINEAGE__NAMESPACE).

    6. Click the Add Airflow configuration override button to specify the second set of values.

    7. For Section 2, enter openlineage.

    8. For Key 2, enter transport.

    9. For Value 2, enter the following JSON object, replacing <base_URL> with your Alation base URL and <API_token> with the API token generated in Alation.

      {
        "type": "http",
        "url": "https://<base_URL>.alationcloud.com/open_lineage_event/",
        "auth": {
          "type": "api_key",
          "api_key": "<API_token>"
        }
      }
      
  4. Install the OpenLineage PyPI package in your Cloud Composer environment:

    1. On the environment details page, click the PyPI packages tab and then click Edit.

    2. Click Add package to add a custom package.

    3. Under PyPI packages, for Package name, specify the package name.

      • For Apache Airflow version 2.7.0 or higher:

        apache-airflow-providers-openlineage
        
      • For Apache Airflow versions 2.5.0 up to, but not including, 2.7.0:

        openlineage-airflow
        
    4. Click Save to save your configuration.

  5. Restart the Airflow workers and scheduler.

  6. Optionally, trigger the preflight check DAG, which checks connectivity with Alation and verifies that the correct providers and versions are installed. The DAG is available here <community post link>. A minimal sketch of a similar check appears after these steps.

  7. Confirm in the task logs that all the preflight checks pass.

  8. If the host and port configured for sources and targets in Airflow do not match the values in the JDBC URI of the corresponding Alation data sources, you must enter the correct host and port under Additional Datasource Connections for that data source. Find more information in Configure Cross-Source Lineage.
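
The official preflight check DAG is linked in step 6; the following is only a minimal sketch of the same idea (the DAG ID is arbitrary and <base_URL> is a placeholder for your Alation base URL), useful if you want to see what such a check involves:

    from datetime import datetime
    import importlib.metadata
    import os
    import urllib.error
    import urllib.request

    from airflow import DAG
    from airflow.operators.python import PythonOperator


    def preflight_check():
        # The OpenLineage variables must be visible to the worker running this task.
        for var in ("AIRFLOW__OPENLINEAGE__NAMESPACE", "AIRFLOW__OPENLINEAGE__TRANSPORT"):
            assert os.environ.get(var), f"{var} is not set"

        # The OpenLineage provider package must be installed.
        print("provider version:", importlib.metadata.version("apache-airflow-providers-openlineage"))

        # Any HTTP response from the Alation host proves HTTPS reachability.
        request = urllib.request.Request("https://<base_URL>.alationcloud.com/", method="HEAD")
        try:
            with urllib.request.urlopen(request, timeout=10) as response:
                print("Alation reachable, HTTP status:", response.status)
        except urllib.error.HTTPError as err:
            print("Alation reachable, HTTP status:", err.code)


    with DAG(
        dag_id="alation_openlineage_preflight",
        start_date=datetime(2024, 1, 1),
        schedule=None,
        catchup=False,
    ) as dag:
        PythonOperator(task_id="preflight_check", python_callable=preflight_check)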

Limitation

  • The Airflow icon on Lineage graphs appears only in the New User Experience.

Change Management

Before rolling out to production, test in a lower environment with the same namespace conventions, validate lineage renders as expected, and then apply the configuration to production deployments.