How to Enrich Databricks Unity Catalog with Alation Metadata

By Brett Larson

Published on November 19, 2025

Today’s data teams live across multiple platforms. Business context lives in Alation—where organizations capture trusted definitions, document data assets, and curate metadata that fuels accurate analysis and responsible AI. As Databricks becomes a primary workspace for data engineers, analysts, and AI builders, extending that business context into this environment ensures a consistent understanding wherever work happens.

Too often, these two worlds operate in silos. When rich business metadata doesn’t travel with the data itself, users are left guessing what a field really means, how it’s sourced, or whether it’s safe to use. The result? Slower collaboration, inconsistent analysis, and AI outputs that can’t be trusted.

This blueprint bridges that gap. It outlines how to synchronize metadata from Alation into Databricks Unity Catalog using a few simple Python snippets. It provides a persistent way to surface schema, table, and column descriptions in Databricks, so business meaning flows alongside technical data. While Alation’s Agent Studio already allows Databricks agents to fetch metadata on demand, the blueprint outlined below offers a complementary option for keeping that context continuously available within the Databricks platform itself.

Why this approach?

Rather than relying on a rigid, prepackaged connector, this workflow gives you flexibility to align metadata syncs with your unique governance model and AI priorities.

  • Customize mappings: Tailor how Alation’s rich business context maps to Databricks’ catalog objects to reflect your organization’s taxonomy.

  • Control cadence and scope: Schedule and prioritize updates based on your operational needs.

  • Stay resilient: Avoid brittle dependencies—if Databricks or Alation models evolve, your sync logic can easily adapt.

How it works

  • Flows schema, table, and column descriptions from Alation to Databricks Unity Catalog

  • Automatically strips HTML formatting

  • Keeps Alation as the authoritative source of business metadata

Step-by-step implementation

1. Create input widgets

These widgets let you pass parameters for your Alation data source ID and the Alation Analytics catalog.

They allow you to define:

  • The catalog name associated with the federated Alation Analytics Snowflake database

  • The Alation Data Source ID containing the curated metadata

Python

dbutils.widgets.removeAll()

dbutils.widgets.text('alation_dsid','','Alation Datasource ID')
dbutils.widgets.text('alation_analytics_catalog','','Alation Analytics Catalog')
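
Before running the rest of the notebook, you can optionally fail fast if either widget was left blank. This check is not part of the original flow, and the variable names are only illustrative:

Python

# Optional sanity check: stop early if either widget is empty
alation_dsid = dbutils.widgets.get('alation_dsid')
alation_analytics_catalog = dbutils.widgets.get('alation_analytics_catalog')

assert alation_dsid.strip(), 'Set the Alation Datasource ID widget before running'
assert alation_analytics_catalog.strip(), 'Set the Alation Analytics Catalog widget before running'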

2. Define HTML cleanup function

Descriptions in Alation often contain HTML tags for formatting. This simple helper function removes those tags, making the text clean before posting to Databricks:

Python

import re

def remove_html_tags(text):
    """Remove HTML tags from a string."""
    clean = re.compile('<.*?>')
    return re.sub(clean, '', text)
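
A quick check with a made-up description shows the effect (the sample string below is illustrative, not pulled from Alation):

Python

# Example only: a hypothetical description containing HTML formatting
sample = '<p>Gross revenue in <b>USD</b>, refreshed nightly.</p>'
print(remove_html_tags(sample))
# Gross revenue in USD, refreshed nightly.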

3. Get a list of schemas with descriptions

This query retrieves all schemas in Alation that contain descriptions. You can adjust the filters to limit which objects are extracted:

Python

# Default query will get ALL Schemas that have a description
# Add Filters to query to limit objects to extract descriptions from

alation_analytics_schema_sql = "\
    select s.name as schema_name, s.description \
    from {0}.public.rdbms_datasources ds \
        JOIN {0}.public.rdbms_schemas s ON ds.DS_ID = s.DS_ID \
    WHERE ds.DS_ID = {1} \
        and s.DESCRIPTION <> '' \
        and s.DELETED = false \
".format(dbutils.widgets.get('alation_analytics_catalog'), dbutils.widgets.get('alation_dsid'))

schema_df = spark.sql(alation_analytics_schema_sql)

display(schema_df)

4. Generate & execute comment SQL for schemas

This section writes schema-level descriptions from Alation into Databricks:

Python

sqlCmdList = []
rows = schema_df.collect()

for row in rows:
    schemaName = row["schema_name"]
    description = row["description"]
    sqlTxt = "COMMENT ON SCHEMA {0} IS \\"{1}\\"".format(schemaName, remove_html_tags(description))
    sqlCmdList.append(sqlTxt)
    print(schemaName)

#print(sqlCmdList)

for sqlTxt in sqlCmdList:
    spark.sql(sqlTxt)
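
One caveat: the description is interpolated directly into the COMMENT statement, so a double quote inside an Alation description would break the generated SQL. A small escaping helper along these lines (an assumption, not part of the original notebook; it relies on Spark SQL's default backslash escaping) keeps the statements valid:

Python

# Escape backslashes and double quotes so the cleaned description can sit
# safely inside the double-quoted SQL string literal
def escape_for_comment(text):
    return text.replace('\\', '\\\\').replace('"', '\\"')

# For example, wrap the cleaned description before formatting:
# sqlTxt = "COMMENT ON SCHEMA {0} IS \"{1}\"".format(schemaName, escape_for_comment(remove_html_tags(description)))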

5. Get a list of tables with descriptions

Similar to schemas, this query extracts table-level descriptions from Alation:

Python

# Default query will get ALL tables that have a description
# Add Filters to query to limit objects to extract descriptions from

alation_analytics_table_sql = "\
    select s.name || '.' || t.name as full_table_name, t.description \
    from {0}.public.rdbms_datasources ds \
        JOIN {0}.public.rdbms_schemas s ON ds.DS_ID = s.DS_ID \
        JOIN {0}.public.rdbms_tables t ON s.SCHEMA_ID = t.SCHEMA_ID \
    WHERE ds.DS_ID = {1} \
        and t.DESCRIPTION <> '' \
        and t.DELETED = false \
        and s.DELETED = false \
".format(dbutils.widgets.get('alation_analytics_catalog'), dbutils.widgets.get('alation_dsid'))

tables_df = spark.sql(alation_analytics_table_sql)

display(tables_df)

6. Generate & execute comment SQL for tables

This section writes the cleaned table descriptions into Databricks:

Python

sqlCmdList = []
rows = tables_df.collect()

for row in rows:
    fullTableName = row["full_table_name"]
    description = row["description"]
    sqlTxt = "COMMENT ON TABLE {0} IS \\"{1}\\"".format(fullTableName, remove_html_tags(description))
    sqlCmdList.append(sqlTxt)
    print(fullTableName)

#print(sqlCmdList)

for sqlTxt in sqlCmdList:
    spark.sql(sqlTxt)

7. Get a list of columns with descriptions

Finally, this query retrieves column-level descriptions. You can further filter it if needed:

Python

# Default query will get ALL table columns that have a description
# Add Filters to query to limit objects to extract descriptions from

alation_analytics_column_sql = "\
    select s.name || '.' || t.name as full_table_name, c.name, c.description \
    from {0}.public.rdbms_datasources ds \
        JOIN {0}.public.rdbms_schemas s ON ds.DS_ID = s.DS_ID \
        JOIN {0}.public.rdbms_tables t ON s.SCHEMA_ID = t.SCHEMA_ID \
        JOIN {0}.public.rdbms_columns c ON t.TABLE_ID = c.TABLE_ID \
    WHERE ds.DS_ID = {1} \
        and c.DESCRIPTION <> '' \
        and s.DELETED = false \
        and t.DELETED = false \
        and c.DELETED = false \
".format(dbutils.widgets.get('alation_analytics_catalog'), dbutils.widgets.get('alation_dsid'))

columns_df = spark.sql(alation_analytics_column_sql)

display(columns_df)

8. Generate & execute comment SQL for columns

This step writes column-level descriptions into Databricks:

Python

sqlCmdList = []
rows = columns_df.collect()

for row in rows:
    fullTableName = row["full_table_name"]
    columnName = row["name"]
    description = row["description"]
    sqlTxt = "COMMENT ON COLUMN {0}.{1} IS \\"{2}\\"".format(fullTableName, columnName, remove_html_tags(description))
    sqlCmdList.append(sqlTxt)
    print("{0}.{1}".format(fullTableName, columnName))

#print(sqlCmdList)

for sqlTxt in sqlCmdList:
    spark.sql(sqlTxt)
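
If some objects described in Alation no longer exist (or are named differently) in Unity Catalog, an individual COMMENT statement will fail. A more defensive variant of the loops above, sketched here rather than taken from the original notebook, records failures instead of stopping the run:

Python

# Apply each comment, collecting failures instead of aborting on the first error
failed = []
for sqlTxt in sqlCmdList:
    try:
        spark.sql(sqlTxt)
    except Exception as e:
        failed.append((sqlTxt, str(e)))

if failed:
    print('{0} comment statements failed; review before re-running:'.format(len(failed)))
    for stmt, err in failed:
        print(stmt, '->', err)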

Summary & additional information

This workflow lets teams curate metadata once in Alation and continuously propagate meaningful descriptions into Databricks, so users across both systems work from the same information.

Governance & safety tips:

Teams should review each step carefully and adapt it to meet their requirements.

  • Always test in a non-production workspace first.

  • Review descriptions before propagation (what’s public in Alation will appear in Databricks). 

  • Ensure you have the right permissions in Databricks and that Databricks audit logging is enabled to track any COMMENT ON write operations (a quick permissions check is sketched below).
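
For the permissions check, one quick option is to confirm that the current principal can see and modify a target schema before running the sync. The catalog and schema names below are placeholders:

Python

# Placeholder names: substitute a schema the sync will write comments to
display(spark.sql('SHOW GRANTS ON SCHEMA main.sales'))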

Start small: test this in your environment, see the impact, and imagine what’s possible when your catalog and your workspace finally speak the same language.

By letting Alation serve as your source of truth and Databricks as your system of action, your teams can collaborate in real time with shared context, consistent language, and complete trust in the data.

Curious to learn more?
