By Brett Larson
Published on November 19, 2025

Today’s data teams live across multiple platforms. Business context lives in Alation, where organizations capture trusted definitions, document data assets, and curate metadata that fuels accurate analysis and responsible AI. As Databricks becomes a primary workspace for data engineers, analysts, and AI builders, extending that business context into this environment ensures a consistent understanding wherever work happens.
Too often, these two worlds operate in silos. When rich business metadata doesn’t travel with the data itself, users are left guessing what a field really means, how it’s sourced, or whether it’s safe to use. The result? Slower collaboration, inconsistent analysis, and AI outputs that can’t be trusted.
This blueprint bridges that gap. It outlines how to synchronize metadata from Alation into Databricks Unity Catalog using a few simple Python snippets. It provides a persistent way to surface schema, table, and column descriptions in Databricks, so business meaning flows alongside technical data. While Alation’s Agent Studio already allows Databricks agents to fetch metadata on demand, the blueprint outlined below offers a complementary option for keeping that context continuously available within the Databricks platform itself.
Rather than relying on a rigid, prepackaged connector, this workflow gives you the flexibility to align metadata syncs with your unique governance model and AI priorities:

- Customize mappings: Tailor how Alation’s rich business context maps to Databricks’ catalog objects to reflect your organization’s taxonomy.
- Control cadence and scope: Schedule and prioritize updates based on your operational needs (see the scheduling sketch below).
- Stay resilient: Avoid brittle dependencies. If Databricks or Alation models evolve, your sync logic can easily adapt.

At a high level, the notebook:

- Flows schema, table, and column descriptions from Alation to Databricks Unity Catalog
- Automatically strips HTML formatting
- Keeps Alation as the authoritative source of business metadata
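If you want to control cadence by running the sync on a schedule, one option is a Databricks job. The sketch below uses the Databricks SDK for Python (databricks-sdk); the notebook path, cluster ID, cron expression, and parameter values are all placeholders, and this is an illustrative pattern rather than part of the blueprint itself:

```python
# A minimal scheduling sketch using the Databricks SDK for Python.
# All paths, IDs, and parameter values below are placeholders.
from databricks.sdk import WorkspaceClient
from databricks.sdk.service import jobs

w = WorkspaceClient()

w.jobs.create(
    name="alation-metadata-sync",
    tasks=[
        jobs.Task(
            task_key="sync_descriptions",
            existing_cluster_id="<your-cluster-id>",
            notebook_task=jobs.NotebookTask(
                notebook_path="/Workspace/path/to/alation_sync_notebook",
                # These feed the notebook widgets defined below.
                base_parameters={
                    "alation_dsid": "<your-datasource-id>",
                    "alation_analytics_catalog": "<your-catalog-name>",
                },
            ),
        )
    ],
    # Quartz cron: run the sync daily at 06:00 UTC.
    schedule=jobs.CronSchedule(
        quartz_cron_expression="0 0 6 * * ?",
        timezone_id="UTC",
    ),
)
```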
These widgets let you pass in two parameters:

- The catalog name associated with the federated Alation Analytics Snowflake database
- The Alation Data Source ID containing the curated metadata
```python
dbutils.widgets.removeAll()
dbutils.widgets.text('alation_dsid', '', 'Alation Datasource ID')
dbutils.widgets.text('alation_analytics_catalog', '', 'Alation Analytics Catalog')
```
Descriptions in Alation often contain HTML tags for formatting. This simple helper function removes those tags so the text is clean before it is written to Databricks:
```python
import re

def remove_html_tags(text):
    """Remove HTML tags from a string."""
    clean = re.compile('<.*?>')
    return re.sub(clean, '', text)
```
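For instance, a formatted description comes back as plain text (the sample string here is invented for illustration):

```python
# Illustrative example; the description text is made up for this demo.
print(remove_html_tags('<p>Net revenue in <b>USD</b></p>'))
# -> Net revenue in USD
```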
This query retrieves all schemas in Alation that contain descriptions. You can adjust the filters to limit which objects are extracted:
```python
# Default query will get ALL schemas that have a description.
# Add filters to the query to limit which objects to extract descriptions from.
alation_analytics_schema_sql = """
    select s.name as schema_name, s.description
    from {0}.public.rdbms_datasources ds
    JOIN {0}.public.rdbms_schemas s ON ds.DS_ID = s.DS_ID
    WHERE ds.DS_ID = {1}
      and s.DESCRIPTION <> ''
      and s.DELETED = false
""".format(dbutils.widgets.get('alation_analytics_catalog'), dbutils.widgets.get('alation_dsid'))

schema_df = spark.sql(alation_analytics_schema_sql)
display(schema_df)
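```

For example, to restrict the sync to a subset of schemas, you could append a predicate before running the query. The schema name pattern below is purely illustrative:

```python
# Hypothetical example filter: only sync schemas whose names start with 'sales'.
alation_analytics_schema_sql += " and s.name LIKE 'sales%'"
schema_df = spark.sql(alation_analytics_schema_sql)
display(schema_df)
```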
This section writes schema-level descriptions from Alation into Databricks:
```python
sqlCmdList = []
rows = schema_df.collect()
for row in rows:
    schemaName = row["schema_name"]
    description = row["description"]
    # Note: Spark SQL accepts double-quoted string literals here.
    sqlTxt = 'COMMENT ON SCHEMA {0} IS "{1}"'.format(schemaName, remove_html_tags(description))
    sqlCmdList.append(sqlTxt)
    print(schemaName)

for sqlTxt in sqlCmdList:
    spark.sql(sqlTxt)
```
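One caveat: because each description is embedded directly in the SQL text, a description that itself contains a double quote or backslash will break the statement. A small helper like the one sketched below, which is not part of the original notebook, can escape those characters first:

```python
# Hypothetical helper (not in the original notebook): escape characters
# that would break a double-quoted Spark SQL string literal.
def escape_sql_literal(text):
    return text.replace('\\', '\\\\').replace('"', '\\"')

# Usage: wrap the cleaned description before building the statement.
safe_description = escape_sql_literal(remove_html_tags(description))
sqlTxt = 'COMMENT ON SCHEMA {0} IS "{1}"'.format(schemaName, safe_description)
```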
Similar to schemas, this query extracts table-level descriptions from Alation:
```python
# Default query will get ALL tables that have a description.
# Add filters to the query to limit which objects to extract descriptions from.
alation_analytics_table_sql = """
    select s.name || '.' || t.name as full_table_name, t.description
    from {0}.public.rdbms_datasources ds
    JOIN {0}.public.rdbms_schemas s ON ds.DS_ID = s.DS_ID
    JOIN {0}.public.rdbms_tables t ON s.SCHEMA_ID = t.SCHEMA_ID
    WHERE ds.DS_ID = {1}
      and t.DESCRIPTION <> ''
      and t.DELETED = false
      and s.DELETED = false
""".format(dbutils.widgets.get('alation_analytics_catalog'), dbutils.widgets.get('alation_dsid'))

tables_df = spark.sql(alation_analytics_table_sql)
display(tables_df)
```
This section writes the cleaned table descriptions into Databricks:
```python
sqlCmdList = []
rows = tables_df.collect()
for row in rows:
    fullTableName = row["full_table_name"]
    description = row["description"]
    sqlTxt = 'COMMENT ON TABLE {0} IS "{1}"'.format(fullTableName, remove_html_tags(description))
    sqlCmdList.append(sqlTxt)
    print(fullTableName)

for sqlTxt in sqlCmdList:
    spark.sql(sqlTxt)
```
Finally, this query retrieves column-level descriptions. You can further filter it if needed:
```python
# Default query will get ALL table columns that have a description.
# Add filters to the query to limit which objects to extract descriptions from.
alation_analytics_column_sql = """
    select s.name || '.' || t.name as full_table_name, c.name, c.description
    from {0}.public.rdbms_datasources ds
    JOIN {0}.public.rdbms_schemas s ON ds.DS_ID = s.DS_ID
    JOIN {0}.public.rdbms_tables t ON s.SCHEMA_ID = t.SCHEMA_ID
    JOIN {0}.public.rdbms_columns c ON t.TABLE_ID = c.TABLE_ID
    WHERE ds.DS_ID = {1}
      and c.DESCRIPTION <> ''
      and s.DELETED = false
      and t.DELETED = false
      and c.DELETED = false
""".format(dbutils.widgets.get('alation_analytics_catalog'), dbutils.widgets.get('alation_dsid'))

columns_df = spark.sql(alation_analytics_column_sql)
display(columns_df)
```
This step writes column-level descriptions into Databricks:
```python
sqlCmdList = []
rows = columns_df.collect()
for row in rows:
    fullTableName = row["full_table_name"]
    columnName = row["name"]
    description = row["description"]
    sqlTxt = 'COMMENT ON COLUMN {0}.{1} IS "{2}"'.format(fullTableName, columnName, remove_html_tags(description))
    sqlCmdList.append(sqlTxt)
    print("{0}.{1}".format(fullTableName, columnName))

for sqlTxt in sqlCmdList:
    spark.sql(sqlTxt)
```
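Because these loops execute one statement per object, a single malformed description can halt an entire run. A minimal variation, sketched below and not part of the original notebook, collects failures instead of stopping:

```python
# A minimal sketch (not in the original notebook): keep going past
# individual failures and report them at the end for review.
failed = []
for sqlTxt in sqlCmdList:
    try:
        spark.sql(sqlTxt)
    except Exception as err:
        failed.append((sqlTxt, str(err)))

if failed:
    print("{0} statements failed; review before re-running:".format(len(failed)))
    for stmt, reason in failed:
        print(stmt, "->", reason)
```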
This workflow lets teams curate once in Alation and continuously propagate meaningful descriptions to Databricks, ensuring users across both systems work from the same information.
Governance & safety tips:

- Review each step carefully and adapt it to meet your requirements.
- Always test in a non-production workspace first.
- Review descriptions before propagation; what’s public in Alation will appear in Databricks (see the dry-run sketch below).
- Ensure you have the right permissions in Databricks and that audit logging is enabled to track the COMMENT ON write operations.
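To support that pre-propagation review, one option is a dry-run switch that prints the generated statements instead of executing them. The widget and helper below are illustrative additions, not part of the original notebook:

```python
# Hypothetical dry-run switch (not in the original notebook).
dbutils.widgets.dropdown('dry_run', 'true', ['true', 'false'], 'Dry Run')

def apply_comments(sqlCmdList):
    """Print statements for review when dry_run is on; otherwise execute them."""
    if dbutils.widgets.get('dry_run') == 'true':
        for stmt in sqlCmdList:
            print(stmt)
    else:
        for stmt in sqlCmdList:
            spark.sql(stmt)
```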
Start small: test it in your environment, see the impact, and imagine what’s possible when your catalog and your workspace finally speak the same language.
By letting Alation serve as your source of truth and Databricks as your system of action, your teams can collaborate in real time with shared context, consistent language, and complete trust in the data.