Reconciliation Checks

Alation Cloud Service Applies to Alation Cloud Service instances of Alation

Reconciliation checks compare data between two tables, typically a source (such as a transactional database) and a target (such as a data warehouse), to confirm both contain the same information. Unlike regular data quality checks, which validate the health of a single table, reconciliation checks validate that two tables are consistent with each other.

Note

Reconciliation checks are limited to SQL Server, Oracle, Snowflake and Databricks only.

Use reconciliation checks when data is copied, replicated, or transformed between systems and you need confidence that nothing was lost, duplicated, or silently changed along the way.

Reconciliation checks are classified under the consistency data quality dimension. They validate that data remains consistent across systems by confirming source and target tables contain the same information.

Best Practices

Keep the following best practices in mind when configuring reconciliation checks:

  • Start with metric reconciliation on row count as a fast, low-cost sanity check before running deeper checks.

  • Run schema reconciliation before content reconciliation. A missing or mismatched column on one side makes row-level comparison unreliable.

  • Use source and target filters to restrict the comparison to a specific data slice, for example yesterday’s data, when working with large tables or near-real-time pipelines.

  • Schedule content reconciliation checks less frequently than metric checks. Content reconciliation reads both full tables and is more resource-intensive on very large datasets.

  • A passing check means the result is within your configured tolerance, not byte-identical. Set thresholds that reflect your actual business requirements.

Common Use Cases

The following table shows typical scenarios where reconciliation checks provide business value, along with the recommended check type and configuration approach for each.

Scenario

Recommended Check Type

Configuration Approach

Validate row counts after a data migration to confirm no records were lost

Metric reconciliation on row_count

Use Percent difference type with a threshold of 0.1 to allow a 0.1% tolerance.

Confirm schema parity between source and target before a system cutover

Schema reconciliation

Use zero tolerance. The check passes only when all columns match.

Detect dropped or corrupted records introduced by an ETL pipeline

Content reconciliation

Use Percent difference type with a threshold of 1 and a time-based filter to scope the scan to recent data.

Validate that financial aggregates such as revenue totals match across systems

Metric reconciliation on sum

Use Percent difference type with a threshold of 0.01 for tight financial tolerance.

Audit PII or sensitive column completeness across environments

Metric reconciliation on missing_count

Use Absolute difference type with a threshold of 0 for zero tolerance on missing values.

Check Types

Alation Data Quality supports the following reconciliation check types. You can combine more than one type within a single source-target pairing.

Check Type

What It Compares

Best For

Metric reconciliation

A single aggregate value (row count, sum, average, min, max, or custom SQL) on both sides.

High-level smoke tests, daily replication sanity checks, and financial total verification.

Schema reconciliation

Column names and data types between the two tables. The result is pass or fail only: either the structure lines up or it does not.

Detecting schema drift before it breaks downstream pipelines or BI reports.

Content reconciliation

Actual row contents between source and target, using a bucket-based hash algorithm.

Post-migration validation, compliance and audit requirements, and confirming pipeline output matches the source.

Add a Reconciliation Check

Reconciliation checks are configured within the standard monitor creation workflow.

Note

Reconciliation checks require both the source and target data sources to be cataloged and configured in Alation with appropriate service account credentials and SELECT privileges on all tables being compared. All raw data remains within each respective database.

  1. Navigate to the Alation Data Quality application or catalog table page and click Add Monitor.

  2. In the Select Scope section, choose the source data source and table.

  3. In the Configure Checks section, click Add Check > Reconciliation.

  4. In the Reconciliation Target panel, select the target data source from the Target Data Source dropdown.

  5. Select the target table from the Target Table dropdown.

  6. (Optional) Enter a Source Filter and a Target Filter to restrict the comparison to a specific data slice, for example created_at >= '2026-01-01'. Filters apply to all reconciliation checks configured within this source-target pairing. Adding filters is recommended to avoid running queries against the full table, which can be expensive on large datasets.

  7. Configure one or more check types as described in the following steps.

    To add a metric reconciliation check:

    1. Click Metric > Add Metric Check.

    2. Select a Metric: Row Count, Sum, Average, Min, Max, or Custom SQL.

    3. For column-level metrics (Sum, Average, Min, Max), select a source column and its corresponding target column.

    4. For Custom SQL, enter a Source SQL expression and a Target SQL expression. Each expression must return a single numeric value.

    5. Select a comparison Operator (=, <, <=, >, >=) and enter a Threshold value.

    6. Select a Difference Type: Absolute (the raw numeric difference between the two sides) or Percent (the difference as a percentage of the source value).

    To add a schema reconciliation check:

    1. Click Schema > Add Schema Reconcilation.

    2. (Optional) In the Column Mapping table, add entries only for columns whose names differ between source and target. For example, if name on the source is called full_name on the target. Columns with identical names are matched automatically. All columns present in either table are compared regardless of whether a mapping entry exists for them.

    3. (Optional) In the Type Equivalence section, add rules for type names that should be treated as equivalent across databases. For example, bit = bool for SQL Server-to-Snowflake comparisons.

    To add a content reconciliation check:

    1. Click Content.

    2. In Key Columns, select the columns that uniquely identify each row, for example a primary key. Map each source column to its corresponding target column. This field is required.

    3. In Comparison Columns, select the columns whose values will be compared between source and target. Only the values in these columns are evaluated, not the entire row. Map each source column to its corresponding target column. This field is required.

    4. Select a comparison Operator and enter a Threshold to define the acceptable level of drift. For example, less than 1 percent of rows may differ.

    5. Select a Difference Type: Absolute (number of differing rows) or Percent (proportion of differing rows).

  8. Click Save.

Note

You can add multiple check types within a single reconciliation target pairing. For example, you can run schema, metric, and content checks against the same two tables in one monitor. Each check type runs independently and reports its own pass or fail result.

Reading Results

After a monitor runs, you can review reconciliation check results in the Check Results tab. The following sections describe how to interpret each check type.

Metric Reconciliation Results

A metric reconciliation result is a single side-by-side comparison. The result shows the following information:

  • Source value: The metric measured on the source side.

  • Target value: The same metric measured on the target side.

  • Difference: How far apart the two values are, reported as either an absolute number or a percentage of the source value, depending on your configuration.

  • Threshold: Your configured tolerance. The check passes when the difference satisfies the operator.

Note

A 0% difference in row count is not the same as a 0% difference in a column sum. Row count confirms that the same number of rows arrived. Column sum confirms that the same values arrived. Two tables can have identical row counts while containing meaningfully different data.

Schema Reconciliation Results

A schema reconciliation result is a structural diff between the two tables. The result is organized into the following four categories:

  • Matched: Columns present on both sides with compatible types. A healthy result has all columns here.

  • Type mismatches: Columns with the same name on both sides but incompatible types. For example, amount is DECIMAL(18,2) on the source and VARCHAR(50) on the target. These are the most concerning finding because data still flows but values may be silently truncated, rounded, or coerced.

  • Missing in target: Columns present on the source that are absent on the target. This often indicates the target schema is behind and downstream consumers may not receive expected fields.

  • Missing in source: Columns present on the target that are absent on the source. This often indicates leftover columns from a prior schema or unrelated additions on the target side.

A clean schema reconciliation shows all columns under Matched with the remaining three categories empty. Anything outside Matched is worth investigating before relying on downstream content checks. A column that does not exist on one side cannot be reconciled at the row level.

Note

Equivalent types across databases, for example Snowflake NUMBER and SQL Server DECIMAL, or VARCHAR and TEXT, are treated as a match by default. Built-in equivalence rules prevent harmless cross-database naming differences from generating noise.

The following table shows the built-in type equivalences for SQL Server and Snowflake.

SQL Server to Snowflake Type Equivalences

SQL Server

Snowflake

Equivalent

INT

INTEGER / NUMBER(38,0)

Yes

BIGINT

BIGINT / NUMBER(38,0)

Yes

DECIMAL(p,s)

NUMBER(p,s)

Yes

VARCHAR(n)

VARCHAR(n)

Yes

NVARCHAR(n)

VARCHAR(n)

Yes

DATETIME / DATETIME2

TIMESTAMP_NTZ

Yes

DATE

DATE

Yes

BIT

BOOLEAN

Yes

FLOAT

FLOAT

Yes

TEXT / NTEXT

VARCHAR(16777216)

Yes

Content Reconciliation Results

A content reconciliation result reports two distinct kinds of findings. Understanding the difference is key to interpreting the result correctly.

Presence drift describes rows that exist on one side but not the other. These counts are exact. If the result reports 50 rows missing in target, exactly 50 rows are missing. Presence drift covers the following scenarios:

  • Missing in target: Rows present on the source but absent on the target. This indicates lost inserts, failed replication, or incomplete loads.

  • Extra in target: Rows present on the target but absent on the source. This indicates stale data, deletes that never propagated, or double writes.

Content drift describes rows that exist on both sides under the same key but have different values in one or more columns. To stay efficient at scale, the check identifies regions of the table where differences exist rather than inspecting every individual row. Content drift provides the following information:

  • Upper bound: No more than this many rows can possibly differ. This is the total number of rows across all affected regions of the table.

  • Drilldown SQL: A ready-to-run query scoped to just the affected regions. Copy and run it in your SQL client to see the actual differing rows without writing the query yourself.

The following table shows how the check interprets the per-bucket hash comparison results. Each row in the table represents a distinct pattern the check can detect, helping you identify whether a discrepancy is caused by missing rows, key changes, or value-level changes within existing rows.

Row Count Match

Key Hash Match

Row Hash Match

Diagnosis

Yes

Yes

Yes

Bucket is clean. No issues detected.

No

N/A

N/A

Presence issue: rows are missing or extra.

Yes

No

N/A

Presence issue: keys were swapped (a delete and insert occurred).

Yes

Yes

No

Content issue: the same rows exist but one or more values changed.

Every content reconciliation result includes a one-line headline summary. For example:

“90.04% of data is byte-identical. 55 sectors have row-count drift, 47 sectors have content drift.”

The headline percentage represents the proportion of table regions where source and target are fully identical at the row level. A region is a bucket of rows grouped by key hash. When a region is clean, every row within it has the same key and the same values on both sides. When a region has drift, the drilldown SQL is scoped to that region so you can inspect only the affected rows.

Use the headline percentage to gauge severity at a glance using the following guidance.

Percent Clean

Typical Interpretation

100%

Source and target are byte-identical.

99 to 99.99%

Trace drift. Usually in-flight rows, timezone edges, or rounding.

95 to 99%

Meaningful drift. Investigate before relying on the target.

Less than 95%

Significant disagreement. Treat the target as suspect until resolved.

Resolve a Failed Reconciliation Check

  1. Review presence drift first. Those counts are exact and usually point at a concrete pipeline problem such as a missed insert, a failed load, or a delete that never propagated to the target.

  2. Review content drift. The upper bound tells you the maximum number of rows that could be affected. If that count exceeds your tolerance, run the drilldown query to identify the actual differing rows.

  3. Run the drilldown query. The query is pre-generated and scoped to just the regions that disagree. Copy, paste, and run it in your SQL client to see the actual offending rows.

  4. Check the headline percentage. If the result shows 99.9% clean and your pipeline is near-real-time, the drift may represent rows that had not yet arrived when the check ran. Re-run the check or narrow the filter window before escalating.

Cross-Database Normalization

To produce consistent hash values across different database engines, all column values are normalized before hashing. Without this normalization, MD5 hashes would never match even for logically identical data.

The following table describes the normalization rules applied to each data type.

Data Type

Normalization Rule

Example

Strings

COALESCE(UPPER(TRIM(NULLIF(col, ''))), 'NULL')

' Hello ' becomes 'HELLO'

Numerics

CAST(ROUND(col, precision) AS VARCHAR)

100.099 becomes '100.10' at precision 2

Timestamps

Convert to UTC, format as ISO 8601

2026-03-16T08:00:00

Booleans

Map to 'Y' or 'N'

BIT(1) becomes 'Y'

NULLs

Replace with sentinel 'NULL'

NULL becomes 'NULL'

Empty strings

Treat as NULL

'' becomes 'NULL'

Normalized column values are concatenated with a | separator before hashing:

CONCAT(norm_key1, '|', norm_key2, '|', norm_val1, '|', norm_val2)

MD5 is used on both platforms. It is deterministic: the same input string produces the same 128-bit hash regardless of the database engine. The first 16 hex characters of the MD5 output are converted to a BIGINT for numeric aggregation.

Scale Guardrails

Alation Data Quality applies the following guardrails to reconciliation checks to protect query performance and warehouse compute costs.

Guardrail

Behavior

Query timeout

Configurable per run. The default is 60 minutes. The check fails cleanly on timeout.

Partition pruning

All generated SQL includes the filter clause you define. For ongoing monitoring of large tables, filter to recent data only using the source and target filter fields.

Early exit

If all buckets match in the first step of content reconciliation, no drilldown SQL is generated and the check completes immediately.

Concurrent run limit

A maximum of one reconciliation run per data source pair runs at a time.

Important Notes

  • Snapshots, not live views: A reconciliation result reflects the state of source and target at the moment the check ran. In near-real-time pipelines, a small amount of drift may represent in-flight data rather than a real problem. Re-run the check or add a time filter to confirm.

  • Filters apply to both sides independently: When a check is configured with a source filter and a target filter, each filter applies only to its respective side. Drift outside either filter window is not visible to this check by design.

  • NULLs are treated as values: Two rows with NULL in the same column are considered equal. A row with NULL on one side and an empty string ('') on the other is a mismatch.

  • Cross-database differences are normalized automatically: When source and target use different database technologies, the check normalizes boolean representations, timestamp timezones, decimal precision, and similar variations so that the same value stored differently does not appear as a mismatch. No additional configuration is required.

  • Cost scales with check type: Schema reconciliation is lightweight (a metadata query). Metric reconciliation runs one aggregate query per side. Content reconciliation reads both full tables and scales with data volume. Schedule content reconciliation less frequently on very large datasets.