Validity Checks¶
Validity checks in Alation Data Quality are used to detect invalid or unexpected values in datasets. These metrics help ensure data conforms to expected formats, ranges, patterns, and business rules by validating individual column values against predefined criteria.
These checks are applied at the column level to identify non-conforming data and is essential for data type validation, format verification, and business rule compliance. The checks returns Pass or Fail status based on whether values meet validation criteria.
Follow these best practices for validity checks:
For zero tolerance, use
= 0
for critical fields requiring 100% validity.For tolerance-based, use
<= 5%
for fields where some invalid data is acceptable.For large datasets, prefer percentage-based thresholds over counts.
Optimize complex patterns for large datasets.
Test regex patterns on sample data before production use to avoid regex errors.
Use built-in formats when possible instead of custom regex.
Consider data volume when setting thresholds.
Start with common validation patterns before creating custom rules.
Allow flexibility for international formats (e.g. phone numbers or addresses).
Update validation rules as business requirements evolve.
Ensure column data type supports chosen validation. This avoids format mismatches.
Review and adjust validation rules based on real data patterns to avoid false positives.
Available Metrics¶
Metric |
Purpose |
Returns |
Best Used For |
---|---|---|---|
invalid_count |
Count absolute number of invalid values |
Integer |
Small datasets, zero-tolerance scenarios |
invalid_percent |
Calculate percentage of invalid values |
Percentage |
Large datasets, tolerance-based validation |
Metric List¶
Metric |
Description |
Supported Data Types |
Column Configuration Keys |
---|---|---|---|
invalid_count |
The number of rows in a column that contain values that are not valid |
number, text, time |
invalid format, invalid values, valid format, valid length, valid max, valid max length, valid min, valid min length, valid values |
invalid_count (text only) |
Text-specific invalid count with regex support |
text |
invalid regex, valid regex |
invalid_percent |
The percentage of rows in a column, relative to the total row count, that contain values that are not valid |
number, text, time |
invalid format, invalid values, valid format, valid length, valid max, valid max length, valid min, valid min length, valid values |
invalid_percent (text only) |
Text-specific invalid percentage with regex support |
text |
invalid regex, valid regex |
Configuration Keys¶
Configuration Key |
Description |
Expected Value Type |
---|---|---|
invalid format |
Defines the format of a value that Alation ought to register as invalid. Only works with columns that contain data type TEXT. |
Format name from valid formats list |
invalid regex |
Specifies a regular expression to define your own custom invalid values. |
regex, no forward slash delimiters |
invalid values |
Specifies the values that Alation ought to consider invalid. |
values in a list |
valid format |
Defines the format of a value that Alation ought to register as valid. Only works with columns that contain data type TEXT. |
Format name from valid formats list |
valid length |
Specifies a valid length for a string. Works with columns that contain data type TEXT, and also with INTEGER on most databases, where implicit casting from string to integer is supported. |
integer |
valid max |
Specifies a maximum numerical value for valid values. |
integer or float |
valid max length |
Specifies a valid maximum length for a string. Only works with columns that contain data type TEXT. |
integer |
valid min |
Specifies a minimum numerical value for valid values. |
integer or float |
valid min length |
Specifies a valid minimum length for a string. Only works with columns that contain data type TEXT. |
integer |
valid regex |
Specifies a regular expression to define your own custom valid values. |
regex, no forward slash delimiters |
valid values |
Specifies the values that Alation ought to consider valid. |
values in a list |
Valid formats¶
Valid formats apply only to columns using data type TEXT, not DATE or NUMBER.
Format Name |
Description |
Example |
---|---|---|
credit card number |
Four four-digit numbers separated by spaces, dashes, or sixteen-digit number |
1234 5678 9012 3456, 1234-5678-9012-3456 |
date eu |
European date format (validates date only, not time) |
dd/mm/yyyy |
date inverse |
Inverse date format (validates date only, not time) |
yyyy/mm/dd |
date iso 8601 |
ISO 8601 date and/or time format |
2021-04-28T09:00:00+02:00 |
date us |
US date format (validates date only, not time) |
mm/dd/yyyy |
decimal |
Number uses a , or . as a decimal indicator |
123.45, 123,45 |
decimal comma |
Number uses , as decimal indicator |
123,45 |
decimal point |
Number uses . as decimal indicator |
123.45 |
Standard email address format |
||
integer |
Whole number |
123 |
ip address |
IPv4 address format |
192.168.1.1 |
ipv4 address |
IPv4 address format |
192.168.1.1 |
ipv6 address |
IPv6 address format |
2001:0db8:85a3:0000:0000:8a2e:0370:7334 |
money |
Money pattern with currency symbol + decimal + currency abbreviation |
$123.45 USD |
money comma |
Money pattern with currency symbol + decimal comma + currency abbreviation |
€123,45 EUR |
money point |
Money pattern with currency symbol + decimal point + currency abbreviation |
$123.45 USD |
negative decimal |
Negative number with decimal indicator |
-123.45, -123,45 |
negative decimal comma |
Negative number with comma as decimal indicator |
-123,45 |
negative decimal point |
Negative number with point as decimal indicator |
-123.45 |
negative integer |
Negative whole number |
-123 |
negative percentage |
Negative percentage |
-50% |
negative percentage comma |
Negative percentage with comma decimal indicator |
-50,5% |
negative percentage point |
Negative percentage with point decimal indicator |
-50.5% |
percentage |
Percentage value |
50% |
percentage comma |
Percentage with comma decimal indicator |
50,5% |
percentage point |
Percentage with point decimal indicator |
50.5% |
phone number |
Various phone number formats |
+12 123 123 1234, 123 123 1234, +1 123-123-1234, 555-2368 |
positive decimal |
Positive number with decimal indicator |
123.45, 123,45 |
positive decimal comma |
Positive number with comma as decimal indicator |
123,45 |
positive decimal point |
Positive number with point as decimal indicator |
123.45 |
positive integer |
Positive whole number |
123 |
positive percentage |
Positive percentage |
50% |
positive percentage comma |
Positive percentage with comma decimal indicator |
50,5% |
positive percentage point |
Positive percentage with point decimal indicator |
50.5% |
time 12h |
12-hour clock format with seconds |
hh:mm:ss AM/PM |
time 12h nosec |
12-hour clock format without seconds |
hh:mm AM/PM |
time 24h |
24-hour clock format with seconds |
hh:mm:ss |
time 24h nosec |
24-hour clock format without seconds |
hh:mm |
timestamp 12h |
12-hour clock timestamp |
hh:mm:ss AM/PM |
timestamp 24h |
24-hour clock timestamp |
hh:mm:ss |
uuid |
Universally unique identifier |
550e8400-e29b-41d4-a716-446655440000 |
Common Use Cases¶
Scenario |
Configuration Approach |
Example |
---|---|---|
Email Validation |
Use built-in email format |
valid format: |
Phone Numbers |
Use phone format or custom regex |
valid format: |
Status Fields |
Define allowed values list |
valid values: |
ID Formats |
Use regex patterns |
valid regex: |
Numerical Ranges |
Set min/max boundaries |
valid min: |
Text Length |
Control string length |
valid min length: |
Common Validity Check Types¶
Format-Based Validation¶
Use predefined formats for common data patterns:
invalid_count(email_address) = 0: valid format: email
Value-Based Validation¶
Specify allowed/disallowed values.
invalid_count(status) = 0: valid values: [ACTIVE, INACTIVE, PENDING]
Pattern-Based Validation¶
Use regex for custom patterns.
invalid_count(product_code) = 0: valid regex: '^PROD-[0-9]{4}$'
Range-Based Validation¶
Set numerical or length boundaries.
invalid_count(age) = 0: valid min: 0 valid max: 120
Integration with Other Checks¶
Combine with Completeness Checks: This ensures fields are both present and valid.
Ensure emails exist and are properly formatted,
valid format: email
:missing_count(email) = 0
andinvalid_count(email) = 0:
Layer with Uniqueness Checks: This validates format and uniqueness.
Customer IDs must be unique and follow pattern,
valid regex: ^CUST-[0-9]{8}$
:duplicate_count(customer_id) = 0
andinvalid_count(customer_id) = 0: