Row-Level Filtering¶
Alation Cloud Service Applies to Alation Cloud Service instances of Alation
Alation Data Quality uses Soda Core filtering capabilities, allowing you to apply sophisticated conditions to subset data before validation. This enables more precise and targeted data quality checks by focusing on specific data segments.
Follow these best practices for row-level filtering:
Performance Considerations: Use indexed columns in filters when possible.
Data Distribution: Ensure filtered datasets are representative of quality expectations.
Documentation: Clearly document filter rationale for future maintenance.
Testing: Validate filter logic returns expected row counts before implementing checks.
Maintenance: Regularly review filters as data patterns and business rules evolve.
Common Use Cases¶
Scenario |
Filter Example |
Business Value |
|---|---|---|
Active Records Only |
|
Focus on operationally relevant data |
Recent Data |
|
Ensure timeliness validation |
Geographic Segmentation |
|
Regional compliance requirements |
Business Hours |
|
Operational period validation |
Product Categories |
|
Category-specific quality rules |
Customer Tiers |
|
Tier-based service level validation |
Basic Filters¶
Use the following examples to apply basic filtering and focus checks on specific data subsets:
-- Validate completeness only for active customers
-- Filter:
customer_status = 'ACTIVE'
-- Check uniqueness within specific date ranges
duplicate_count(transaction_id) = 0
-- Filter:
transaction_date >= '2024-01-01'
You can also combine multiple filters using logical operators (AND/OR):
-- Check data quality for recent, high-value transactions
duplicate_count(order_id) = 0
-- Filters:
transaction_date >= '2024-01-01' AND order_amount > 1000
-- Validate email completeness for active users in specific regions
missing_count(email) = 0
-- Filters:
status = 'ACTIVE' AND (region = 'US' OR region = 'EU')
IN and NOT IN Filters¶
Use the following examples to apply IN and NOT IN filtering:
-- Check completeness for specific product categories
missing_count(description) = 0
-- Filter:
category IN ('Electronics', 'Books', 'Clothing')
-- Exclude test data from validation
duplicate_count(user_id) = 0
-- Filter:
environment NOT IN ('test', 'staging')
NULL and NOT NULL Filters¶
Use the following examples to apply NULL and NOT NULL filtering:
-- Validate data only where certain fields are populated
invalid_count(phone_number) = 0
-- Filter:
customer_tier IS NOT NULL
-- Check consistency in records with missing optional fields
missing_count(secondary_email) <= 10
-- Filter:
primary_email IS NULL
Date and Time-Based Filters¶
Use the following examples to apply date and time-based filtering:
-- Validate recent data within business hours
missing_count(transaction_amount) = 0
-- Filter:
created_at >= CURRENT_DATE - INTERVAL '7 days' AND EXTRACT(hour FROM created_at) BETWEEN 9 AND 17
-- Check weekend data patterns
avg(order_amount) between 50 and 200
-- Filter:
EXTRACT(dow FROM order_date) IN (0, 6) -- Sunday and Saturday
Pattern-Based Filters¶
Note
The following examples use the SQL pattern-matching operators LIKE and NOT LIKE. These operators may not be available in all Alation Data Quality releases. If your environment does not support them, adapt the filters to use the pattern-matching syntax that your data source supports.
Use the following examples to apply pattern-based filtering:
-- Validate data for specific identifier patterns
duplicate_count(product_sku) = 0
-- Filter:
product_sku LIKE 'PROD-%'
-- Check international phone number formats
invalid_count(phone_number) = 0
-- Filter:
phone_number LIKE '+%'
Nested Conditions and Complex Logic¶
Use the following examples to apply complex and nested filtering:
-- Complex business rule validation with multiple conditions
missing_count(shipping_address) = 0
-- Filter:
(order_type = 'PHYSICAL' AND shipping_method != 'PICKUP') OR (customer_tier = 'PREMIUM' AND order_amount > 500)