Row-Level Filtering

Alation Cloud Service Applies to Alation Cloud Service instances of Alation

Alation Data Quality uses Soda Core filtering capabilities, allowing you to apply sophisticated conditions to subset data before validation. This enables more precise and targeted data quality checks by focusing on specific data segments.

Follow these best practices for row-level filtering:

  • Performance Considerations: Use indexed columns in filters when possible.

  • Data Distribution: Ensure filtered datasets are representative of quality expectations.

  • Documentation: Clearly document filter rationale for future maintenance.

  • Testing: Validate filter logic returns expected row counts before implementing checks.

  • Maintenance: Regularly review filters as data patterns and business rules evolve.

Common Use Cases

Scenario

Filter Example

Business Value

Active Records Only

status = 'ACTIVE'

Focus on operationally relevant data

Recent Data

created_date >= CURRENT_DATE - 30

Ensure timeliness validation

Geographic Segmentation

region IN ('US', 'EU')

Regional compliance requirements

Business Hours

EXTRACT(hour FROM timestamp) BETWEEN 9 AND 17

Operational period validation

Product Categories

category = 'ELECTRONICS'

Category-specific quality rules

Customer Tiers

customer_tier = 'PREMIUM'

Tier-based service level validation

Basic Filters

Use the following examples to apply basic filtering and focus checks on specific data subsets:

-- Validate completeness only for active customers

-- Filter:

    customer_status = 'ACTIVE'


-- Check uniqueness within specific date ranges

    duplicate_count(transaction_id) = 0

-- Filter:

    transaction_date >= '2024-01-01'

You can also combine multiple filters using logical operators (AND/OR):

-- Check data quality for recent, high-value transactions

    duplicate_count(order_id) = 0

-- Filters:

    transaction_date >= '2024-01-01' AND order_amount > 1000

-- Validate email completeness for active users in specific regions

    missing_count(email) = 0

-- Filters:

    status = 'ACTIVE' AND (region = 'US' OR region = 'EU')

IN and NOT IN Filters

Use the following examples to apply IN and NOT IN filtering:

-- Check completeness for specific product categories

    missing_count(description) = 0

-- Filter:

    category IN ('Electronics', 'Books', 'Clothing')

-- Exclude test data from validation

    duplicate_count(user_id) = 0

-- Filter:

    environment NOT IN ('test', 'staging')

NULL and NOT NULL Filters

Use the following examples to apply NULL and NOT NULL filtering:

-- Validate data only where certain fields are populated

    invalid_count(phone_number) = 0

-- Filter:

    customer_tier IS NOT NULL

-- Check consistency in records with missing optional fields

    missing_count(secondary_email) <= 10

-- Filter:

    primary_email IS NULL

Date and Time-Based Filters

Use the following examples to apply date and time-based filtering:

-- Validate recent data within business hours

    missing_count(transaction_amount) = 0

-- Filter:

    created_at >= CURRENT_DATE - INTERVAL '7 days' AND EXTRACT(hour FROM created_at) BETWEEN 9 AND 17

-- Check weekend data patterns

    avg(order_amount) between 50 and 200

-- Filter:

    EXTRACT(dow FROM order_date) IN (0, 6)  -- Sunday and Saturday

Pattern-Based Filters

Note

The following examples use the SQL pattern-matching operators LIKE and NOT LIKE. These operators may not be available in all Alation Data Quality releases. If your environment does not support them, adapt the filters to use the pattern-matching syntax that your data source supports.

Use the following examples to apply pattern-based filtering:

-- Validate data for specific identifier patterns

    duplicate_count(product_sku) = 0

-- Filter:

    product_sku LIKE 'PROD-%'

-- Check international phone number formats

    invalid_count(phone_number) = 0

-- Filter:

    phone_number LIKE '+%'

Nested Conditions and Complex Logic

Use the following examples to apply complex and nested filtering:

-- Complex business rule validation with multiple conditions

    missing_count(shipping_address) = 0

-- Filter:

    (order_type = 'PHYSICAL' AND shipping_method != 'PICKUP') OR (customer_tier = 'PREMIUM' AND order_amount > 500)