Accuracy Checks

Accuracy checks validate statistical properties of numeric columns to ensure values fall within expected ranges and meet business expectations. These checks perform mathematical calculations on your data to verify it meets quality standards.

Follow these best practices for accuracy checks:

  • Set meaningful thresholds based on historical patterns in your data.

  • Configure both Minimum and Maximum checks where applicable to catch anomalies in both directions.

  • Use standard deviation checks to identify statistical outliers.

  • Consider seasonality when setting thresholds for business metrics.

  • Update thresholds periodically as business conditions evolve.

Available Metrics

  • Minimum / Maximum: Validates value boundaries within acceptable ranges.

  • Average: Ensures mean values meet business expectations.

  • Sum: Validates aggregate totals for financial or quantity data.

  • Standard Deviation: Monitors data distribution consistency.

  • Variance: Tracks statistical variance within acceptable bounds.

  • Percentile: Validates specific percentile values.

  • Row Count: Ensures dataset size meets expectations.

Metric

Description

Supported Data Types

Database Support

avg

The average value in a numeric column

number

all

max

The greatest value in a numeric column

number

all

min

The smallest value in a numeric column

number

all

percentile

A percentile check evaluates whether a value in a dataset falls below or above a certain percentile threshold within a group of observations. Example: percentile(distance, 0.7) returns the value below which 70% of the distance values fall, and can be used to detect outliers or assess distribution skew.

number

PostgreSQL, Snowflake

stddev

The calculated standard deviation of values in a numeric column

number

BigQuery, PostgreSQL, Redshift, Snowflake

stddev_pop

The calculated population standard deviation of values in a numeric column

number

BigQuery, PostgreSQL, Redshift, Snowflake

stddev_samp

The calculated sample standard deviation of values in a numeric column

number

BigQuery, PostgreSQL, Redshift, Snowflake

sum

The calculated sum of the values in a numeric column

number

all

variance

The calculated variance of the values in a numeric column

number, time

BigQuery, PostgreSQL, Snowflake

var_pop

The calculated population variance of the values in a numeric column

number, time

BigQuery, PostgreSQL, Redshift, Snowflake

var_samp

The calculated sample variance of the values in a numeric column

number, time

BigQuery, PostgreSQL, Redshift, Snowflake

Row_count (Table only)

The number of rows in a dataset or column, if specified

number, text, time

all

Configuration Examples

Check Description

Configuration Steps

Example Check

Ensure product prices are positive

  1. Select price column.

  2. Choose Min check type.

  3. Select >= operator.

  4. Set threshold to 0.

min(price) >= 0

Validate reasonable customer ages

  1. Select age column.

  2. Choose Max check type.

  3. Select <= operator.

  4. Set threshold to 120.

max(age) <= 120

Ensure average order value is within normal range

  1. Select order_total column.

  2. Choose Average check type.

  3. Select between operator.

  4. Set threshold to 50 and 500.

avg(order_total) between 50 and 500

Common Accuracy Check Scenarios

Business Requirement

Metric Type

Example Configuration

Use Case

Financial Data Validation

min, max, sum

min(revenue) >= 0, max(discount_percent) <= 100

Ensure financial values are within realistic bounds

Inventory Management

avg, row_count

avg(stock_level) between 10 and 1000, row_count > 0

Monitor stock levels and data availability

Performance Metrics

percentile, stddev

percentile(response_time, 0.95) < 5000

Track system performance thresholds

Data Distribution Analysis

variance, stddev_pop

variance(order_amount) between 100 and 10000

Monitor data consistency and outliers

Text Content Validation

avg_length, max_length

avg_length(description) >= 20, max_length(title) <= 100

Ensure content meets quality standards

Duplicate Detection

duplicate_count, duplicate_percent

duplicate_count(customer_id) = 0

Maintain data uniqueness

Advanced Statistical Checks

Standard Deviation Checks

  • Use stddev for general population standard deviation.

  • Use stddev_pop when working with complete populations.

  • Use stddev_samp when working with sample data.

Variance Checks

  • Use variance for general variance calculations.

  • Use var_pop for population variance.

  • Use var_samp for sample variance.

Percentile Checks

Useful for identifying outliers and understanding data distribution.

  • percentile(column, 0.5) calculates the median value.

  • percentile(column, 0.95) identifies the 95th percentile.

Change-Over-Time Thresholds

Accuracy checks support dynamic thresholds that compare current values to historical measurements.

Basic Change Detection

-- Alert if row count changes by more than 50 rows from previous measurement
change for row_count between -20 and +50

-- Alert if average changes by more than 10% from previous week
change same day last week for avg(order_amount) > 10%

Advanced Historical Comparisons

-- Compare to average of last 7 measurements
change avg last 7 for row_count < 50

-- Compare to minimum of last 7 measurements
change min last 7 for duplicate_count(email) < 5

-- Compare percentage change to maximum of last 7 measurements
change max last 7 percent for sum(revenue) < 20%

Change-Over-Time Components

  • Calculation Type (optional): avg, min, max.

  • Historical Period (optional): last 7.

  • Percentage Mode (optional): percent.

  • Metric: The numeric metric to calculate.

  • Comparison: The threshold for acceptable change.