Profile Data on Tables and Columns in Data Quality
Applies to Alation Cloud Service instances of Alation.
Data profiling lets you analyze the characteristics, distributions, and quality of your data directly while you configure checks. Understanding your data first helps you set appropriate thresholds and validation rules.
Note
Service account credentials for the data source are used to run data profiling.
When setting up a check, you can profile selected tables or specific columns. The profiling results provide key insights, including:
Distribution statistics
Data type patterns
Null value percentages
Uniqueness metrics
Value ranges
Using this information, you can configure more effective and accurate data quality checks.
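The following Python sketch makes the insights above concrete. It computes comparable statistics for a single column held in memory. It is an illustration only, not Alation's implementation. The function name `profile_column` is hypothetical, and `None` stands in for a null value. This page does not define exactly how the product calculates duplicate count and unique percentage. The sketch therefore assumes that duplicates are non-null rows after the first occurrence of each value, and that percentages are relative to all rows.

```python
from collections import Counter
from datetime import date
from typing import Any, Optional


def profile_column(values: list[Optional[Any]]) -> dict[str, Any]:
    """Compute basic profiling statistics for one column (hypothetical helper).

    Assumptions: None represents NULL; the column holds a single data type;
    percentages use all rows as the denominator; a "duplicate" is any
    non-null row after the first occurrence of its value.
    """
    total = len(values)
    non_null = [v for v in values if v is not None]
    null_count = total - len(non_null)
    distinct = set(non_null)

    stats: dict[str, Any] = {
        "total_rows": total,
        "null_count": null_count,
        "null_pct": 100.0 * null_count / total if total else 0.0,
        "unique_count": len(distinct),
        "unique_pct": 100.0 * len(distinct) / total if total else 0.0,
        "duplicate_count": len(non_null) - len(distinct),
    }

    # Value ranges: min/max for numeric or date/datetime columns, mean for numeric only.
    if non_null and all(isinstance(v, (int, float, date)) for v in non_null):
        stats["min"] = min(non_null)
        stats["max"] = max(non_null)
        if all(isinstance(v, (int, float)) for v in non_null):
            stats["mean"] = sum(non_null) / len(non_null)

    # Value distribution: each value with its frequency, nulls labeled "Null".
    stats["distribution"] = Counter(
        "Null" if v is None else v for v in values
    ).most_common()
    return stats
```

For example, `profile_column([3, None, 7, 3, None, 10])` reports 6 total rows, 2 nulls, 3 distinct values, 1 duplicate, a range of 3 to 10, and a mean of 5.75.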
Run Profiling on Data
You can profile data while creating a monitor or after the monitor is created. Profiling is optional, but it shows you the characteristics of your data before you set check thresholds.
For more information on creating monitors, see Add a Monitor. To run data profiling on an existing monitor, see Manage a Monitor.
To run data profiling during monitor creation:
In the Configure Checks step, click Profile data in the header.
In the Select Table dropdown, choose a table from the scope you selected for the monitor.
In the Columns picker, select one or more columns to profile.
Select a scope option:
Full scan: Profiles all rows in the table for comprehensive analysis.
Row limit: Profiles a sample of rows. Enter the maximum number of rows in the Max rows to sample field. Use this option for faster profiling of large tables.
Date range: Profiles data within a time period. Select the number of days and the timestamp column to filter on. Use this option to focus on recent data.
Column filter: Profiles only rows that match a condition. Select a column, choose a condition operator (the available operators depend on the column's data type), and enter a value. A preview of the resulting WHERE clause appears at the bottom.
Click Run Profile.
Review the profiling results:
A status banner shows whether profiling succeeded or failed, along with the table name, scope, and column count.
Select a column from the list to view its statistics. Each column in the list shows a data type badge.
Column statistics include:
Field Type: The data type of the column.
Total Rows: The number of rows analyzed.
Null Count: The number of null values.
Null %: The percentage of null values.
Duplicate Count: The number of duplicate values.
Unique Count: The number of distinct values.
Unique %: The percentage of unique values.
For numeric columns: Min, Max, and Mean values.
For datetime columns: Min Date and Max Date values.
The Active Checks section displays existing checks on the column. Each check card shows the category, description, monitor, and status.
The Value Distribution section shows distinct values and their frequencies. Null values are labeled “Null”.
To run a new profile with different settings, click the re-run icon in the status banner.
To close the panel, click Profile data in the header.
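Each scope option in the procedure above narrows the set of rows that profiling reads. The following Python sketch is a rough illustration of how the four options could translate into a SQL query. Everything in it is hypothetical:

* the scope class names;
* the function `build_profile_query`;
* the operator whitelist;
* the qmark (`?`) placeholder style, which Python's built-in `sqlite3` module uses.

The SQL that Alation actually generates is not documented here and may differ by data source. In particular, `LIMIT` returns whatever rows the engine produces first, which is not necessarily a random sample.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Any, Optional, Union


@dataclass
class FullScan:
    """Profile every row in the table."""


@dataclass
class RowLimit:
    max_rows: int


@dataclass
class DateRange:
    days: int
    timestamp_column: str


@dataclass
class ColumnFilter:
    column: str
    operator: str
    value: Any


Scope = Union[FullScan, RowLimit, DateRange, ColumnFilter]

# Hypothetical operator whitelist; the real UI offers operators based on data type.
ALLOWED_OPERATORS = {"=", "!=", "<", "<=", ">", ">="}


def build_profile_query(
    table: str, columns: list[str], scope: Scope, today: Optional[date] = None
) -> tuple[str, list[Any]]:
    """Return (sql, params) using qmark placeholders, as in Python's sqlite3.

    Table and column names are assumed to come from a trusted picker and are
    not quoted or escaped here. Filter values are bound as parameters.
    """
    sql = f"SELECT {', '.join(columns)} FROM {table}"
    params: list[Any] = []
    if isinstance(scope, FullScan):
        pass  # No filter: read all rows.
    elif isinstance(scope, RowLimit):
        if scope.max_rows <= 0:
            raise ValueError("max_rows must be positive")
        # LIMIT takes the first N rows the engine returns, not a random sample.
        sql += " LIMIT ?"
        params.append(scope.max_rows)
    elif isinstance(scope, DateRange):
        # Keep rows whose timestamp is on or after the cutoff date.
        cutoff = (today or date.today()) - timedelta(days=scope.days)
        sql += f" WHERE {scope.timestamp_column} >= ?"
        params.append(cutoff.isoformat())
    elif isinstance(scope, ColumnFilter):
        if scope.operator not in ALLOWED_OPERATORS:
            raise ValueError(f"unsupported operator: {scope.operator}")
        sql += f" WHERE {scope.column} {scope.operator} ?"
        params.append(scope.value)
    else:
        raise TypeError(f"unknown scope: {scope!r}")
    return sql, params
```

Here is an example call:

`build_profile_query("orders", ["amount"], ColumnFilter("status", "=", "shipped"))`

It returns `("SELECT amount FROM orders WHERE status = ?", ["shipped"])`, which you could pass to a DB-API cursor's `execute` method. The sketch binds the filter value as a parameter instead of pasting it into the SQL text. This avoids quoting mistakes and SQL injection.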