Profile Data on Tables and Columns in Data Quality

Applies to: Alation Cloud Service instances of Alation

Data profiling allows you to analyze the characteristics, distributions, and quality of your data directly within the check configuration process. Profiling helps you understand your data before setting appropriate thresholds and validation rules.

Note

Service account credentials for the data source are used to run data profiling.

When setting up a check, you can profile selected tables or specific columns. The profiling results provide key insights, including:

  • Distribution statistics

  • Data type patterns

  • Null value percentages

  • Uniqueness metrics

  • Value ranges

Using this information, you can configure more effective and accurate data quality checks.
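To make the listed metrics concrete, here is a minimal Python sketch of how null percentage, uniqueness, and value range could be computed for a single column. It is an illustration only, not Alation's implementation: the function name `profile_column`, the output keys, and the choice to count uniqueness over non-null values and express it as a share of all rows are assumptions made for this example.

```python
from typing import Any, Optional, Sequence


def profile_column(values: Sequence[Optional[Any]]) -> dict:
    """Compute simple profile statistics for one column (None means null).

    Hypothetical helper for illustration; not part of any Alation API.
    """
    total_rows = len(values)
    non_null = [v for v in values if v is not None]
    null_count = total_rows - len(non_null)
    unique_count = len(set(non_null))  # assumption: distinct non-null values

    def pct(part: int) -> float:
        # Guard against division by zero for an empty column.
        return round(100.0 * part / total_rows, 2) if total_rows else 0.0

    return {
        "total_rows": total_rows,
        "null_count": null_count,
        "null_percentage": pct(null_count),
        "unique_count": unique_count,
        "unique_percentage": pct(unique_count),
        "min_value": min(non_null) if non_null else None,
        "max_value": max(non_null) if non_null else None,
    }


stats = profile_column([10, 20, 20, None, 35])
print(stats)
```

With figures like these available, a threshold for a null-percentage or uniqueness check can be based on observed values rather than a guess.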

Run Profiling on Data

You can profile data while creating a monitor, during the column selection and check configuration steps.

This step is optional during monitor creation and lets you analyze the characteristics of your data. For more information, see Add a Monitor. You can also run data profiling after a monitor is created, from the Monitors tab. For more information, see Manage a Monitor.

  1. During monitor setup and column selection, click Run Profile to start data profiling.

  2. In the Run Profile section, select a profiling depth:

    • Quick Profile (Shallow): Offers rapid insights by profiling a representative sample of 10,000 rows, minimizing processing overhead.

    • Full Profile (Deep): Provides the most complete assessment of data quality by conducting a comprehensive scan across all rows in the table.

  3. Choose Profile All Tables or Select Specific Tables to set which tables are profiled.

  4. Click Profile Data.

  5. When data profiling completes, you can view the following:

    • Profile fields, which vary with the column attributes, such as Duplicate Count, Null Count, Null Percentage, Total Rows, Unique Count, Unique Percentage, Average Length, Maximum Length, Minimum Length, Average Seconds Delta, Max Date, and Min Date.

    • Active checks: Displays each check's type, definition, status, and last run details.
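The difference between the two profiling depths in step 2 can be sketched in Python as follows. This is a rough illustration under stated assumptions: the source only says that Quick Profile uses a representative sample of 10,000 rows, so the uniform random sampling used here is a stand-in, and the names `rows_to_profile`, `null_percentage`, and `QUICK_PROFILE_SAMPLE_SIZE` are hypothetical.

```python
import random
from typing import Optional, Sequence

QUICK_PROFILE_SAMPLE_SIZE = 10_000  # row count quoted for Quick Profile


def rows_to_profile(rows: Sequence[Optional[int]], depth: str, seed: int = 0) -> list:
    """Return the rows a profile would examine for the given depth."""
    if depth == "full":
        return list(rows)  # deep: scan every row
    if depth == "quick":
        k = min(QUICK_PROFILE_SAMPLE_SIZE, len(rows))
        # Assumption: a uniform random sample stands in for "representative".
        return random.Random(seed).sample(rows, k)
    raise ValueError(f"unknown depth: {depth!r}")


def null_percentage(rows: Sequence[Optional[int]]) -> float:
    """Share of null (None) values, as a percentage of all rows."""
    return 100.0 * sum(v is None for v in rows) / len(rows) if rows else 0.0


# Synthetic column: 50,000 rows in which every 10th value is null.
column = [None if i % 10 == 0 else i for i in range(50_000)]

quick = rows_to_profile(column, "quick")
full = rows_to_profile(column, "full")

print(len(quick), round(null_percentage(quick), 2))  # sample-based estimate
print(len(full), null_percentage(full))              # exact figure
```

The quick result is an estimate that approximates the full-scan figure at a fraction of the rows read, which is the trade-off to weigh when choosing between the two depths.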