Run SQL Evaluations in Central Evaluations¶
Applies to Alation Cloud Service instances of Alation
SQL evaluations let you test the accuracy of data product chat responses using predefined evaluation questions.
When you design a data product, you build it around a specific business use case. You can anticipate the types of questions users are likely to ask. To validate that the data product answers those questions correctly, you:
Create evaluation questions.
Provide the expected SQL for each question.
Run an evaluation to compare the AI-generated SQL and results with the expected output.
Based on the outcome, you may:
Refine the evaluation question wording.
Adjust the expected SQL.
Modify the agent configuration (if you are using a custom-built agent in Agent Studio).
As a Data Product Admin, you can evaluate chat accuracy for one data product at a time. This method is appropriate when you want to validate all evaluation questions for one data product as a holistic check. For more information, see Evaluate Data Product Chat.
You can also evaluate chat accuracy across multiple data products that you manage by using Central Evaluations in the Data Products Marketplace.
Note
The Central Evaluations page is available if the Chat with Data Products feature is enabled on your instance. The capabilities are available to all users who are Data Product Admins.
Central Evaluations allows you to:
Select a subset of evaluation questions across all the data products for which you are an admin.
Choose which AI agent to run the evaluation against.
Only AI agents that have the SQL Execution tool attached are available for selection.
This capability is important when:
Multiple data products depend on the same underlying data source.
An upstream schema or ETL change may affect several data products.
You want to regression-test a specific subset of questions (for example, questions related to a particular table or subject area).
You want to compare how different AI agents perform against the same evaluation set. By selecting a specific AI agent, you can confirm that the underlying AI agent is correctly accessing, generating SQL for, and interpreting the data associated with your data products.
Access Central Evaluations¶
Navigate to Manage > Central Evaluations. You can see previous evaluation runs and start new ones.
During an evaluation run:
The system executes the expected SQL saved with the evaluation question in the data product and captures the resulting dataset.
The selected AI agent generates and executes its own SQL for the same question.
The resulting datasets are compared.
The evaluation passes if the generated result matches the expected result set. It fails if the result sets differ.
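Conceptually, the pass/fail comparison described above can be sketched as follows. This is a simplified illustration, not Alation's actual implementation; treating rows as an unordered multiset and comparing values exactly are assumptions:

```python
from collections import Counter

def results_match(expected_rows, generated_rows):
    """Compare two SQL result sets as unordered multisets of rows.

    Assumption: row order does not matter, but duplicate rows do.
    """
    return Counter(map(tuple, expected_rows)) == Counter(map(tuple, generated_rows))

# The evaluation passes when the agent's result matches the expected result.
expected = [("2024-01", 120), ("2024-02", 95)]
generated = [("2024-02", 95), ("2024-01", 120)]  # same rows, different order
print(results_match(expected, generated))  # True
```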
Create a Central Evaluation Run¶
To create an evaluation run:
On the Central Evaluations page, click Create eval run at the top right. This opens the Create Eval Run wizard:
Step 1: Select Agent¶
Select an AI agent. The list displays AI agents that have the SQL Execution tool attached.
Note
When you select an agent, a read-only information box on the right displays the input fields associated with that agent.
Click Continue.
Step 2: Select Questions¶
You see the list of evaluation questions from all data products for which you are a Data Product Admin. Use the filtering and selection controls as needed:
Filter by Data Product
Search by case input
Select the evaluation questions to include in this run by checking their corresponding checkboxes.
Click Continue.
Step 3: Eval Details¶
Configure any of the Input Fields for Native Agents that the agent requires. Fields that require a value show the Enter value hint.
Note
The `message` and `data_product_id` parameters are populated automatically. Other parameters, if present in the agent configuration, can be populated manually and apply for the entire run.

Click Continue.
Input Fields for Native Agents¶
When running a Central Evaluation with a native SQL-capable agent, you may see the following input fields. These fields correspond to parameters required by the agent.
message¶
The evaluation question text that is sent to the agent. This is automatically populated from the selected evaluation case.
data_product_id¶
The unique identifier of the data product associated with the evaluation question.
To find it:
From the Data Products App menu, open the Manage My Data Products page.
In the Data Products table, click the name of the relevant data product.
Look at the URL: the ID is included there. For example, in a URL like https://my_catalog.alationcloud.com/app/manage/data-product/a21624ac/overview, the part `a21624ac` is the data product ID.
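If you are collecting IDs for several data products, the value can be pulled from such a URL with a short snippet. This is a sketch that assumes the URL pattern shown in the example above; `data_product_id_from_url` is a hypothetical helper, not part of any Alation API:

```python
from urllib.parse import urlparse

def data_product_id_from_url(url):
    """Extract the path segment that follows 'data-product' in the URL."""
    parts = urlparse(url).path.strip("/").split("/")
    return parts[parts.index("data-product") + 1]

url = "https://my_catalog.alationcloud.com/app/manage/data-product/a21624ac/overview"
print(data_product_id_from_url(url))  # a21624ac
```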
Alternatively, you can find it in the data product’s YAML specification:
From the Data Products App menu, open the Manage My Data Products page.
Click Edit for the relevant data product.
Click the Edit YAML toggle.
Locate the value of the `productId` property.
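In the YAML editor, the property looks roughly like this (an illustrative fragment; the surrounding structure of the specification is omitted and the value is taken from the URL example above):

```yaml
# Fragment of a data product YAML specification (illustrative)
productId: a21624ac
```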
data_product_version¶
The version number of the data product configuration used for the evaluation. This ensures the evaluation runs against the correct version of the data product.
marketplace_id¶
The unique identifier of the Marketplace where the data product is published.
To find it:
Open the Marketplace page in your browser.
Locate the ID in the URL. For example, in https://my_catalog.alationcloud.com/app/marketplace/acme-market-1, the `marketplace_id` is `acme-market-1`.
auth_id¶
The authentication ID used to connect to the underlying data source during evaluation.
To find it:
In the Data Products App, open Manage Authentication.
Locate the connection used by the data product on which the evaluation question exists.
Copy the authentication ID associated with that connection.
The evaluation run uses this authentication context to execute SQL against the underlying data source.
Step 4: Validate and Confirm¶
Review the pre-flight authorization check results.
If the check fails:
Click the data product name to open the data product details page.
Configure the chat authentication, then return and run the evaluation.
Click Run eval. When you submit the run:
A new evaluation entry is created on the Central Evaluations page.
The evaluation appears in the list of runs.
The initial status is Running.
When processing is complete, the status changes to Completed.
Review Evaluation Results¶
You can review run-level outcomes and case-level pass/fail reasons.
You can see evaluation runs only for data products where you are an admin.
If a run includes multiple data products, you must be an admin on all of them to see the run.
To open detailed results:
Select a run from the run history list on Central Evaluations.
Click on the run’s name or Status to open the run details.
The results page includes:
Number of cases (evaluation questions)
Pass rate
Average time to complete the evaluation per case
Each case includes:
Pass/fail status
Execution time
Click the status (Pass/Fail) to open the run output details, which show the reasoning details and data samples.
Review Data Samples¶
You can review table samples used in the evaluation context.
In an evaluation run’s detail view, in the tab set, select Data Preview (next to SQL).
Rerun a Central Evaluation Run¶
You can re-execute an evaluation quickly using the prior run’s configuration:
On the Central Evaluations page, hover over the relevant evaluation run. This shows the three dots icon in the rightmost column.
Click the three dots icon and select Run again.
Delete a Central Evaluation Run¶
You can delete an evaluation run that you no longer need.
On the Central Evaluations page, hover over the relevant evaluation run. This shows the three dots icon in the rightmost column.
Click the three dots icon and select Delete Run.