Run SQL Evaluations in Central Evaluations¶
Applies to Alation Cloud Service instances of Alation
SQL evaluations let you test the accuracy of data product chat responses using predefined evaluation questions.
When you design a data product, you build it around a specific business use case. You can anticipate the types of questions users are likely to ask. To validate that the data product answers those questions correctly, you:
Create evaluation questions.
Provide the expected SQL for each question.
Run an evaluation to compare the AI-generated SQL and results with the expected output.
Based on the outcome, you may:
Refine the evaluation question wording.
Adjust the expected SQL.
Modify the agent configuration (if you are using a custom-built agent in Agent Studio).
As a Data Product Admin, you can evaluate chat accuracy for one data product at a time. This method is appropriate when you want to validate all evaluation questions for one data product as a holistic check. For more information, see Evaluate Data Product Chat.
You can also evaluate chat accuracy across multiple data products that you manage by using Central Evaluations in the Data Products Marketplace.
Note
The Central Evaluations page is available if the Chat with Data Products feature is enabled on your instance. The capabilities are available to all users who are Data Product Admins.
Central Evaluations allows you to:
Select a subset of evaluation questions across all the data products for which you are an admin.
Choose which AI agent to run the evaluation against.
Only AI agents that have the SQL Execution tool attached are available for selection.
This capability is important when:
Multiple data products depend on the same underlying data source.
An upstream schema or ETL change may affect several data products.
You want to regression-test a specific subset of questions (for example, questions related to a particular table or subject area).
You want to compare how different AI agents perform against the same evaluation set. By selecting a specific AI agent, you can confirm that the underlying AI agent is correctly accessing, generating SQL for, and interpreting the data associated with your data products.
Access Central Evaluations¶
Navigate to Manage > Central Evaluations. You can see previous evaluation runs and start new ones.
During an evaluation run:
The system executes the expected SQL saved with the evaluation question in the data product and captures the resulting dataset.
The selected AI agent generates and executes its own SQL for the same question.
The resulting datasets are compared.
The evaluation passes if the generated result matches the expected result set. It fails if the result sets differ.
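Conceptually, the pass/fail comparison described above can be sketched as follows. This is a simplified illustration, not Alation's actual implementation; treating rows as an unordered multiset and comparing values exactly are assumptions:

```python
from collections import Counter

def results_match(expected_rows, generated_rows):
    """Compare two SQL result sets as unordered multisets of rows.

    Assumption: row order does not matter, but duplicate rows do.
    """
    return Counter(map(tuple, expected_rows)) == Counter(map(tuple, generated_rows))

# The evaluation passes when the agent's result matches the expected result.
expected = [("2024-01", 120), ("2024-02", 95)]
generated = [("2024-02", 95), ("2024-01", 120)]  # same rows, different order
print(results_match(expected, generated))  # True
```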
Create a Central Evaluation Run¶
To create an evaluation run:
On the Central Evaluations page, click Create eval run at the top right. This opens the Create Eval Run wizard:
Step 1: Select Agent¶
Select an AI agent. The list displays AI agents that have the SQL Execution tool attached.
Note
When you select an agent, a read-only information box on the right displays the input fields associated with that agent.
Click Continue.
Step 2: Select Questions¶
You see the list of evaluation questions from all data products for which you are a Data Product Admin. Use the filtering and selection controls as needed:
Filter by Data Product
Search by case input
Select the evaluation questions to include in this run by checking their corresponding checkboxes.
Click Continue.
Step 3: Eval Details¶
Configure any of the Input Fields for Native Agents that the agent requires. Fields that require a value show the Enter value hint.
Note
The `message` and `data_product_id` parameters are populated automatically. Other parameters, if present in the agent configuration, can be populated manually and apply for the entire run.

Click Continue.
Input Fields for Native Agents¶
When running a Central Evaluation with a native SQL-capable agent, you may see the following input fields. These fields correspond to parameters required by the agent.
message¶
The evaluation question text that is sent to the agent. This is automatically populated from the selected evaluation case.
data_product_id¶
The unique identifier of the data product associated with the evaluation question.
To find it:
From the Data Products App menu, open the Manage My Data Products page.
In the Data Products table, click the name of the relevant data product.
Look at the URL: the ID is included there. For example, in a URL like https://my_catalog.alationcloud.com/app/manage/data-product/a21624ac/overview, the part `a21624ac` is the data product ID.
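If you are collecting IDs for several data products, the value can be pulled from such a URL with a short snippet. This is a sketch that assumes the URL pattern shown in the example above; `data_product_id_from_url` is a hypothetical helper, not part of any Alation API:

```python
from urllib.parse import urlparse

def data_product_id_from_url(url):
    """Extract the path segment that follows 'data-product' in the URL."""
    parts = urlparse(url).path.strip("/").split("/")
    return parts[parts.index("data-product") + 1]

url = "https://my_catalog.alationcloud.com/app/manage/data-product/a21624ac/overview"
print(data_product_id_from_url(url))  # a21624ac
```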
Alternatively, you can find it in the data product’s YAML specification:
From the Data Products App menu, open the Manage My Data Products page.
Click Edit for the relevant data product.
Click the Edit YAML toggle.
Locate the value of the `productId` property.
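In the YAML editor, the property looks roughly like this (an illustrative fragment; the surrounding structure of the specification is omitted and the value is taken from the URL example above):

```yaml
# Fragment of a data product YAML specification (illustrative)
productId: a21624ac
```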
data_product_version¶
The version number of the data product configuration used for the evaluation. This ensures the evaluation runs against the correct version of the data product.
marketplace_id¶
The unique identifier of the Marketplace where the data product is published.
To find it:
Open the Marketplace page in your browser.
Locate the ID in the URL. For example, in https://my_catalog.alationcloud.com/app/marketplace/acme-market-1, the `marketplace_id` is `acme-market-1`.
auth_id¶
The authentication ID used to connect to the underlying data source during evaluation.
To find it:
In the Data Products App, open Manage Authentication.
Locate the connection used by the data product on which the evaluation question exists.
Copy the authentication ID associated with that connection.
The evaluation run uses this authentication context to execute SQL against the underlying data source.
Step 4: Validate and Confirm¶
Review the pre-flight authorization check results.
If the check fails:
Click the data product name to open the data product details page.
Configure the chat authentication, then return and run the evaluation.
Click Run eval. When you submit the run:
A new evaluation entry is created on the Central Evaluations page.
The evaluation appears in the list of runs.
The initial status is Running.
When processing is complete, the status changes to Completed.
Review Evaluation Results¶
You can review run-level outcomes and case-level pass/fail reasons.
You can see evaluation runs only for data products where you are an admin.
If a run includes multiple data products, you must be an admin on all of them to see the run.
To open detailed results:
Select a run from the run history list on Central Evaluations.
Click on the run’s name or Status to open the run details.
The results page includes:
Number of cases (evaluation questions)
Pass rate
Average time to complete the evaluation per case
Each case includes:
Pass/fail status
Execution time
Click the status (Pass/Fail) to open the run output details, which show the reasoning details and data samples.
Review Data Samples¶
You can review table samples used in the evaluation context.
In an evaluation run’s detail view, in the tab set, select Data Preview (next to SQL).
Rerun a Central Evaluation Run¶
You can re-execute an evaluation quickly using the prior run’s configuration:
On the Central Evaluations page, hover over the relevant evaluation run. This shows the three dots icon in the rightmost column.
Click the three dots icon and select Run again.
Delete a Central Evaluation Run¶
You can delete an evaluation run that you no longer need.
On the Central Evaluations page, hover over the relevant evaluation run. This shows the three dots icon in the rightmost column.
Click the three dots icon and select Delete Run.