Overview
AI Metric Generation lets you automatically create evaluation metrics from your product’s specifications. Instead of manually crafting judge prompts and configuring evaluation parameters, the AI analyzes your specifications and generates ready-to-use metrics.

Evaluation parameters are automatically selected based on each specification’s description and test type. The generated judge prompt follows a format optimized for reliable LLM-based evaluation across different evaluator models.
Requirements
- A product with a description
- At least one specification of type POLICY with a test type assigned (Accuracy, Security, or Behavior)
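The requirements above can be sketched as a simple eligibility check. This is an illustrative sketch only; the class and field names (`Specification`, `spec_type`, `test_type`) are assumptions, not the product’s actual data model.

```python
# Hypothetical sketch: does a specification qualify for AI metric generation?
# A qualifying spec is of type POLICY and has one of the supported test types.
from dataclasses import dataclass
from typing import Optional

ALLOWED_TEST_TYPES = {"Accuracy", "Security", "Behavior"}

@dataclass
class Specification:
    spec_type: str            # e.g. "POLICY" (illustrative field name)
    test_type: Optional[str]  # e.g. "Accuracy", or None if unassigned
    description: str

def qualifies_for_generation(spec: Specification) -> bool:
    """True if the spec is a POLICY spec with a supported test type assigned."""
    return spec.spec_type == "POLICY" and spec.test_type in ALLOWED_TEST_TYPES

print(qualifies_for_generation(Specification("POLICY", "Accuracy", "Cites sources")))  # True
print(qualifies_for_generation(Specification("POLICY", None, "No test type yet")))     # False
```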
How to Generate Metrics
There are two ways to trigger AI metric generation from the dashboard:

From the Specifications Page
- Navigate to your product’s Specifications tab
- Open the dropdown menu on the specification you want to generate metrics for
- Click Generate Metrics — this takes you to the generation page with that specification pre-selected
- Click Generate and wait for the AI to process
- Review the generated candidates — edit, save, or discard each one
From the Metrics Page
- Navigate to your product’s Metrics tab
- Click Generate Metrics with AI
- Select the specifications you want to generate metrics for
- Click Generate and wait for the AI to process
- Review the generated candidates — edit, save, or discard each one
Evaluation Parameter Selection
The AI automatically selects the evaluation parameters each metric needs based on what the specification describes. For example:

- A specification about citation accuracy or knowledge-grounded answers will include `retrieval_context` so the judge can verify answers against retrieved source material.
- A specification about internal processes, workflows, or tool orchestration will include `traces` (and often `tools_used`) so the judge can inspect the execution path, not just the final output.
- A specification about refusal or safety boundaries typically only needs `input` and `actual_output`, since compliance is fully observable from what was asked and answered.
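The selection logic above can be illustrated with a minimal sketch. A keyword heuristic stands in for the AI’s actual analysis here, and the function name is hypothetical; the parameter names (`input`, `actual_output`, `retrieval_context`, `traces`, `tools_used`) come from the examples above.

```python
# Illustrative sketch of evaluation-parameter selection.
# A real system would analyze the spec with an LLM; keywords stand in here.
def select_evaluation_parameters(description: str) -> set[str]:
    desc = description.lower()
    params = {"input", "actual_output"}           # baseline every judge needs
    if any(k in desc for k in ("citation", "grounded", "knowledge")):
        params.add("retrieval_context")           # verify answers against sources
    if any(k in desc for k in ("workflow", "process", "tool", "orchestration")):
        params.update({"traces", "tools_used"})   # inspect the execution path
    return params

print(sorted(select_evaluation_parameters("Answers must be grounded in sources")))
# ['actual_output', 'input', 'retrieval_context']
```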
Generated Metric Properties
Each AI-generated metric candidate includes:

| Property | Description |
|---|---|
| Name | A descriptive name for the metric |
| Description | What the metric evaluates |
| Judge Prompt | The evaluation prompt with placeholders for dynamic data |
| Evaluation Parameters | The data parameters the judge needs for evaluation (automatically selected based on the specification) |
| Tags | Categorization tags |
| Evaluator Model | The LLM model used for evaluation |
| Test Type | Inherited from the source specification |
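The table above can be mirrored as a data structure. This is a hedged sketch only: the class, field names, and the default evaluator model are assumptions for illustration, not the product’s actual schema.

```python
# Hypothetical shape of a generated metric candidate, mirroring the
# properties listed in the table above. All names are illustrative.
from dataclasses import dataclass, field

@dataclass
class MetricCandidate:
    name: str                         # descriptive name for the metric
    description: str                  # what the metric evaluates
    judge_prompt: str                 # contains placeholders like {input}
    evaluation_parameters: list[str]  # selected from the source specification
    tags: list[str] = field(default_factory=list)
    evaluator_model: str = "gpt-4o"   # assumed placeholder default
    test_type: str = "Accuracy"       # inherited from the source specification

candidate = MetricCandidate(
    name="Citation Accuracy",
    description="Checks that answers are supported by retrieved sources.",
    judge_prompt="Given {input}, {actual_output}, and {retrieval_context}, rate support.",
    evaluation_parameters=["input", "actual_output", "retrieval_context"],
    tags=["accuracy", "grounding"],
)
print(candidate.name)  # Citation Accuracy
```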