Using specifications? If your tests and metrics are linked to Specifications, you can use the Specification-Based Evaluation approach below, which resolves metrics automatically. This tutorial also covers the manual loop approach, which gives you full control over each test case.
A session is created implicitly the first time you run an evaluation for a specific version_id and test_id combination. You do not need to create it manually.

Workflow
Iterate Through Test Cases
Fetch all test cases associated with the test using galtea.test_cases.list().

Example
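The manual loop can be sketched as below. To keep the snippet self-contained and runnable offline, the Galtea client is replaced by a small in-memory stub: only `galtea.test_cases.list()` comes from this tutorial, while the `evaluations.create(...)` method name, the `test_id` filter argument, and the field names are hypothetical stand-ins for whatever your SDK version actually exposes.

```python
from dataclasses import dataclass


@dataclass
class TestCase:
    id: str
    input: str


class _TestCases:
    def list(self, test_id):
        # Stub: a real client would fetch this test's cases from the API.
        # The `test_id` filter argument is an assumption.
        return [TestCase(id=f"tc-{i}", input=f"question {i}") for i in range(3)]


class _Evaluations:
    def __init__(self):
        self.created = []

    def create(self, version_id, test_case_id, actual_output, metrics):
        # Stub: a real client would submit the inference result for evaluation.
        # The session for this version_id / test_id pair is created implicitly
        # server-side the first time an evaluation runs.
        record = {
            "version_id": version_id,
            "test_case_id": test_case_id,
            "actual_output": actual_output,
            "metrics": metrics,
        }
        self.created.append(record)
        return record


class Galtea:
    def __init__(self):
        self.test_cases = _TestCases()
        self.evaluations = _Evaluations()


def my_model(prompt: str) -> str:
    # Placeholder for your product's inference call.
    return f"answer to: {prompt}"


galtea = Galtea()
for test_case in galtea.test_cases.list(test_id="test-123"):
    actual_output = my_model(test_case.input)  # run inference on the case input
    galtea.evaluations.create(
        version_id="version-456",
        test_case_id=test_case.id,
        actual_output=actual_output,
        metrics=["accuracy"],  # explicit metrics, as in the manual approach
    )

print(len(galtea.evaluations.created))  # → 3
```

The point of the loop is simply that each test case's input is fed to your product, and the resulting actual_output is submitted together with the version_id; everything session-related happens behind the scenes.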
This example demonstrates how to run an evaluation on all test cases from a specific test. A session is automatically created behind the scenes to link this version_id and test_id with the provided inference result (the actual_output and the Test Case’s input).

Specification-Based Evaluation
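The snippet below sketches the difference in the request payload, under stated assumptions: the `build_evaluation_payload` helper is hypothetical, and only the parameter names `metrics` and `specification_ids` come from this tutorial.

```python
def build_evaluation_payload(version_id, test_case_id, actual_output,
                             metrics=None, specification_ids=None):
    # With specification_ids, the API resolves the linked metrics itself,
    # so no metrics field is sent. If both are omitted, the API falls back
    # to all metrics from every specification linked to the product.
    payload = {
        "version_id": version_id,
        "test_case_id": test_case_id,
        "actual_output": actual_output,
    }
    if metrics is not None:
        payload["metrics"] = metrics
    if specification_ids is not None:
        payload["specification_ids"] = specification_ids
    return payload


# Metrics are resolved from the linked specifications, not passed explicitly:
payload = build_evaluation_payload(
    "version-456", "tc-0", "some answer",
    specification_ids=["spec-1", "spec-2"],
)
print("metrics" in payload)  # → False
```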
Instead of passing metrics explicitly, you can evaluate against Specifications. Each specification has linked metrics that the API resolves automatically.

If you omit both metrics and specification_ids, the API falls back to all metrics from every specification linked to the product.

Next Steps
Specification-Driven Evaluations
Automate test resolution, agent execution, and evaluation with specifications.
Evaluating Conversations
Evaluate multi-turn conversations and production sessions.