What is a Specification?
A Specification in Galtea represents a single, testable behavioral expectation for a product. Examples include “Must decline political questions”, “Can answer questions about phone specifications”, or “Always includes a disclaimer when giving financial advice”. Each specification has a type that classifies the kind of behavioral expectation:
- Capability: A core function the product can perform, that is, what the product is designed to accomplish or deliver. For example: “Can classify images as containing cats or dogs”, “Can retrieve and display the user’s account balance” (conversational), or “Can generate personalized product recommendations based on browsing history”.
- Inability: An action or task the product is fundamentally unable to do even if a malicious actor gained full control over it. These represent hard technical constraints where the infrastructure, system access, or architecture simply does not exist. For example: “Cannot execute wire transfers” (no payment gateway connected) or “Cannot access other customers’ data” (no system integration exists).
- Policy: A mandatory rule or guideline the product must follow — from restrictions to behavioral guidelines. This includes refusal rules (what it must decline), interaction patterns (how it responds), communication style, and mandatory disclaimers. For example: “Must include a disclaimer when discussing medical topics” or “Refuses requests to share confidential business data”.
Policy specifications additionally require a test type: ACCURACY, SECURITY, or BEHAVIOR.
Specifications replace the legacy free-text fields (Capabilities, Inabilities, Policies) on Products with structured, individually testable expectations linked to specific metrics.
The Specification-Driven Workflow
Specifications are at the center of Galtea’s recommended evaluation flow:
- Define specifications for your product (Capability, Inability, Policy)
- Generate metrics from specifications using AI Metric Generation
- Create tests from specifications; the test type is auto-derived
- Run evaluations with evaluations.run(), which resolves specs, tests, and metrics automatically
Specification-Driven Evaluations Tutorial
End-to-end guide: define specs, generate metrics, create tests, and run evaluations — all from specifications.
SDK Integration
The SDK allows you to create, list, retrieve, and delete specifications, as well as link and unlink metrics. See the Specification Service API documentation for more details.
Specification Service
Manage specifications programmatically
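As a rough illustration of the operations listed above (create, list, retrieve, delete, link and unlink metrics), the sketch below uses an in-memory stand-in. It is not the Galtea SDK: the class `InMemorySpecificationStore`, its method names, the dictionary keys, and the ID formats are all hypothetical. Consult the Specification Service API documentation for the real calls and parameters.

```python
import itertools


class InMemorySpecificationStore:
    """Hypothetical stand-in for illustration only; NOT the Galtea SDK."""

    def __init__(self):
        self._specs = {}
        self._ids = itertools.count(1)

    def create(self, product_id, description, spec_type):
        # IDs such as "spec_1" are invented for this sketch.
        spec_id = f"spec_{next(self._ids)}"
        self._specs[spec_id] = {
            "id": spec_id,
            "product_id": product_id,
            "description": description,
            "type": spec_type,
            "metric_ids": [],
        }
        return self._specs[spec_id]

    def list_for_product(self, product_id):
        return [s for s in self._specs.values() if s["product_id"] == product_id]

    def get(self, spec_id):
        return self._specs[spec_id]

    def delete(self, spec_id):
        del self._specs[spec_id]

    def link_metric(self, spec_id, metric_id):
        metric_ids = self._specs[spec_id]["metric_ids"]
        if metric_id not in metric_ids:  # avoid linking the same metric twice
            metric_ids.append(metric_id)

    def unlink_metric(self, spec_id, metric_id):
        self._specs[spec_id]["metric_ids"].remove(metric_id)


store = InMemorySpecificationStore()
spec = store.create("product_1", "Must decline political questions", "POLICY")

store.link_metric(spec["id"], "metric_1")
linked = list(store.get(spec["id"])["metric_ids"])  # snapshot while linked

store.unlink_metric(spec["id"], "metric_1")
remaining = store.list_for_product("product_1")

store.delete(spec["id"])
```

The point of the sketch is the shape of the lifecycle: each specification is an individual record tied to one product, and metrics are attached to it or detached from it by ID rather than embedded in free text.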
Specification Properties
- Unique identifier of the specification.
- The ID of the product this specification belongs to.
- A description of the testable behavioral expectation. Example: “Must decline political questions and redirect the user to authoritative sources.”
- The type of specification. Possible values: CAPABILITY, INABILITY, POLICY.
- The type of test for this specification. Possible values: ACCURACY, SECURITY, BEHAVIOR. Required for POLICY specifications. Must not be provided for CAPABILITY or INABILITY specifications.
- Variant of the test type. Applicable for ACCURACY and SECURITY test types.
- List of metric IDs linked to this specification.
- Timestamp of when the specification was created (ISO 8601 format).
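The test-type rule described above can be sketched as a small validation. This is a minimal sketch, not Galtea's implementation: the class names and field names (`spec_type`, `test_type`, `metric_ids`) are assumptions, since only the property descriptions appear here. The check mirrors the stated rule that a test type is required for POLICY specifications and must not be provided for CAPABILITY or INABILITY specifications.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class SpecificationType(Enum):
    CAPABILITY = "CAPABILITY"
    INABILITY = "INABILITY"
    POLICY = "POLICY"


class SpecTestType(Enum):
    ACCURACY = "ACCURACY"
    SECURITY = "SECURITY"
    BEHAVIOR = "BEHAVIOR"


@dataclass
class Specification:
    # All field names here are hypothetical, chosen for this sketch.
    product_id: str
    description: str
    spec_type: SpecificationType
    test_type: Optional[SpecTestType] = None
    metric_ids: List[str] = field(default_factory=list)

    def __post_init__(self):
        is_policy = self.spec_type is SpecificationType.POLICY
        if is_policy and self.test_type is None:
            raise ValueError("POLICY specifications require a test type")
        if not is_policy and self.test_type is not None:
            raise ValueError(
                "test type must not be provided for CAPABILITY or INABILITY"
            )


# A POLICY specification with an explicit test type is accepted.
policy = Specification(
    "product_1",
    "Must include a disclaimer when discussing medical topics",
    SpecificationType.POLICY,
    SpecTestType.BEHAVIOR,
)

# A CAPABILITY specification without a test type is accepted.
capability = Specification(
    "product_1",
    "Can classify images as containing cats or dogs",
    SpecificationType.CAPABILITY,
)

# An INABILITY specification with a test type is rejected.
try:
    Specification(
        "product_1",
        "Cannot execute wire transfers",
        SpecificationType.INABILITY,
        SpecTestType.SECURITY,
    )
    rejected = False
except ValueError:
    rejected = True
```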
Related
Product
A functionality or service being evaluated
Metric
Ways to evaluate and score product performance