Returns

Returns a Metric object for the given parameters, or None if an error occurs.

Examples

from galtea import Galtea

galtea = Galtea(api_key="YOUR_API_KEY")  # client setup; adjust to your auth method
run_identifier = "2024-06-01"  # any unique suffix to keep metric names distinct

metric = galtea.metrics.create(
    name="accuracy_v1_" + run_identifier,
    evaluator_model_name="GPT-4.1",
    source="partial_prompt",
    judge_prompt="Determine whether the actual output is equivalent to the expected output",
    evaluation_params=["input", "actual_output", "expected_output"],
    tags=["custom", "accuracy"],
    description="A custom accuracy metric.",
)
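Because create returns None if an error occurs, guard the result before using it. The id attribute below is an assumption about the returned Metric object; adjust to its actual shape.

if metric is None:
    raise RuntimeError("Metric creation failed; check the parameters above.")
print(metric.id)  # assumes the Metric object exposes an id attribute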

Parameters

name
string
required
The name of the metric.
test_type
string
Deprecated. This parameter is ignored and will be removed in a future release.
evaluator_model_name
string
The name of the model used to evaluate the metric. Required for metrics that use a judge_prompt. Available models:
  • "Claude-Sonnet-4.0"
  • "Claude-Sonnet-3.7"
  • "GPT-4.1-mini"
  • "Gemini-2.5-Flash-Lite"
  • "Gemini-2.5-Flash"
  • "Gemini-2.0-flash"
  • "GPT-4o"
  • "GPT-4.1"
Do not provide this for “self_hosted” metrics (those without a judge_prompt), since they are scored locally and do not need an evaluator model.
judge_prompt
string
A custom prompt that defines the evaluation logic for the metric. For AI Evaluation metrics, write the evaluation criteria and scoring rubric — Galtea will prepend the selected evaluation_params automatically. For Human Evaluation metrics, this serves as the annotation rubric. If omitted, the metric is considered a deterministic “Custom Score” metric.
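As a rough illustration of this assembly (a conceptual sketch only, not the platform’s actual template), the selected parameters end up as labeled slots placed ahead of your criteria:

evaluation_params = ["input", "actual_output", "expected_output"]
judge_prompt = "Determine whether the actual output is equivalent to the expected output"

# Hypothetical assembly: each selected parameter becomes a labeled slot
# prepended before your criteria. Galtea's real template may differ.
final_prompt = "\n".join(f"{p}: {{{p}}}" for p in evaluation_params) + "\n\n" + judge_prompt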
source
string
The evaluation method for the metric. Possible values:
  • "partial_prompt"AI Evaluation: You provide the core evaluation criteria and rubric. Galtea dynamically constructs the final prompt by prepending selected evaluation parameters to your criteria.
  • "human_evaluation"Human Evaluation: Human annotators manually review and score evaluations using the annotation criteria you define. Evaluations enter a PENDING_HUMAN status and are completed when an annotator submits a score.
  • "self_hosted"Self-Hosted: For deterministic metrics scored locally using the SDK’s CustomScoreEvaluationMetric. Your custom logic runs on your infrastructure, and the resulting score is uploaded to the platform.
evaluation_params
list[string]
Evaluation parameters to include in the judge prompt. These parameters are prepended to the judge prompt to construct the final evaluation prompt. To check the available evaluation parameters, see the Evaluation Parameters section.
Only applicable for AI Evaluation and Human Evaluation metrics.
user_group_ids
list[string]
A list of user group IDs to associate with this metric. Only applicable when source is human_evaluation (see the example after this list).
  • If user group IDs are specified, only users in those groups can annotate evaluations for this metric.
  • If no user group IDs are specified, any user in the organization can annotate.
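A human-evaluation metric restricted to a single annotator group might look like this; the group ID is a placeholder, and here judge_prompt carries the annotation rubric.

annotation_metric = galtea.metrics.create(
    name="tone_review_" + run_identifier,
    source="human_evaluation",
    judge_prompt="Score 1 if the response is professional and empathetic, otherwise 0.",
    evaluation_params=["input", "actual_output"],
    user_group_ids=["ug_support_team"],  # placeholder; only this group can annotate
)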
tags
list[string]
Tags to categorize the metric.
description
string
A brief description of what the metric evaluates.
documentation_url
string
A URL pointing to more detailed documentation about the metric.