Returns

Returns a Metric object for the given parameters, or None if an error occurs.

Examples

from galtea import Galtea

galtea = Galtea(api_key="YOUR_API_KEY")  # client setup; adjust to your auth method
run_identifier = "2024-06-01"  # any unique suffix to keep metric names distinct

metric = galtea.metrics.create(
    name="accuracy_v1_" + run_identifier,
    evaluator_model_name="GPT-4.1",
    source="partial_prompt",
    judge_prompt="Determine whether the actual output is equivalent to the expected output",
    evaluation_params=["input", "actual_output", "expected_output"],
    tags=["custom", "accuracy"],
    description="A custom accuracy metric.",
)
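Because create returns None if an error occurs, guard the result before using it. The id attribute below is an assumption about the returned Metric object; adjust to its actual shape.

if metric is None:
    raise RuntimeError("Metric creation failed; check the parameters above.")
print(metric.id)  # assumes the Metric object exposes an id attribute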

Parameters

name
string
required
The name of the metric.
test_type
string
Deprecated. This parameter is ignored and will be removed in a future release.
evaluator_model_name
string
The name of the model used to evaluate the metric. Required for metrics that use a judge_prompt. Available models:
  • "Claude-Sonnet-4.0"
  • "Claude-Sonnet-3.7"
  • "GPT-4.1-mini"
  • "Gemini-2.5-Flash-Lite"
  • "Gemini-2.5-Flash"
  • "Gemini-2.0-flash"
  • "GPT-4o"
  • "GPT-4.1"
Do not provide this for “self_hosted” metrics (those without a judge_prompt), since they are scored locally and do not need an evaluator model.
judge_prompt
string
A custom prompt that defines the evaluation logic for the metric. For AI Evaluation metrics, write the evaluation criteria and scoring rubric — Galtea will prepend the selected evaluation_params automatically. For Human Evaluation metrics, this serves as the annotation rubric. If omitted, the metric is considered a deterministic “Custom Score” metric.
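As a rough illustration of this assembly (a conceptual sketch only, not the platform’s actual template), the selected parameters end up as labeled slots placed ahead of your criteria:

evaluation_params = ["input", "actual_output", "expected_output"]
judge_prompt = "Determine whether the actual output is equivalent to the expected output"

# Hypothetical assembly: each selected parameter becomes a labeled slot
# prepended before your criteria. Galtea's real template may differ.
final_prompt = "\n".join(f"{p}: {{{p}}}" for p in evaluation_params) + "\n\n" + judge_prompt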
source
string
The evaluation method for the metric. Possible values:
  • "partial_prompt"AI Evaluation: You provide the core evaluation criteria and rubric. Galtea dynamically constructs the final prompt by prepending selected evaluation parameters to your criteria.
  • "human_evaluation"Human Evaluation: Human annotators manually review and score evaluations using the annotation criteria you define. Evaluations enter a PENDING_HUMAN status and are completed when an annotator submits a score.
  • "self_hosted"Self-Hosted: For deterministic metrics scored locally using the SDK’s CustomScoreEvaluationMetric. Your custom logic runs on your infrastructure, and the resulting score is uploaded to the platform.
evaluation_params
list[string]
Evaluation parameters to include in the judge prompt. These parameters are prepended to the judge prompt to construct the final evaluation prompt. To check the available evaluation parameters, see the Evaluation Parameters section.
Only applicable for AI Evaluation and Human Evaluation metrics.
user_group_ids
list[string]
A list of user group IDs to associate with this metric. Only applicable when source is human_evaluation (see the example after this list).
  • If user group IDs are specified, only users in those groups can annotate evaluations for this metric.
  • If no user group IDs are specified, any user in the organization can annotate.
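A human-evaluation metric restricted to a single annotator group might look like this; the group ID is a placeholder, and here judge_prompt carries the annotation rubric.

annotation_metric = galtea.metrics.create(
    name="tone_review_" + run_identifier,
    source="human_evaluation",
    judge_prompt="Score 1 if the response is professional and empathetic, otherwise 0.",
    evaluation_params=["input", "actual_output"],
    user_group_ids=["ug_support_team"],  # placeholder; only this group can annotate
)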
tags
list[string]
Tags to categorize the metric.
description
string
A brief description of what the metric evaluates.
documentation_url
string
A URL pointing to more detailed documentation about the metric.