Create Inference Result

Returns

Example

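# Assumes `galtea` is an initialized Galtea client and `session_id`
# holds the ID of an existing session.
inference_result = galtea.inference_results.create(
    session_id=session_id,
    input="What is the capital of France?",
    output="Paris",
    latency=150.5,  # milliseconds
)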

Parameters

session_id (string)

The session ID to log the inference result to.

Instead of a session_id, you can reference an existing session by your own conversation ID (see the section below). Provide exactly one of the two.

Log to an existing session by your own conversation ID

If you already created the session via the sessions API with your own custom_id, reference it here with session_custom_id plus a version_id or product_id to append inference results to it. If no matching session exists, the call returns a 400 error; create the session first via the sessions API.

session_custom_id (string)

Your own conversation id, as an alternative to session_id. Galtea finds an existing session with this id (per version); the session is not created here. Requires version_id or product_id.

version_id (string)

The version the session is anchored to when using session_custom_id.

Provide version_id or product_id (at least one) when using session_custom_id. If both are given, version_id wins and product_id is ignored.

product_id (string)

Product anchor used when no version_id is given with session_custom_id: Galtea finds the session with this custom id under the product (across its versions).

If no session matches the session_custom_id and anchor, the call returns a 400 error. Create the session first via the sessions API.
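The rules above for choosing between session_id and session_custom_id can be checked on the client before making the call. The following is a minimal sketch, not part of the SDK: build_session_reference is a hypothetical helper that only mirrors the documented rules (exactly one identifier, an anchor for session_custom_id, version_id taking precedence over product_id).

```python
from typing import Optional


def build_session_reference(
    session_id: Optional[str] = None,
    session_custom_id: Optional[str] = None,
    version_id: Optional[str] = None,
    product_id: Optional[str] = None,
) -> dict:
    """Hypothetical helper: return the session-related keyword arguments."""
    # Exactly one of session_id / session_custom_id must be provided.
    if (session_id is None) == (session_custom_id is None):
        raise ValueError("Provide exactly one of session_id or session_custom_id.")
    if session_id is not None:
        return {"session_id": session_id}
    # session_custom_id needs an anchor: version_id or product_id.
    if version_id is None and product_id is None:
        raise ValueError("session_custom_id requires version_id or product_id.")
    if version_id is not None:
        # When both anchors are given, version_id wins, so product_id is dropped.
        return {"session_custom_id": session_custom_id, "version_id": version_id}
    return {"session_custom_id": session_custom_id, "product_id": product_id}
```

The returned dictionary could then be unpacked into the create call alongside input and output, for example with `**build_session_reference(session_custom_id=my_conversation_id, version_id=version_id)`, where both variables are placeholders for your own values.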

input (string)

The input text/prompt.

output (string)

The generated output/response.

retrieval_context (string)

Context retrieved for RAG systems.

When you set retrieval_context, Galtea also records it as a RETRIEVER trace on this inference result.

latency (float)

Latency in milliseconds.
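Because latency is expected in milliseconds, a value measured in seconds needs converting before it is logged. Here is one possible way to measure it, assuming your model call is an ordinary Python callable; timed_call is a hypothetical helper and the lambda stands in for a real model.

```python
import time
from typing import Callable, Tuple


def timed_call(model_fn: Callable[[str], str], prompt: str) -> Tuple[str, float]:
    """Hypothetical helper: call model_fn and return (output, latency in ms)."""
    start = time.perf_counter()
    output = model_fn(prompt)
    # perf_counter() returns seconds; convert the elapsed time to milliseconds.
    latency_ms = (time.perf_counter() - start) * 1000.0
    return output, latency_ms


# Stand-in for a real model call.
output, latency_ms = timed_call(lambda prompt: "Paris", "What is the capital of France?")
```

The resulting output and latency_ms could then be passed as output and latency when creating the inference result.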

dict[str, int]

Information about token usage during the model call. Possible keys include:

input_tokens: Number of input tokens sent to the model.
output_tokens: Number of output tokens generated by the model.
cache_read_input_tokens: Number of input tokens read from the cache.

dict[str, float]

Information about the cost per token during the model call. Possible keys include:

cost_per_input_token: Cost per input token sent to the model.
cost_per_output_token: Cost per output token generated by the model.
cost_per_cache_read_input_token: Cost per input token read from the cache.
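To illustrate how the token-usage and cost-per-token dictionaries relate, the sketch below estimates the cost of one call by multiplying each token count by its matching per-token price. It is only an illustration: estimate_call_cost is a hypothetical helper, and it assumes the three token counts are disjoint (for example, that cached tokens are not also counted in input_tokens), which this page does not specify.

```python
from typing import Dict

# Maps each usage key to the cost key that prices it (keys as listed above).
_PRICED_KEYS = {
    "input_tokens": "cost_per_input_token",
    "output_tokens": "cost_per_output_token",
    "cache_read_input_tokens": "cost_per_cache_read_input_token",
}


def estimate_call_cost(usage: Dict[str, int], cost: Dict[str, float]) -> float:
    """Hypothetical helper: sum of token count times per-token cost.

    Assumes the token counts do not overlap; missing keys count as zero.
    """
    total = 0.0
    for usage_key, cost_key in _PRICED_KEYS.items():
        total += usage.get(usage_key, 0) * cost.get(cost_key, 0.0)
    return total


# Illustrative values only, not real prices.
usage = {"input_tokens": 1000, "output_tokens": 200, "cache_read_input_tokens": 500}
cost = {
    "cost_per_input_token": 0.000002,
    "cost_per_output_token": 0.000006,
    "cost_per_cache_read_input_token": 0.0000005,
}
estimated = estimate_call_cost(usage, cost)
```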

string

The version of Galtea’s conversation simulator used to generate the user message (input). This should only be provided when logging a conversation that was generated using the simulator.

str | InferenceResultStatus

The initial status of the inference result. Accepts a case-insensitive string or an InferenceResultStatus enum value. Valid values: PENDING, GENERATED, FAILED, SKIPPED. Other string values are sent to the API unchanged, so values added in newer API versions can be used; the API rejects invalid ones.

The two typical SDK flows are:

Omit (default) to create the inference result as GENERATED — the right choice when the output is already known at create time.
Pass PENDING to log the inference result before the model call completes, then transition to GENERATED or FAILED via update() once the call resolves.

Passing FAILED or SKIPPED directly is supported for callers that want to log an inference result in a terminal state explicitly (e.g. recording a known-bad turn or a skipped step).
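The accepted forms of the status value described above can be pictured with a small normalization routine. This is a sketch of the documented behavior and not the SDK's implementation: Status is a local stand-in for InferenceResultStatus, and normalize_status is a hypothetical function.

```python
from enum import Enum
from typing import Optional, Union


class Status(str, Enum):
    """Local stand-in for the SDK's InferenceResultStatus enum."""

    PENDING = "PENDING"
    GENERATED = "GENERATED"
    FAILED = "FAILED"
    SKIPPED = "SKIPPED"


def normalize_status(value: Union[str, Status, None]) -> Optional[str]:
    """Hypothetical: mirror the documented handling of the status argument."""
    if value is None:
        # Omitted: the inference result is created as GENERATED by default.
        return None
    if isinstance(value, Status):
        return value.value
    upper = value.upper()
    if upper in Status.__members__:
        # Known values are matched case-insensitively.
        return upper
    # Unknown strings pass through unchanged; the API decides if they are valid.
    return value
```

Under these assumptions, "pending" and Status.PENDING both resolve to "PENDING", while an unrecognized string is forwarded as written.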
