
Returns

Returns an InferenceResult object.

Example

inference_result = galtea.inference_results.create(
    session_id=session_id,
    input="What is the capital of France?",
    output="Paris",
    latency=150.5,
)

Parameters

session_id
string
required
The session ID to log the inference result to.
input
string
The input text/prompt.
output
string
The generated output/response.
retrieval_context
string
Context retrieved for RAG systems.
latency
float
Latency in milliseconds.
usage_info
dict[str, int]
Information about token usage during the model call. Possible keys include:
  • input_tokens: Number of input tokens sent to the model.
  • output_tokens: Number of output tokens generated by the model.
  • cache_read_input_tokens: Number of input tokens read from the cache.
cost_info
dict[str, float]
Information about the cost per token during the model call. Possible keys include:
  • cost_per_input_token: Cost per input token sent to the model.
  • cost_per_output_token: Cost per output token generated by the model.
  • cost_per_cache_read_input_token: Cost per input token read from the cache.
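As a rough illustration of how usage_info and cost_info relate, the sketch below builds both dictionaries with the keys listed above and multiplies them to estimate the cost of one call. The helper estimate_call_cost is hypothetical and not part of the Galtea SDK, and the token counts and prices are made up. It also assumes the three token counts are billed separately, since this page does not say whether cached tokens are also counted in input_tokens.

```python
# Hypothetical helper for illustration; not part of the Galtea SDK.
def estimate_call_cost(usage_info: dict[str, int], cost_info: dict[str, float]) -> float:
    """Estimate the cost of one model call from token counts and per-token prices."""
    # Map each usage key to the cost key that prices it.
    pairs = {
        "input_tokens": "cost_per_input_token",
        "output_tokens": "cost_per_output_token",
        "cache_read_input_tokens": "cost_per_cache_read_input_token",
    }
    # ASSUMPTION: the three token counts are billed separately.
    # Missing keys count as zero, since every key is optional.
    return sum(
        usage_info.get(tokens_key, 0) * cost_info.get(cost_key, 0.0)
        for tokens_key, cost_key in pairs.items()
    )


# Made-up numbers, for illustration only.
usage_info = {"input_tokens": 1000, "output_tokens": 200, "cache_read_input_tokens": 500}
cost_info = {
    "cost_per_input_token": 0.000003,
    "cost_per_output_token": 0.000015,
    "cost_per_cache_read_input_token": 0.0000003,
}
estimated_cost = estimate_call_cost(usage_info, cost_info)
```

The same two dictionaries can then be passed to galtea.inference_results.create(...) as usage_info=usage_info and cost_info=cost_info.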
conversation_simulator_version
string
The version of Galtea’s conversation simulator used to generate the user message (input). This should only be provided when logging a conversation that was generated using the simulator.
status
str | InferenceResultStatus
The initial status of the inference result. Accepts a case-insensitive string or an InferenceResultStatus enum value. Valid values: PENDING, GENERATED, FAILED, SKIPPED.
The two typical SDK flows are:
  • Omit status (default) to create the inference result as GENERATED. This is the right choice when the output is already known at create time.
  • Pass PENDING to log the inference result before the model call completes, then transition it to GENERATED or FAILED via update() once the call resolves.
Passing FAILED or SKIPPED directly is also supported for callers that want to log an inference result in a terminal state explicitly (e.g. recording a known-bad turn or a skipped step).
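The PENDING flow might look roughly like the sketch below. It is a minimal illustration under stated assumptions, not the SDK's documented usage: this page names update() but does not give its signature, so the inference_result_id, output, and status keyword arguments are assumptions, as is the id attribute on the returned object. The helper log_pending_then_resolve and the call_model callable are hypothetical names.

```python
from typing import Any, Callable


def log_pending_then_resolve(
    galtea: Any,
    session_id: str,
    prompt: str,
    call_model: Callable[[str], str],
) -> Any:
    """Log a PENDING inference result, call the model, then record the outcome."""
    # Create the record before the model call completes.
    inference_result = galtea.inference_results.create(
        session_id=session_id,
        input=prompt,
        status="PENDING",
    )
    try:
        output = call_model(prompt)
    except Exception:
        # ASSUMPTION: update() takes the result id and a status keyword.
        galtea.inference_results.update(
            inference_result_id=inference_result.id,
            status="FAILED",
        )
        raise
    # ASSUMPTION: update() also accepts the generated output.
    return galtea.inference_results.update(
        inference_result_id=inference_result.id,
        output=output,
        status="GENERATED",
    )
```

Re-raising after marking the result FAILED keeps the original error visible to the caller while still leaving the logged record in a terminal state.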