Returns
Returns an InferenceResult object.

Example
Parameters
The session ID to log the inference result to.
The input text/prompt.
The generated output/response.
Context retrieved for RAG systems.
Latency in milliseconds.
Information about token usage during the model call.
Possible keys include:
- input_tokens: Number of input tokens sent to the model.
- output_tokens: Number of output tokens generated by the model.
- cache_read_input_tokens: Number of input tokens read from the cache.
Information about the cost per token during the model call.
Possible keys include:
- cost_per_input_token: Cost per input token sent to the model.
- cost_per_output_token: Cost per output token generated by the model.
- cost_per_cache_read_input_token: Cost per input token read from the cache.
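
To show how the token usage keys and the cost keys line up, here is a minimal sketch that multiplies each token count by its matching per-token cost. The helper name estimate_cost and the sample numbers are hypothetical and not part of the SDK. The sketch also assumes that cache_read_input_tokens is counted separately from input_tokens; if your provider includes cached tokens in the input count, adjust accordingly.

```python
from typing import Mapping

# Pairs each usage key with the cost key that prices it.
# The key names come from the parameter descriptions above.
KEY_PAIRS = [
    ("input_tokens", "cost_per_input_token"),
    ("output_tokens", "cost_per_output_token"),
    ("cache_read_input_tokens", "cost_per_cache_read_input_token"),
]


def estimate_cost(usage_info: Mapping[str, int], cost_info: Mapping[str, float]) -> float:
    """Hypothetical helper (not an SDK function): total cost of one model call.

    Missing keys are treated as zero, since every key is optional.
    Assumes cache-read tokens are NOT already included in input_tokens.
    """
    return sum(
        usage_info.get(tokens_key, 0) * cost_info.get(cost_key, 0.0)
        for tokens_key, cost_key in KEY_PAIRS
    )


# Illustrative values only; real prices depend on the model provider.
usage_info = {
    "input_tokens": 1000,
    "output_tokens": 200,
    "cache_read_input_tokens": 500,
}
cost_info = {
    "cost_per_input_token": 0.000003,
    "cost_per_output_token": 0.000015,
    "cost_per_cache_read_input_token": 0.0000003,
}

total_cost = estimate_cost(usage_info, cost_info)
print(f"Estimated cost of the call: {total_cost:.6f}")
```

Keeping the two dictionaries separate, as the parameters do, means the same usage record can be re-priced later if the per-token rates change.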
The version of Galtea’s conversation simulator used to generate the user message (input). This should only be provided when logging a conversation that was generated using the simulator.
The initial status of the inference result. Accepts a case-insensitive string or an InferenceResultStatus enum value. Valid values: PENDING, GENERATED, FAILED, SKIPPED.

The two typical SDK flows are:
- Omit (default) to create the inference result as GENERATED. This is the right choice when the output is already known at create time.
- Pass PENDING to log the inference result before the model call completes, then transition to GENERATED or FAILED via update() once the call resolves.

Passing FAILED or SKIPPED directly is supported for callers that want to log a terminal-state inference result explicitly (e.g. recording a known-bad turn or a skipped step).
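
The status rules can be illustrated with a small, self-contained sketch. It does not call the SDK: Status is a local stand-in for InferenceResultStatus, and normalize_status and run_and_resolve are hypothetical helpers that only mirror the behaviour described here (case-insensitive strings, a GENERATED default, and a PENDING result resolving to GENERATED or FAILED). How the SDK implements this internally is not shown in this page.

```python
from enum import Enum
from typing import Any, Callable, Optional, Tuple, Union


class Status(Enum):
    """Local stand-in for the SDK's InferenceResultStatus (assumption)."""
    PENDING = "PENDING"
    GENERATED = "GENERATED"
    FAILED = "FAILED"
    SKIPPED = "SKIPPED"


def normalize_status(status: Union[str, Status, None] = None) -> Status:
    """Hypothetical helper: accept an enum value, a case-insensitive
    string, or None (meaning the documented default, GENERATED)."""
    if status is None:
        return Status.GENERATED
    if isinstance(status, Status):
        return status
    try:
        return Status[status.strip().upper()]
    except KeyError:
        valid = ", ".join(s.name for s in Status)
        raise ValueError(f"Invalid status {status!r}; valid values: {valid}") from None


def run_and_resolve(model_call: Callable[[], Any]) -> Tuple[Status, Optional[Any]]:
    """Hypothetical helper: decide the final status for a result that was
    first logged as PENDING, based on whether the model call succeeded."""
    try:
        return Status.GENERATED, model_call()
    except Exception:
        return Status.FAILED, None


# Flow 1: output already known, so the status is omitted.
print(normalize_status())

# Flow 2: log as PENDING first, then resolve once the call finishes.
initial = normalize_status("pending")
final, output = run_and_resolve(lambda: "model answer")
print(initial, final, output)
```

In a real integration, the final status and output from the second flow would be sent through update(), as described above.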