Conversation Completeness

The Conversation Completeness metric is one of several non-deterministic metrics Galtea uses to measure how well your LLM-based chatbot guides a user through an end-to-end conversation that satisfies their initial request or goal. It evaluates whether the dialogue includes all the steps and logical transitions needed to fulfill the user’s intent. This metric is especially relevant for transactional or task-based agents (e.g., booking systems, support bots).

Evaluation Parameters

To compute the conversation_completeness metric, the following parameters are required in every turn of the conversation:

input: The user message in the session.
actual_output: The chatbot’s response to the user message.

The metric evaluates the whole conversation, across all turns, to judge whether the full task was completed or abandoned partway.
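As a minimal sketch, a conversation can be represented as an ordered list of turns, each carrying the two required fields. The `Turn` class and the `validate_turns` helper below are hypothetical names chosen for illustration; they are not part of the Galtea SDK, and the sample messages are made up.

```python
from dataclasses import dataclass


@dataclass
class Turn:
    """One turn of a session (hypothetical container, not a Galtea SDK type)."""
    input: str          # the user message
    actual_output: str  # the chatbot's response to that message


def validate_turns(turns):
    """Return the indices of turns that are missing either required field."""
    return [
        i for i, t in enumerate(turns)
        if not t.input.strip() or not t.actual_output.strip()
    ]


# Illustrative two-turn booking conversation.
conversation = [
    Turn("I need a flight to Madrid on Friday.",
         "Sure. Which city are you departing from?"),
    Turn("Barcelona.",
         "Booked: Barcelona to Madrid, Friday 09:40. Confirmation sent."),
]
```

Checking every turn before evaluation is useful here because the metric judges the conversation as a whole, so a single turn with a missing response could distort the verdict.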

How Is It Calculated?

The conversation_completeness score is computed using an LLM-as-a-judge approach:

Identify User Intent: The LLM determines the user’s primary goal or request (e.g., “Book a flight”, “Troubleshoot a bug”, “Get a refund”).
Analyze Progression: The LLM checks if the agent asked necessary follow-up questions, gathered required information, or performed the necessary logical steps.
Verify Resolution: The LLM determines if the agent provided a final solution, if the user confirmed satisfaction, or if the conversation ended abruptly or in a loop.
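The three steps above could be turned into instructions for a judge model roughly as follows. This is only a sketch under stated assumptions: the prompt wording, the `COMPLETE`/`INCOMPLETE` answer labels, and the helper names `format_transcript` and `build_judge_prompt` are all hypothetical, and the prompt Galtea actually uses is not shown in this document.

```python
# Hypothetical instructions paraphrasing the three evaluation steps.
JUDGE_STEPS = (
    "1. Identify the user's primary goal or request.",
    "2. Check whether the agent asked the necessary follow-up questions, "
    "gathered the required information, and performed the necessary steps.",
    "3. Decide whether the agent provided a final solution, or whether the "
    "conversation ended abruptly or in a loop.",
)


def format_transcript(turns):
    """Render turns (dicts with 'input' and 'actual_output') as numbered lines."""
    lines = []
    for n, turn in enumerate(turns, start=1):
        lines.append(f"[{n}] User: {turn['input']}")
        lines.append(f"[{n}] Agent: {turn['actual_output']}")
    return "\n".join(lines)


def build_judge_prompt(turns):
    """Assemble a single prompt string covering the whole conversation."""
    return "\n\n".join([
        "Evaluate the conversation below for completeness.",
        "\n".join(JUDGE_STEPS),
        "Answer with exactly one word: COMPLETE or INCOMPLETE.",
        format_transcript(turns),
    ])
```

The full transcript is passed in one prompt because the judgment concerns the conversation as a whole rather than any single turn.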

The metric assigns a binary score:

Score 1.0 (Complete): The user’s intent was fully satisfied and the conversation reached a logical conclusion. A closure message is not required if the agent provided a solution.
Score 0.0 (Incomplete): The conversation was abandoned, the agent failed to provide a solution, or the user’s goal remains unfulfilled.
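To illustrate the binary scoring, the sketch below maps a judge verdict to a score. It assumes the judge answers with the single word `COMPLETE` or `INCOMPLETE`; that answer format and the `score_from_verdict` name are assumptions made for this example, not Galtea's documented interface.

```python
def score_from_verdict(verdict: str) -> float:
    """Map a judge verdict to the binary score (assumed verdict labels)."""
    label = verdict.strip().upper()
    if label == "COMPLETE":
        return 1.0  # intent fully satisfied, logical conclusion reached
    if label == "INCOMPLETE":
        return 0.0  # abandoned, no solution, or goal unfulfilled
    # Refuse to guess: an unparseable verdict should not silently become a score.
    raise ValueError(f"Unrecognized verdict: {verdict!r}")
```

Raising on an unrecognized verdict, rather than defaulting to 0.0, keeps judge formatting failures distinguishable from genuinely incomplete conversations.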

Suggested Test Case Types

The Conversation Completeness metric is effective for evaluating Behavior test cases in Galtea, particularly:

Task completion flows where the user has a clear objective (e.g., booking, purchasing, troubleshooting).
Multi-step processes that require gathering information across several turns.
Support interactions where the conversation must reach a resolution.
