Evaluation Parameters
To compute the conversation_completeness metric, the following parameters are required in every turn of the conversation:
- input: The user message in the session.
- actual_output: The chatbot’s response to the user message.
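As an illustration of the per-turn requirement, the following minimal sketch models a conversation as a list of turns and flags any turn where one of the two parameters is empty. The `Turn` class and the `missing_parameters` helper are hypothetical names chosen for this example; they are not part of the Galtea SDK.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Turn:
    """One conversation turn (hypothetical container, not a Galtea SDK type)."""
    input: str          # the user message in the session
    actual_output: str  # the chatbot's response to that message


def missing_parameters(turns: List[Turn]) -> List[Tuple[int, str]]:
    """Return (turn index, field name) pairs for every empty required field."""
    problems = []
    for index, turn in enumerate(turns):
        for name in ("input", "actual_output"):
            if not getattr(turn, name).strip():
                problems.append((index, name))
    return problems


conversation = [
    Turn(input="I need to book a flight to Madrid.",
         actual_output="Sure. What date would you like to travel?"),
    Turn(input="Next Friday.", actual_output=""),  # response was not recorded
]

print(missing_parameters(conversation))
```

A check like this can run before evaluation so that incomplete turns are caught early rather than being scored.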
How Is It Calculated?
The conversation_completeness score is computed using an LLM-as-a-judge approach:
- Identify User Intent: The LLM determines the user’s primary goal or request (e.g., “Book a flight”, “Troubleshoot a bug”, “Get a refund”).
- Analyze Progression: The LLM checks if the agent asked necessary follow-up questions, gathered required information, or performed the necessary logical steps.
- Verify Resolution: The LLM determines if the agent provided a final solution, if the user confirmed satisfaction, or if the conversation ended abruptly or in a loop.
- Score 1.0 (Complete): The user’s intent was fully satisfied and the conversation reached a logical conclusion. A closure message is not required if the agent provided a solution.
- Score 0.0 (Incomplete): The conversation was abandoned, the agent failed to provide a solution, or the user’s goal remains unfulfilled.
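The steps above can be sketched roughly as follows. This is an illustrative outline under stated assumptions, not Galtea’s actual implementation: the prompt wording, the `conversation_completeness` function, and the `judge` callable (any function that takes a prompt string and returns the model’s text reply) are all hypothetical. A stub judge stands in for a real LLM call so the example runs offline.

```python
from typing import Callable, List, Tuple

# Hypothetical prompt; the real judge prompt is not documented here.
PROMPT_TEMPLATE = """You are evaluating whether a conversation is complete.
1. Identify the user's primary goal or request.
2. Check whether the agent asked necessary follow-up questions,
   gathered required information, or performed the necessary steps.
3. Decide whether the agent provided a final solution, the user confirmed
   satisfaction, or the conversation ended abruptly or in a loop.
Answer with a single word: COMPLETE or INCOMPLETE.

Conversation:
{transcript}
"""


def format_transcript(turns: List[Tuple[str, str]]) -> str:
    """Render (input, actual_output) pairs as a plain-text transcript."""
    lines = []
    for user_message, agent_response in turns:
        lines.append(f"User: {user_message}")
        lines.append(f"Agent: {agent_response}")
    return "\n".join(lines)


def conversation_completeness(
    turns: List[Tuple[str, str]],
    judge: Callable[[str], str],
) -> float:
    """Return 1.0 if the judge answers COMPLETE, otherwise 0.0."""
    prompt = PROMPT_TEMPLATE.format(transcript=format_transcript(turns))
    verdict = judge(prompt).strip().upper()
    # Exact match on purpose: "INCOMPLETE" contains "COMPLETE" as a substring.
    # Any unexpected reply is treated as incomplete in this sketch.
    return 1.0 if verdict == "COMPLETE" else 0.0


def stub_judge(prompt: str) -> str:
    """Stand-in for a real LLM call (assumption for offline use)."""
    return "COMPLETE" if "refund has been issued" in prompt else "INCOMPLETE"


resolved = [("I want a refund for my order.", "Your refund has been issued.")]
abandoned = [("I want a refund for my order.", "Can you share the order ID?")]

print(conversation_completeness(resolved, stub_judge))
print(conversation_completeness(abandoned, stub_judge))
```

In practice, `stub_judge` would be replaced by a call to whichever LLM client is in use; keeping the judge as a plain callable makes the scoring logic easy to test without network access.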
Suggested Test Case Types
The Conversation Completeness metric is effective for evaluating Behavior test cases in Galtea, particularly:
- Task completion flows where the user has a clear objective (e.g., booking, purchasing, troubleshooting).
- Multi-step processes that require gathering information across several turns.
- Support interactions where the conversation must reach a resolution.