Evaluation Parameters
To compute the role_adherence metric, the following inputs are required in every turn of the conversation:
- input: The current user message.
- actual_output: The corresponding chatbot response.
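As a minimal sketch of that requirement, the check below verifies that every turn carries both fields before a conversation is evaluated. It assumes each turn is a plain Python dict keyed by the two parameter names; the helper name missing_fields is hypothetical and not part of any Galtea API.

```python
REQUIRED_FIELDS = ("input", "actual_output")


def missing_fields(turns):
    """Return (turn_index, field_name) pairs for absent or blank fields.

    Assumption: each turn is a dict; this is an illustrative helper only.
    """
    problems = []
    for index, turn in enumerate(turns):
        for field in REQUIRED_FIELDS:
            value = turn.get(field)
            # A field counts as missing if it is absent, not a string, or blank.
            if not isinstance(value, str) or not value.strip():
                problems.append((index, field))
    return problems


conversation = [
    {"input": "Hi, who are you?", "actual_output": "I'm your banking assistant."},
    {"input": "Tell me a joke.", "actual_output": ""},
]
print(missing_fields(conversation))
```

An empty result means every turn supplies both inputs and the conversation can be passed on for evaluation.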
How Is It Calculated?
The role_adherence score is computed using an LLM-as-a-judge approach:
- Define the Persona: Based on the product_description, the LLM identifies the expected persona, tone, professional boundaries, and style.
- Audit the Conversation: The LLM reviews every response from the agent in the conversation history.
- Check for Deviations: The LLM evaluates whether the agent broke character, violated tone constraints, strayed from its designated responsibilities, or suddenly deviated from the inferred role.
- Score 1.0 (Adherent): The agent consistently maintained its role, tone, and persona throughout all turns.
- Score 0.0 (Non-Adherent): The agent deviated from its role, broke character, or adopted an inconsistent tone at any point.
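The steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not Galtea's actual judge: the prompt wording, the function names build_judge_prompt and role_adherence_score, and the idea that the judge returns one boolean deviation flag per agent response are all hypothetical. The call to the judge LLM itself is omitted.

```python
def build_judge_prompt(product_description, turns):
    """Assemble a plain-text prompt for a judge LLM (hypothetical wording)."""
    transcript = "\n".join(
        f"Turn {i}\nUser: {turn['input']}\nAgent: {turn['actual_output']}"
        for i, turn in enumerate(turns, start=1)
    )
    return (
        "Product description:\n"
        f"{product_description}\n\n"
        "Infer the agent's expected persona, tone, professional boundaries, "
        "and style. Then review every agent response and state, per turn, "
        "whether it deviates from that role.\n\n"
        f"{transcript}"
    )


def role_adherence_score(deviated_per_turn):
    """Binary score: 1.0 only if no agent response deviated from the role.

    Assumption: deviated_per_turn holds one boolean per agent response,
    as parsed from the judge's verdict.
    """
    return 0.0 if any(deviated_per_turn) else 1.0


turns = [
    {"input": "What is my balance?", "actual_output": "Your balance is shown in the app."},
    {"input": "Write me a poem.", "actual_output": "Roses are red..."},
]
prompt = build_judge_prompt("A formal banking support assistant.", turns)
print(role_adherence_score([False, True]))
```

Because the score is all-or-nothing, a single flagged response is enough to bring the whole conversation to 0.0, which matches the "at any point" wording of the non-adherent case.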
Suggested Test Case Types
The Role Adherence metric is effective for evaluating Behavior test cases in Galtea, particularly:
- Persona-driven conversations where the agent has a defined character or role.
- Adversarial prompts that attempt to make the agent break character.
- Enterprise scenarios where consistent professional tone and scope adherence are required.