Galtea’s Conversation Simulator allows you to test your AI products — chatbots, assistants, and agents — by simulating realistic multi-turn user interactions. This guide walks you through integrating your agent and running simulations using Behavior Tests.
Using the specification-driven workflow? If you’ve defined Specifications with linked metrics and tests, the evaluation pipeline handles simulation automatically — no manual session creation needed. See Evaluations from Specs.

Agent Integration Options

The simplest signature is the quickest way to get started: your function receives only the latest user message as a string and returns the agent's reply as a string.
def my_agent(user_message: str) -> str:
    # In a real scenario, call your model here
    return f"Your model output to: {user_message}"
All three signatures work with evaluations.run(), inference_results.generate(), and simulator.simulate(). Both sync and async functions are supported. The SDK auto-detects which signature you’re using from the type hint on the first parameter.
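Since async functions are supported as well, an async agent with the same string signature might look like the sketch below. `fake_model_call` is a hypothetical stand-in for your own model client and is not part of the Galtea SDK.

```python
import asyncio


async def fake_model_call(prompt: str) -> str:
    # Hypothetical stand-in for an awaitable call to your own model client
    await asyncio.sleep(0)
    return f"Your model output to: {prompt}"


async def my_async_agent(user_message: str) -> str:
    # Same string-in, string-out signature, declared with `async def`
    return await fake_model_call(user_message)
```

To try the function on its own, outside the simulator, you can run `asyncio.run(my_async_agent("hello"))`.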

Conversation Simulation Workflow

1. Implement Your Agent: Define an agent function with one of the supported signatures above.
2. Prepare Scenario Data: Create a CSV file with scenario data, or generate test cases from your specifications. Each row describes a user goal, persona, scenario, and the first user message.
3. Create a Test and Sessions: Upload your scenario CSV to create a test, or use AI-generated tests from the dashboard. The platform generates a session for each test case.
4. Run the Simulator with Your Agent: Use simulator.simulate() to execute the conversation between your agent and the simulated user for each session.
5. Evaluate the Results: After simulation, trigger evaluations via evaluations.create() or review results in the dashboard’s Analytics tab.

Step-by-Step Guide

1. Create a Test and Sessions

First, create behavior test cases with user personas and goals. You can generate these from your product description or upload a CSV:
# Create a test suite using the behavior test options
# This can be done via the Dashboard or programmatically as shown here
test = galtea_client.tests.create(
    product_id=product.id,
    name=test_name,
    type="BEHAVIOR",
    # Here we provide a path to a CSV file with behavior tests.
    # If you do not provide a CSV file, Galtea can generate the tests for you.
    test_file_path="path/to/behavior_test.csv",
)

# Get your test cases
# If Galtea is generating the test for you, it might take a few moments to be ready
test_cases = galtea_client.test_cases.list(test_id=test.id)
Once generation completes, you’ll see the resulting test cases in your dashboard. For the CSV upload format, see Behavior Tests.
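As a rough illustration of preparing scenario data, the sketch below builds CSV text with one row per scenario, covering the four pieces of information each row describes (goal, persona, scenario, first user message). The column names and the sample row are assumptions made up for this example. The actual header names are defined on the Behavior Tests page.

```python
import csv
import io

# Hypothetical column names, chosen for illustration only;
# the real header names are defined in the Behavior Tests documentation.
FIELDNAMES = ["goal", "user_persona", "scenario", "first_user_message"]

# Made-up sample scenario
rows = [
    {
        "goal": "Get a refund for a damaged item",
        "user_persona": "Impatient customer who writes short messages",
        "scenario": "Order arrived broken two days ago",
        "first_user_message": "My order arrived broken. What now?",
    },
]


def build_scenario_csv(rows: list[dict]) -> str:
    """Serialize scenario rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDNAMES)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


csv_text = build_scenario_csv(rows)
```

Once the header matches the documented format, writing `csv_text` to a file gives you something to pass as `test_file_path`.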

2. Run the Conversation Simulator

For each test case/session, use the simulator to run the full simulation with your agent function:
# Define your agent function (see Agent Integration Options for all signatures)
def my_agent(user_message: str) -> str:
    return f"Response to: {user_message}"


# Run simulations with your agent function
for test_case in test_cases:
    session = galtea_client.sessions.create(
        version_id=version.id, test_case_id=test_case.id
    )

    result = galtea_client.simulator.simulate(
        session_id=session.id, agent=my_agent, max_turns=10
    )

    # Analyze results
    print(f"Scenario: {test_case.scenario}")
    print(f"Completed {result.total_turns} turns")
    print(f"Success: {result.finished}")
    if result.stopping_reason:
        print(f"Ended because: {result.stopping_reason}")
You can optionally use the @trace decorator to capture internal operations during simulation. Traces are automatically collected and saved per turn. See the Tracing Agent Operations guide for more details.
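When you run many sessions, it can be handy to aggregate the per-session fields printed in the loop (`total_turns`, `finished`, `stopping_reason`) rather than read them one by one. This is a minimal sketch, assuming each result exposes those three attributes as in the example. `summarize_results` is a hypothetical helper, and the `SimpleNamespace` objects with made-up values stand in for real simulation results.

```python
from collections import Counter
from types import SimpleNamespace


def summarize_results(results) -> dict:
    """Aggregate simulation results into simple summary statistics."""
    results = list(results)
    total = len(results)
    finished = sum(1 for r in results if r.finished)
    # Count only results that actually report a stopping reason
    reasons = Counter(r.stopping_reason for r in results if r.stopping_reason)
    avg_turns = sum(r.total_turns for r in results) / total if total else 0.0
    return {
        "sessions": total,
        "finished": finished,
        "average_turns": avg_turns,
        "stopping_reasons": dict(reasons),
    }


# Stand-in objects that mimic the attributes used in the loop (values are made up)
fake_results = [
    SimpleNamespace(total_turns=4, finished=True, stopping_reason="goal reached"),
    SimpleNamespace(total_turns=10, finished=False, stopping_reason="max turns"),
]
summary = summarize_results(fake_results)
```

In the real loop you would append each `result` to a list and pass that list to the helper after the loop ends.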

3. Evaluate the Session

    # Still inside the `for test_case in test_cases` loop from step 2:
    # after each simulation, you can create an evaluation for that session
    evaluations = galtea_client.evaluations.create(
        session_id=session.id,
        metrics=[{"name": "Role Adherence"}],  # Replace with your metrics
    )
    for evaluation in evaluations:
        print(f"Evaluation created: {evaluation.id}")

Advanced Usage: RAG Agents with Retrieval Context

For Retrieval-Augmented Generation (RAG) agents, you can return the context that was retrieved and used to generate the response. This context will be logged with the inference result, enabling evaluations with metrics like Faithfulness and Contextual Relevancy.
import galtea


def my_rag_agent(input_data: galtea.AgentInput) -> galtea.AgentResponse:
    # Extract the latest user message from the agent input
    user_message = input_data.last_user_message_str()

    # Your RAG logic to retrieve context and generate a response.
    # `vector_store` and `llm` are placeholders for your own retriever and model client.
    retrieved_docs = vector_store.search(user_message)
    response_content = llm.generate(prompt=user_message, context=retrieved_docs)

    return galtea.AgentResponse(
        content=response_content,
        retrieval_context=retrieved_docs,
        metadata={"docs_retrieved": len(retrieved_docs)},
    )
The retrieval_context field is optional and can contain:
  • Retrieved document snippets or full documents
  • Formatted context strings
  • JSON-serializable data structures
By providing retrieval context, you enable Galtea to evaluate the faithfulness of your model’s responses relative to the retrieved information.
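Because the context may be a formatted string or a JSON-serializable structure, it can help to normalize retrieved documents before returning them. The sketch below is one possible approach, not part of the Galtea SDK. The helpers `to_retrieval_context` and `is_json_serializable` are hypothetical, and the document shape (dicts with `text` and `source` keys) is an assumption about your own retriever's output.

```python
import json


def to_retrieval_context(docs: list[dict], max_chars: int = 500) -> str:
    """Format retrieved documents as one numbered context string."""
    parts = []
    for i, doc in enumerate(docs, start=1):
        snippet = doc["text"][:max_chars]  # truncate long documents to a snippet
        parts.append(f"[{i}] ({doc['source']}) {snippet}")
    return "\n".join(parts)


def is_json_serializable(value) -> bool:
    """Return True if `value` can be encoded with the standard json module."""
    try:
        json.dumps(value)
        return True
    except (TypeError, ValueError):
        return False


# Made-up documents in the assumed shape
docs = [
    {"text": "Refunds are processed within 5 days.", "source": "faq.md"},
    {"text": "Damaged items can be returned for free.", "source": "returns.md"},
]
context = to_retrieval_context(docs)
```

Either `context` (a formatted string) or `docs` itself (after checking it with `is_json_serializable`) could then be supplied as the retrieval context.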

Next Steps

Specification-Driven Evaluations

Automate simulation + evaluation from specifications with evaluations.run().

Tracing Agent Operations

Capture internal operations of your AI agent during simulations.