The ConversationalGEval metric is designed to evaluate AI assistants during technical support conversations. It ensures that the assistant maintains a professional tone and provides clear, constructive responses.

Required Arguments

  • name: The name of the metric.
  • criteria: A description outlining the specific evaluation aspects for each test case.
  • evaluation_params: A list of type ModelTestCaseParams. Include only the parameters that are relevant for evaluation.

Optional Arguments

  • evaluation_steps: A list of strings outlining the exact steps the LLM should take for evaluation. You can only provide either evaluation_steps OR criteria, and not both.

Usage Example

from agensight.eval.test_case import ModelTestCase, ModelTestCaseParams, ConversationalTestCase
from agensight.eval.metrics import ConversationalGEval

convo_test_case = ConversationalTestCase(
    turns=[
        ModelTestCase(
            input="I need help debugging my Python code that's throwing a TypeError",
            actual_output="I'll help you debug that TypeError. Could you please share the error message and the relevant code snippet?"
        ),
        ModelTestCase(
            input="Here's the error: TypeError: unsupported operand type(s) for +: 'int' and 'str'",
            actual_output="I see the issue. This error occurs when trying to concatenate an integer and a string. Let's fix this by converting the integer to a string using str() or using string formatting."
        )
    ]
)

professionalism_metric = ConversationalGEval(
    name="Technical Support Professionalism",
    criteria="""Evaluate whether the AI assistant maintains professional and helpful communication
    while providing technical support, ensuring clear and constructive responses.""",
    evaluation_steps=[
        "Check if responses are clear, concise, and focused on solving the technical problem",
        "Verify that the assistant asks for necessary information in a polite manner",
        "Ensure explanations are technical yet accessible, avoiding unnecessary jargon",
        "Confirm that the assistant maintains a helpful and patient tone throughout the conversation"
    ],
    evaluation_params=[ModelTestCaseParams.INPUT, ModelTestCaseParams.ACTUAL_OUTPUT],
)

professionalism_metric.measure(convo_test_case)
print(professionalism_metric.score, professionalism_metric.reason)