The ConversationalGEval metric evaluates an entire multi-turn conversation against custom criteria, using an LLM as the judge. The example below applies it to a technical support conversation, checking that the assistant maintains a professional tone and gives clear, constructive responses.

Required Arguments

  • name: The name of the metric.
  • criteria: A description of the aspects to evaluate in each test case. Required unless evaluation_steps is provided.
  • evaluation_params: A list of ModelTestCaseParams values. Include only the parameters that are relevant to the evaluation.
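To make the role of evaluation_params concrete, the sketch below shows the idea of passing only the listed fields of a turn on to the judge. It does not use agensight: `Param` and `Turn` are hypothetical stand-ins for ModelTestCaseParams and ModelTestCase (their members are illustrative), and `select_fields` is an illustrative helper, not part of the library. Raising an error for a listed-but-unset field is this sketch's own choice.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List, Optional


class Param(Enum):
    # Hypothetical stand-in for ModelTestCaseParams; members are illustrative
    INPUT = "input"
    ACTUAL_OUTPUT = "actual_output"
    EXPECTED_OUTPUT = "expected_output"


@dataclass
class Turn:
    # Hypothetical stand-in for a single ModelTestCase turn
    input: str
    actual_output: str
    expected_output: Optional[str] = None


def select_fields(turn: Turn, params: List[Param]) -> Dict[str, str]:
    """Keep only the fields named in params; fields not listed are dropped."""
    selected = {}
    for param in params:
        value = getattr(turn, param.value)
        # This sketch treats a listed but unset field as an error
        if value is None:
            raise ValueError(f"{param.value} is listed in evaluation_params but not set")
        selected[param.value] = value
    return selected


turn = Turn(input="My code throws a TypeError", actual_output="Could you share the traceback?")
print(select_fields(turn, [Param.INPUT, Param.ACTUAL_OUTPUT]))
```

Because expected_output is never set in a conversation like the one below, listing it would only add an unusable field, which is why the example passes just INPUT and ACTUAL_OUTPUT.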

Optional Arguments

  • evaluation_steps: A list of strings outlining the exact steps the LLM should take during evaluation. Provide either evaluation_steps or criteria, not both.
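The "either evaluation_steps or criteria, not both" rule can be expressed as a simple argument check. The function below, `check_criteria_or_steps`, is a hypothetical helper written for illustration; it is not agensight's own validation code, which may behave differently (for example, in how it treats empty values).

```python
from typing import List, Optional


def check_criteria_or_steps(
    criteria: Optional[str] = None,
    evaluation_steps: Optional[List[str]] = None,
) -> str:
    """Return which argument drives the evaluation, enforcing 'one, not both'."""
    # Treat a blank string or an empty list as "not provided"
    has_criteria = criteria is not None and criteria.strip() != ""
    has_steps = bool(evaluation_steps)
    if has_criteria and has_steps:
        raise ValueError("Provide either criteria or evaluation_steps, not both.")
    if not has_criteria and not has_steps:
        raise ValueError("Provide one of criteria or evaluation_steps.")
    return "criteria" if has_criteria else "evaluation_steps"


print(check_criteria_or_steps(criteria="Is the assistant polite?"))
print(check_criteria_or_steps(evaluation_steps=["Check tone", "Check clarity"]))
```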

Usage Example

from agensight.eval.test_case import ModelTestCase, ModelTestCaseParams, ConversationalTestCase
from agensight.eval.metrics import ConversationalGEval

# A two-turn technical support conversation; each ModelTestCase is one user/assistant exchange
convo_test_case = ConversationalTestCase(
    turns=[
        ModelTestCase(
            input="I need help debugging my Python code that's throwing a TypeError",
            actual_output="I'll help you debug that TypeError. Could you please share the error message and the relevant code snippet?"
        ),
        ModelTestCase(
            input="Here's the error: TypeError: unsupported operand type(s) for +: 'int' and 'str'",
            actual_output="I see the issue. This error occurs when trying to concatenate an integer and a string. Let's fix this by converting the integer to a string using str() or using string formatting."
        )
    ]
)

professionalism_metric = ConversationalGEval(
    name="Technical Support Professionalism",
    evaluation_steps=[
        "Check if responses are clear, concise, and focused on solving the technical problem",
        "Verify that the assistant asks for necessary information in a polite manner",
        "Ensure explanations are technical yet accessible, avoiding unnecessary jargon",
        "Confirm that the assistant maintains a helpful and patient tone throughout the conversation"
    ],
    evaluation_params=[ModelTestCaseParams.INPUT, ModelTestCaseParams.ACTUAL_OUTPUT],
)

# Run the evaluation, then print the score and the judge's explanation
professionalism_metric.measure(convo_test_case)
print(professionalism_metric.score, professionalism_metric.reason)
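Conceptually, the judge LLM reads the conversation turn by turn alongside the evaluation steps. The sketch below assembles such a prompt from plain `(input, actual_output)` pairs. The `build_judge_prompt` function, the prompt wording, and the "Turn N" layout are assumptions for illustration only; they are not the prompt agensight actually sends to the judge.

```python
from typing import List, Tuple


def build_judge_prompt(turns: List[Tuple[str, str]], steps: List[str]) -> str:
    """Render numbered evaluation steps and a conversation as one prompt string."""
    lines = ["Evaluation steps:"]
    # Number the steps so the judge can follow them in order
    for i, step in enumerate(steps, start=1):
        lines.append(f"{i}. {step}")
    lines.append("")
    lines.append("Conversation:")
    # Each turn contributes the user input and the assistant's actual output
    for n, (user_input, actual_output) in enumerate(turns, start=1):
        lines.append(f"Turn {n} user: {user_input}")
        lines.append(f"Turn {n} assistant: {actual_output}")
    return "\n".join(lines)


prompt = build_judge_prompt(
    turns=[
        ("I need help debugging a TypeError", "Could you share the error message?"),
        ("TypeError: unsupported operand type(s) for +: 'int' and 'str'", "Convert the int with str()."),
    ],
    steps=["Check that responses are clear", "Check that the tone is polite"],
)
print(prompt)
```

Keeping every turn in one prompt is what lets a conversational metric judge qualities such as "a patient tone throughout the conversation", which no single turn can show on its own.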