The TaskCompletionMetric evaluates how effectively an AI model completes a specified task. It scores the alignment between the task's requirements and the model's output, taking into account any tools called along the way. This metric is a good fit for scenarios where completing the task correctly is the primary success criterion.
Here’s how you can use the TaskCompletionMetric in your evaluation system:
```python
from agensight.eval.metrics import TaskCompletionMetric
from agensight.eval.test_case import ModelTestCase, ToolCall

# Define the metric
task_metric = TaskCompletionMetric(
    threshold=0.7,
    model="gpt-4o-mini",
    include_reason=True
)

# Create a test case
test_case = ModelTestCase(
    input="Develop a Python script to automate data entry tasks.",
    actual_output="The script automates data entry using pandas and openpyxl.",
    tools_called=[
        ToolCall(
            name="DataEntryBot",
            description="Automates data entry tasks using Python libraries.",
            input_parameters={"library": "pandas", "task": "data entry"},
            output=["Data entry automated using pandas and openpyxl."]
        )
    ]
)

# Run the evaluation
task_metric.measure(test_case)
print(task_metric.score, task_metric.reason)
```
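Since `measure()` populates the `score` and `reason` attributes shown above, you can gate an automated test suite on the same threshold you passed to the metric. The sketch below is a minimal example assuming only those two attributes; `assert_task_completion` is a hypothetical helper name, not part of the agensight API.

```python
# A minimal sketch of gating a test on the metric. Assumes only the
# score/reason attributes used in the example above; the helper name
# assert_task_completion is hypothetical, not an agensight API.
def assert_task_completion(metric: TaskCompletionMetric,
                           test_case: ModelTestCase,
                           threshold: float = 0.7) -> None:
    metric.measure(test_case)
    assert metric.score >= threshold, (
        f"Task completion score {metric.score:.2f} is below "
        f"{threshold}: {metric.reason}"
    )

# Reusing the metric and test case defined above
assert_task_completion(task_metric, test_case)
```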