# Tool Correctness

The ToolCorrectnessMetric evaluates how accurately an AI model uses tools. It checks whether the tools the model called match the tools it was expected to call and, depending on configuration, can also compare each call's input parameters and output. Use it to confirm that a model reaches its result through the intended tools.
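As a rough intuition for what "matching the expected tools" can mean, the sketch below scores a run as the fraction of expected tool names that appear among the called tools. This is an illustrative assumption, not the library's confirmed scoring rule, and `name_match_score` is a hypothetical helper that is not part of agensight.

```python
def name_match_score(called: list[str], expected: list[str]) -> float:
    """Fraction of expected tool names found among the called names.

    Hypothetical scoring rule for illustration only; the real metric
    may weigh extra or missing tools differently.
    """
    if not expected:
        # Nothing was expected: perfect only if nothing was called.
        return 1.0 if not called else 0.0
    called_names = set(called)
    hits = sum(1 for name in expected if name in called_names)
    return hits / len(expected)


# One expected tool, and it was called (alongside an extra tool).
print(name_match_score(["WebSearch", "ToolQuery"], ["WebSearch"]))  # 1.0
```

Under this assumed rule, extra tool calls are not penalized; a rule that divides by the number of called tools instead would penalize them.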

### **Required Arguments**

To use the ToolCorrectnessMetric, provide the following arguments when creating a ModelTestCase:

* **input**: The task or goal the user wants the model to perform.
* **actual\_output**: The output generated by the model.
* **tools\_called**: The tools or actions the model used to accomplish the task.
* **expected\_tools**: The tools that are expected to be used by the model.

### **Optional Parameters**

* **threshold**: A float giving the minimum score required to pass. Defaults to 0.5.
* **evaluation\_params**: A list of ToolCallParams values that sets how strictly tool calls are compared. Options include ToolCallParams.INPUT\_PARAMETERS and ToolCallParams.OUTPUT.
* **include\_reason**: A boolean indicating whether to include a reason for the evaluation score. Defaults to True.
* **strict\_mode**: Enforces a binary metric score: 1 for a perfect result, 0 otherwise. Overrides the threshold by setting it to 1. Defaults to False.
* **verbose\_mode**: Prints the intermediate steps used to calculate the metric to the console. Defaults to False.
* **should\_consider\_ordering**: Takes into account the order in which tools were called. Defaults to False.
* **should\_exact\_match**: Requires tools\_called and expected\_tools to be exactly the same. Defaults to False.
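To make the interaction between these options concrete, here is a minimal, self-contained sketch. It assumes that ordering is scored with a longest common subsequence and that the default mode ignores order entirely; neither detail is confirmed by this page. `sketch_score`, `lcs_length`, and `passes` are hypothetical names and not part of agensight.

```python
from typing import Sequence


def lcs_length(a: Sequence[str], b: Sequence[str]) -> int:
    """Length of the longest common subsequence of two name sequences."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def sketch_score(
    called: Sequence[str],
    expected: Sequence[str],
    consider_ordering: bool = False,
    exact_match: bool = False,
) -> float:
    """Assumed scoring for the three comparison modes (illustrative only)."""
    if exact_match:
        # All or nothing: same tools, same order, same count.
        return 1.0 if list(called) == list(expected) else 0.0
    if not expected:
        return 1.0 if not called else 0.0
    if consider_ordering:
        # Credit only expected tools that appear in the expected order.
        return lcs_length(called, expected) / len(expected)
    # Default: order is ignored.
    return len(set(called) & set(expected)) / len(set(expected))


def passes(score: float, threshold: float = 0.5, strict_mode: bool = False) -> bool:
    """Apply the threshold; strict mode forces a binary score and threshold 1."""
    if strict_mode:
        score = 1.0 if score == 1.0 else 0.0
        threshold = 1.0
    return score >= threshold


called, expected = ["B", "A"], ["A", "B"]
print(sketch_score(called, expected))                          # 1.0
print(sketch_score(called, expected, consider_ordering=True))  # 0.5
print(sketch_score(called, expected, exact_match=True))        # 0.0
print(passes(0.6), passes(0.6, strict_mode=True))              # True False
```

The same pair of tool lists scores differently in each mode, which is why the ordering and exact-match flags are worth setting deliberately when call order matters for the task.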

### **Usage Example**

Here's how you can use the ToolCorrectnessMetric in your evaluation system:

```python
from agensight.eval.metrics import ToolCorrectnessMetric
from agensight.eval.test_case import ModelTestCase, ToolCall

# Define the metric
tool_metric = ToolCorrectnessMetric(
    threshold=0.7,
    include_reason=True
)

# Create a test case
test_case = ModelTestCase(
    input="What if these shoes don't fit?",
    actual_output="We offer a 30-day full refund at no extra cost.",
    tools_called=[
        ToolCall(name="WebSearch"),
        ToolCall(name="ToolQuery")
    ],
    expected_tools=[
        ToolCall(name="WebSearch")
    ]
)

# Run the evaluation
tool_metric.measure(test_case)
print(tool_metric.score, tool_metric.reason)
```
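The example above gives each tool call only a name. When evaluation\_params includes ToolCallParams.INPUT\_PARAMETERS or ToolCallParams.OUTPUT, the comparison presumably extends to those fields as well. The sketch below illustrates that idea with a stand-in class; `SketchToolCall`, its `input_parameters` and `output` fields, and `calls_match` are assumptions made for illustration and do not describe the actual ToolCall class.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class SketchToolCall:
    """Stand-in for a tool call record; field names are assumed."""
    name: str
    input_parameters: dict[str, Any] = field(default_factory=dict)
    output: Any = None


def calls_match(
    actual: SketchToolCall,
    expected: SketchToolCall,
    check_inputs: bool = False,
    check_output: bool = False,
) -> bool:
    """Compare two calls by name, optionally also by inputs and output."""
    if actual.name != expected.name:
        return False
    if check_inputs and actual.input_parameters != expected.input_parameters:
        return False
    if check_output and actual.output != expected.output:
        return False
    return True


actual = SketchToolCall("WebSearch", {"query": "return policy"})
expected = SketchToolCall("WebSearch", {"query": "shoe return policy"})

print(calls_match(actual, expected))                     # True
print(calls_match(actual, expected, check_inputs=True))  # False
```

Stricter comparison catches a model that picks the right tool but calls it with the wrong arguments, at the cost of requiring fully specified expected calls.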
