> ## Documentation Index
> Fetch the complete documentation index at: https://pype-db52d533.mintlify.site/llms.txt
> Use this file to discover all available pages before exploring further.

# Multimodal Tool Correctness

The MultimodalToolCorrectnessMetric is a multimodal metric designed to evaluate the correctness of tool usage in generating outputs. It assesses how accurately the tools were used in the process of generating the output, serving as a proxy for evaluating the performance of models that rely on tool calls to generate text or images.

### **Required Arguments**

* input: A list containing the prompt or task description.

* actual\_output: A list containing the generated output or description.

* tools\_called: A list of ToolCall instances representing the tools that were actually called. Each ToolCall requires:

* name: The name of the tool called.

* expected\_tools: A list of ToolCall instances representing the tools that were expected to be called. Each ToolCall requires:

* name: The name of the expected tool.

### **Optional Arguments**

* threshold: A float representing the minimum passing threshold, defaulted to 0.5.

* model: A string specifying which of OpenAI's GPT models to use, or any custom LLM model of type DeepEvalBaseLLM. Defaulted to 'gpt-4o'.

* include\_reason: A boolean which, when set to True, includes a reason for its evaluation score. Defaulted to True.

* strict\_mode: A boolean which, when set to True, enforces a binary metric score: 1 for perfection, 0 otherwise. It also overrides the current threshold and sets it to 1. Defaulted to False.

* async\_mode: A boolean which, when set to True, enables concurrent execution within the measure() method. Defaulted to True.

* verbose\_mode: A boolean which, when set to True, prints the intermediate steps used to calculate the metric to the console. Defaulted to False.

### Usage Example

```python theme={null}
import sys
import os

from agensight.eval.metrics import MultimodalToolCorrectnessMetric
from agensight.eval.test_case import ModelTestCase, ToolCall
from agensight.eval.test_case import MLLMImage

input_data = ["Draw a cat and then describe it."]
actual_output = ["Image generated and described."]
tools_called = [ToolCall(name="draw_image"), ToolCall(name="describe_image")]
expected_tools = [ToolCall(name="s"), ToolCall(name="w")]

metric = MultimodalToolCorrectnessMetric()
test_case = ModelTestCase(input=input_data, actual_output=actual_output, tools_called=tools_called, expected_tools=expected_tools)

metric.measure(test_case)
print(f"Score: {metric.score}")
print(f"Reason: {metric.reason}") 
```
