The ImageReferenceMetric is a multimodal metric that evaluates how relevant an image is to the text that accompanies it. It assesses how well the image serves as a reference for the text, acting as a proxy for the performance of models that generate or select images to support textual content.
threshold: A float representing the minimum passing threshold. Defaults to 0.5.
model: A string specifying which of OpenAI's GPT models to use, or any custom LLM of type DeepEvalBaseLLM. Defaults to 'gpt-4o'.
include_reason: A boolean which, when set to True, includes a reason for the evaluation score. Defaults to True.
strict_mode: A boolean which, when set to True, enforces a binary metric score: 1 for perfection, 0 otherwise. It also overrides the current threshold and sets it to 1. Defaults to False.
async_mode: A boolean which, when set to True, enables concurrent execution within the measure() method. Defaults to True.
verbose_mode: A boolean which, when set to True, prints the intermediate steps used to calculate the metric to the console. Defaults to False. A configuration sketch covering all of these parameters follows this list.
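As a rough illustration of how these options fit together, the sketch below constructs the metric with every parameter set explicitly. The values shown are arbitrary examples (here simply the documented defaults), not recommended settings.

from agensight.eval.metrics import ImageReferenceMetric

# Explicitly set every configuration parameter
# (values are illustrative, matching the documented defaults).
metric = ImageReferenceMetric(
    threshold=0.5,        # minimum passing score
    model="gpt-4o",       # or any custom DeepEvalBaseLLM instance
    include_reason=True,  # attach a textual justification to the score
    strict_mode=False,    # True forces a binary 0/1 score and a threshold of 1
    async_mode=True,      # allow concurrent execution inside measure()
    verbose_mode=False,   # print intermediate calculation steps when True
)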
from agensight.eval.metrics import ImageReferenceMetric
from agensight.eval.test_case import MLLMTestCase, MLLMImage

# A text input and a multimodal output that mixes text with an image.
input_data = ["This is a report with an image."]
actual_output = [
    "The following image shows a cat on a windowsill.",
    MLLMImage(url="/image-path/cat.jpeg"),
]

metric = ImageReferenceMetric(model="gpt-4o", threshold=0.5)
test_case = MLLMTestCase(input=input_data, actual_output=actual_output)

# Evaluate the test case and report the result.
metric.measure(test_case)
print(f"Score: {metric.score}")
print(f"Reason: {metric.reason}")
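Once measure() has run, the score and reason are available as attributes, so a pass/fail check can be written directly against the threshold you configured. A minimal sketch, reusing the metric and threshold from the example above:

# Compare the score to the configured threshold (0.5 above) to decide pass/fail.
passed = metric.score >= 0.5
print(f"Passed: {passed}")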