Multimodal Answer Relevancy
The MultimodalAnswerRelevancyMetric is a multimodal metric designed to evaluate the relevancy of answers generated in response to questions about images. It assesses how well the generated answer aligns with the content of the image, serving as a proxy for evaluating the performance of models that generate text based on visual inputs.
Required Arguments
-
input: A list containing the question related to the image.
-
actual_output: A list containing the generated answer or description.
Optional Arguments
-
threshold: A float representing the minimum passing threshold, defaulted to 0.5.
-
model: A string specifying which of OpenAI’s GPT models to use, or any custom LLM model of type DeepEvalBaseLLM. Defaulted to ‘gpt-4o’.
-
include_reason: A boolean which, when set to True, includes a reason for its evaluation score. Defaulted to True.
-
strict_mode: A boolean which, when set to True, enforces a binary metric score: 1 for perfection, 0 otherwise. It also overrides the current threshold and sets it to 1. Defaulted to False.
-
async_mode: A boolean which, when set to True, enables concurrent execution within the measure() method. Defaulted to True.
-
verbose_mode: A boolean which, when set to True, prints the intermediate steps used to calculate the metric to the console. Defaulted to False.