The TextToImageMetric is a multimodal metric that evaluates the quality of images generated from text prompts. It assesses how well each generated image matches the prompt it was generated from, serving as a proxy for the performance of text-to-image models.

Required Arguments

  • input: A list of text prompts that describe the desired image content.
  • actual_output: A list of MLLMImage instances representing the actual images generated by the model. Each MLLMImage requires:
      • url: The file path or URL to the image.
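
Because url accepts either a local file path or a remote URL, it can be useful to know which kind you are passing before building a test case. The following is a minimal sketch using only the Python standard library; is_remote_image is a hypothetical helper written for illustration and is not part of the library, and it assumes that only http and https schemes count as remote.

```python
from urllib.parse import urlparse


def is_remote_image(url: str) -> bool:
    """Return True for http(s) URLs, False for anything else (treated as a local path).

    Hypothetical helper for illustration only; not part of the library.
    """
    # Only the scheme is inspected; the file is never opened or downloaded.
    return urlparse(url).scheme in ("http", "https")


print(is_remote_image("https://example.com/cat.jpeg"))  # True
print(is_remote_image("image-path/cat.jpeg"))           # False
```

A check like this only classifies the string. It does not confirm that the file exists or that the URL is reachable.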

Optional Arguments

  • threshold: A float representing the minimum passing threshold, defaulted to 0.5.
  • model: A string specifying which of OpenAI's GPT models to use, or a custom LLM of type DeepEvalBaseLLM. Defaulted to "gpt-4o".
  • include_reason: A boolean which, when set to True, includes a reason for its evaluation score. Defaulted to True.
  • strict_mode: A boolean which, when set to True, enforces a binary metric score: 1 for perfection, 0 otherwise. It also overrides the current threshold and sets it to 1. Defaulted to False.
  • async_mode: A boolean which, when set to True, enables concurrent execution within the measure() method. Defaulted to True.
  • verbose_mode: A boolean which, when set to True, prints the intermediate steps used to calculate the metric to the console. Defaulted to False.
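
The interaction between threshold and strict_mode described above can be sketched in plain Python. This is an illustration of the documented behavior, not the library's implementation: is_successful is a hypothetical function, and it assumes that scores lie between 0 and 1 and that a score equal to the threshold counts as passing.

```python
def is_successful(score: float, threshold: float = 0.5, strict_mode: bool = False) -> bool:
    """Decide pass/fail from a score (hypothetical sketch, not the library's code).

    Assumes scores lie in [0, 1] and that score == threshold passes.
    """
    if strict_mode:
        # Strict mode overrides the threshold to 1 and makes the score binary:
        # 1 for a perfect score, 0 for anything less.
        threshold = 1.0
        score = 1.0 if score >= 1.0 else 0.0
    return score >= threshold


print(is_successful(0.7))                    # True
print(is_successful(0.9, strict_mode=True))  # False
```

Under these assumptions, strict_mode turns any score below 1 into a failure regardless of the threshold passed in.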

Usage Example

from agensight.eval.metrics import TextToImageMetric
from agensight.eval.test_case import MLLMImage, MLLMTestCase

# Dummy image and prompt for demonstration
prompt = ["A cat sitting on a windowsill."]
actual_output = [MLLMImage(url="image-path/cat.jpeg")]

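# Configure the metric; both values shown here are the documented defaults
metric = TextToImageMetric(model="gpt-4o", threshold=0.5)
# Pair the prompt with the image generated from it
test_case = MLLMTestCase(input=prompt, actual_output=actual_output)

# Run the evaluation; the results are stored on the metric object
metric.measure(test_case)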
print(f"Score: {metric.score}")
print(f"Reason: {metric.reason}")