Required Arguments

Each test case must be a ModelTestCase instance with the following fields:
  • input: The user’s query.
  • actual_output: The generated response (used for formatting or logging; not used in score computation).
  • retrieval_context: A list of strings representing the context chunks retrieved from your knowledge base.

Optional Arguments

  • threshold (float): Minimum passing score. Default: 0.5
  • model (str): Name of the LLM to use for scoring (e.g., 'gpt-4o' or a custom DeepEvalBaseLLM). Default: 'gpt-4o'
  • include_reason (bool): If True, includes a human-readable explanation for the score. Default: True
  • strict_mode (bool): Enforces a binary score: 1 for full relevance, 0 otherwise. Default: False
  • async_mode (bool): Enables parallel processing during scoring. Default: True
  • verbose_mode (bool): If True, logs detailed scoring steps to the console. Default: False
  • evaluation_template (ContextualRelevancyTemplate): Overrides the prompt logic for the LLM judge. Default: internal default
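As a rough sketch of how these options fit together, the configuration below passes several of the optional arguments listed above. Only the argument names from the table are assumed; the specific values are arbitrary examples, not recommendations.

from agensight.eval.metrics import ContextualRelevancyMetric

# Illustrative configuration using the optional arguments from the table above.
# The values chosen here are arbitrary examples, not recommendations.
metric = ContextualRelevancyMetric(
    threshold=0.5,        # minimum passing score
    model="gpt-4o",       # LLM used as the judge
    include_reason=True,  # attach a human-readable explanation to the score
    strict_mode=False,    # set True for binary 0/1 scoring
    async_mode=True,      # evaluate in parallel
    verbose_mode=False,   # set True to log detailed scoring steps
)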

Usage Example

from agensight.eval.test_case import ModelTestCase
from agensight.eval.metrics import ContextualRelevancyMetric

# Example: Return policy relevance
test_case = ModelTestCase(
    input="What is your return policy?",
    actual_output="You can return items within 30 days if unused.",
    retrieval_context=[
        "Return Policy: Items must be returned within 30 days of purchase",
        "Condition: Products must be unused and in original packaging",
        "Requirements: Original receipt must be presented",
        "Shipping Policy: Free shipping on orders over $50"
    ]
)

# Initialize the metric
metric = ContextualRelevancyMetric(
    threshold=0.7,
    model="gpt-4o",
    include_reason=True
)

# Evaluate
metric.measure(test_case)
print(metric.score)   # e.g. 0.75
print(metric.reason)  # e.g. "Most statements are relevant except for shipping info."
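
Because threshold was set to 0.7 above, a simple pass/fail check could compare the score against that value manually, as in the sketch below (shown for illustration; it only uses metric.score from the example above).

# Manual pass/fail check against the threshold configured above (illustrative).
if metric.score >= 0.7:
    print("Retrieval context passed the relevancy threshold")
else:
    print("Retrieval context failed the relevancy threshold")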

How It Works

  • Extract statements from the retrieval_context using the selected LLM.
  • Classify each statement as either relevant or not relevant to the input.
  • Compute the Contextual Relevancy score as:
    Contextual Relevancy = Number of Relevant Statements / Total Statements in Context

    This focuses on the precision of retrieval, rewarding concise, input-aligned context chunks; a minimal sketch of the calculation follows.
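
As a minimal sketch of this arithmetic (not the library's actual implementation), the snippet below divides the number of statements judged relevant by the total. The verdicts list is a hypothetical stand-in for the LLM judge's per-statement classifications.

# Hypothetical per-statement verdicts from the LLM judge (illustrative only).
verdicts = ["relevant", "relevant", "relevant", "irrelevant"]

relevant_count = sum(1 for v in verdicts if v == "relevant")
total_count = len(verdicts)

# Contextual Relevancy = relevant statements / total statements in context
score = relevant_count / total_count if total_count else 0.0
print(score)  # 0.75 for this example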

Use Cases

Use ContextualRelevancyMetric when you want to:
  • Optimize retriever precision by penalizing irrelevant or off-topic context.
  • Measure retrieval drift when irrelevant content is included in context.
  • Improve user satisfaction by keeping retrieved context short and focused.