Documentation Index
Fetch the complete documentation index at: https://pype-db52d533.mintlify.app/llms.txt
Use this file to discover all available pages before exploring further.
Required Arguments
Each test case should be a `ModelTestCase` instance with the following fields:

- `input`: The original user query.
- `actual_output`: The LLM-generated response based on the retrieved context.
- `expected_output`: The ideal response given the context.
- `retrieval_context`: A list of strings representing the retrieved context chunks used by the LLM.
Optional Arguments
| Argument | Type | Description | Default |
|---|---|---|---|
| `threshold` | float | Minimum score to be considered a "pass". | `0.5` |
| `model` | str | The LLM to use for evaluation (e.g., `'gpt-4o'`, or a custom LLM). | `'gpt-4o'` |
| `include_reason` | bool | If `True`, includes the reasoning behind the metric score. | `True` |
| `strict_mode` | bool | Enforces a binary score: 1 for perfect relevance ordering, 0 otherwise. | `False` |
| `async_mode` | bool | Enables concurrent processing for faster evaluation. | `True` |
| `verbose_mode` | bool | Logs intermediate steps to the console. | `False` |
| `evaluation_template` | ContextualPrecisionTemplate | Optional custom prompt template class for model evaluation. | Default internal template |
Usage Example
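A minimal usage sketch is shown below. The import paths are assumptions (this page does not state the library's module layout), and the field values are illustrative; only the `ModelTestCase` fields and metric arguments documented above are taken from this page:

```python
# Hypothetical import paths -- adjust to the library's actual modules.
from mymetrics import ContextualPrecisionMetric
from mymetrics.test_case import ModelTestCase

test_case = ModelTestCase(
    input="Who wrote 'Pride and Prejudice'?",
    actual_output="Jane Austen wrote 'Pride and Prejudice'.",
    expected_output="Jane Austen.",
    retrieval_context=[
        "Pride and Prejudice is an 1813 novel by Jane Austen.",
        "Jane Austen was an English novelist.",
    ],
)

# Arguments correspond to the Optional Arguments table above.
metric = ContextualPrecisionMetric(
    threshold=0.5,
    model="gpt-4o",
    include_reason=True,
)
metric.measure(test_case)
print(metric.score, metric.reason)
```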
How It Works
The metric calculates a weighted contextual precision score based on:

- Whether each context node is relevant to the input and expected output.
- The ranking of relevant nodes: relevant nodes placed higher in the retrieval order improve the score.

An LLM determines relevance, making the evaluation more closely aligned with human judgment.
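The rank-weighted scoring can be sketched in pure Python. This is an illustrative formulation (the common weighted precision-at-k form), not necessarily the library's exact implementation; in practice, the per-node relevance verdicts come from the LLM judge rather than a precomputed list:

```python
def contextual_precision(relevance: list[int]) -> float:
    """Weighted contextual precision over an ordered list of retrieved
    context nodes, where relevance[k] is 1 if node k is relevant, else 0.

    Each relevant node at rank k contributes precision@k (relevant nodes
    seen so far, divided by k), so relevant nodes ranked higher contribute
    more to the final score.
    """
    relevant_so_far = 0
    weighted_sum = 0.0
    for k, rel in enumerate(relevance, start=1):
        if rel:
            relevant_so_far += 1
            weighted_sum += relevant_so_far / k
    if relevant_so_far == 0:
        return 0.0  # no relevant nodes retrieved at all
    return weighted_sum / relevant_so_far
```

For example, `[1, 0, 1]` (a relevant node demoted below an irrelevant one) scores lower than `[1, 1, 0]`, which scores a perfect 1.0, reflecting that ordering matters and not just which nodes were retrieved.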
