Introduction
Welcome to Agensight Evaluation, a comprehensive suite designed to enhance the evaluation of LLM (Large Language Model) applications. Our framework supports a wide range of evaluation metrics, including those for Retrieval-Augmented Generation (RAG) and multimodal applications. Agensight leverages the open-source library DeepEval to provide robust and flexible evaluation metrics.
Key Features
- Comprehensive Metric Support: Seamlessly integrate 20+ research-backed metrics into your evaluation processes, built on the foundation of DeepEval.
- Customizable Evaluations: Easily tailor metrics to suit your specific evaluation needs, whether for end-to-end or component-level assessments.
- Cloud and Local Evaluations: Run evaluations locally or leverage cloud platforms to manage and analyze your evaluation results.
- Security and Robustness: Conduct red teaming and safety scans to ensure your LLM applications are secure and robust against adversarial attacks.
Getting Started
To begin using Agensight Evaluation, ensure you have a Python environment set up. Install the necessary dependencies for Agensight using:
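A minimal install sketch, assuming the package is published on PyPI under the name `agensight` (the exact package name and any optional extras may differ in your setup):

```bash
pip install agensight
```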
Running Your First Evaluation
Create a test file and define your test cases using the metrics provided by Agensight. Here’s a quick example:
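The snippet below is a minimal sketch using DeepEval's test-case API, which Agensight builds on; Agensight's own import paths, metric names, and wrappers may differ, so treat the identifiers here as assumptions rather than the framework's definitive API.

```python
# test_example.py -- a minimal sketch; assumes DeepEval-style imports,
# which Agensight builds on (Agensight's own import paths may differ).
from deepeval import assert_test
from deepeval.metrics import AnswerRelevancyMetric
from deepeval.test_case import LLMTestCase


def test_answer_relevancy():
    # Describe one interaction with your LLM application.
    test_case = LLMTestCase(
        input="What is the capital of France?",
        actual_output="The capital of France is Paris.",
    )
    # Score how relevant the output is to the input.
    # The 0.7 threshold is an illustrative choice, not a recommended default.
    metric = AnswerRelevancyMetric(threshold=0.7)
    # Fails the test if the metric score falls below the threshold.
    assert_test(test_case, [metric])
```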
Run your evaluation with:
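Assuming DeepEval's test runner is available in your environment (the command and file name below are illustrative; a plain `pytest` invocation may also work, since test cases are ordinary test functions):

```bash
deepeval test run test_example.py
```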
Licensing and Contributions
Agensight Evaluation is built on open-source principles, and we welcome community contributions to enhance its capabilities. If you find Agensight useful, consider starring the project on GitHub and contributing to its development.