microsoft/promptbench: A unified evaluation framework for large language models
🚀 **PromptBench** by Microsoft is a PyTorch-based toolkit for evaluating Large Language Models. It supports prompt engineering methods, adversarial prompt attacks for robustness testing, and dynamic evaluation, and it can be installed via pip or from GitHub. Tutorials are available.
- **PromptBench**: A PyTorch-based Python package for evaluating Large Language Models (LLMs).
- **Quick Model Performance Assessment**: A user-friendly interface for building models, loading datasets, and evaluating performance (see the quick-start sketch after this list).
- **Prompt Engineering**: Built-in methods such as few-shot Chain-of-Thought, EmotionPrompt, and Expert Prompting (a usage sketch follows the list).
- **Adversarial Prompts Evaluation**: Integrates black-box prompt attacks so that model robustness can be assessed against perturbed prompts (see the attack sketch after this list).
- **Dynamic Evaluation with DyVal**: Generates evaluation samples on the fly to avoid test-data contamination.
- **Installation via pip or GitHub**: The package can be installed with `pip install promptbench` or from the GitHub repository.
- **Ease of Use and Extension**: Tutorials cover evaluating models, testing different prompting techniques, and using DyVal for evaluation.
- **Support for Various Components**: Datasets (including GLUE and MMLU), models (open-source and proprietary), prompt engineering methods, and adversarial attacks.
- **Acknowledgements**: Credits to contributors and volunteers in the project.
- **References**: Citations for related research papers and experiments.
- **Contributions Welcome**: The project is open to contributions and suggestions, following its contribution guidelines.
- **Trademark Awareness**: Use of trademarks and logos within the project must follow the respective trademark guidelines.
- **Unified Evaluation Framework**: A unified framework for assessing Large Language Models, with an emphasis on robustness.
- **Topics Covered**: Benchmark evaluation, prompt robustness, adversarial attacks, prompt engineering.
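
A minimal quick-start sketch of the evaluation workflow described above, following the pattern shown in the repository's tutorials. The class and method names used here (`DatasetLoader`, `LLMModel`, `Prompt`, `InputProcess`, `OutputProcess`, `Eval`) are taken from the package's documented quick-start and may differ between versions, so treat this as an assumption rather than a definitive API reference. Install first with `pip install promptbench` or by cloning the GitHub repository.

```python
# Assumed PromptBench quick-start pattern; verify names against the repository's tutorials.
import promptbench as pb

# Load a supported dataset and model (SST-2 sentiment classification and Flan-T5-large here).
dataset = pb.DatasetLoader.load_dataset("sst2")
model = pb.LLMModel(model="google/flan-t5-large", max_new_tokens=10, temperature=0.0001)

# Prompts to evaluate; {content} is filled with each dataset example.
prompts = pb.Prompt([
    "Classify the sentence as positive or negative: {content}",
    "Determine the emotion of the following sentence as positive or negative: {content}",
])

def proj_func(pred):
    # Map the model's text output onto the dataset's label space.
    mapping = {"positive": 1, "negative": 0}
    return mapping.get(pred, -1)

for prompt in prompts:
    preds, labels = [], []
    for data in dataset:
        input_text = pb.InputProcess.basic_format(prompt, data)  # fill the prompt template
        raw_pred = model(input_text)                              # query the model
        preds.append(pb.OutputProcess.cls(raw_pred, proj_func))   # parse the prediction
        labels.append(data["label"])
    # Report accuracy per prompt.
    print(f"{pb.Eval.compute_cls_accuracy(preds, labels):.3f}  {prompt}")
```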
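A hedged sketch of applying one of the built-in prompt engineering methods. The `PEMethod` interface, its arguments, and the `method.test(...)` call are assumptions based on the repository's prompt-engineering tutorial and may not match the current API exactly.

```python
# Assumed interface for PromptBench's prompt engineering methods; check the
# prompt_engineering tutorial in the repository before relying on these names.
import promptbench as pb

dataset_name = "gsm8k"
dataset = pb.DatasetLoader.load_dataset(dataset_name)
model = pb.LLMModel(model="gpt-3.5-turbo", max_new_tokens=150)

# Pick a prompt engineering method, e.g. EmotionPrompt or a Chain-of-Thought variant.
method = pb.PEMethod(method="emotion_prompt", dataset=dataset_name, prompt_id=1, verbose=True)

# Run the method on a few samples and report the resulting score.
results = method.test(dataset, model, num_samples=5)
print(results)
```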
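A sketch of how an adversarial prompt attack could be wired up to probe robustness. The `Attack` class, its argument order, and the helper functions shown here are recalled from the repository's prompt-attack tutorial and should be treated as assumptions; the evaluation function and the list of unmodifiable words are illustrative.

```python
# Assumed interface for PromptBench's adversarial prompt attacks; names and argument
# order follow the repository's prompt_attack tutorial and may differ by version.
import promptbench as pb

dataset = pb.DatasetLoader.load_dataset("sst2")
model = pb.LLMModel(model="google/flan-t5-large", max_new_tokens=10, temperature=0.0001)
prompt = "Classify the sentence as 'positive' or 'negative': {content}"

def proj_func(pred):
    # Map the model's text output onto the dataset's label space.
    mapping = {"positive": 1, "negative": 0}
    return mapping.get(pred, -1)

def eval_func(prompt, dataset, model):
    # Accuracy of the (possibly perturbed) prompt; the attack tries to drive this down.
    preds, labels = [], []
    for data in dataset:
        input_text = pb.InputProcess.basic_format(prompt, data)
        raw_pred = model(input_text)
        preds.append(pb.OutputProcess.cls(raw_pred, proj_func))
        labels.append(data["label"])
    return pb.Eval.compute_cls_accuracy(preds, labels)

# Words the attack must leave untouched so the task definition stays intact.
unmodifiable_words = ["positive'", "negative'", "content"]

# Run a black-box attack (e.g. "stresstest") against the prompt and print the result.
attack = pb.Attack(model, "stresstest", dataset, prompt, eval_func, unmodifiable_words, verbose=True)
print(attack.attack())
```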