GitHub - microsoft/promptbench: A unified evaluation framework for large language models

🚀 Discover **PromptBench** by Microsoft: a PyTorch-based toolkit for evaluating Large Language Models. Leverage prompt engineering, test model robustness with adversarial prompt attacks, and run dynamic evaluation with DyVal. Install via pip or GitHub; tutorials are available. 🤖💬

  • **PromptBench**: A PyTorch-based Python package for evaluating Large Language Models (LLMs).
  • **Quick Model Performance Assessment**: User-friendly interface for model building, dataset loading, and performance evaluation (see the quick-start sketch after this list).
  • **Prompt Engineering**: Methods like Few-shot Chain-of-Thought, Emotion Prompt, and Expert Prompting.
  • **Adversarial Prompt Evaluation**: Integrates prompt attacks to simulate black-box adversarial attacks and assess model robustness (a hedged attack sketch follows the list).
  • **Dynamic Evaluation with DyVal**: Generates evaluation samples on-the-fly to avoid test data contamination.
  • **Installation via pip or GitHub**: Install from PyPI with `pip install promptbench` or clone the repository from GitHub.
  • **Ease of Use and Extension**: Tutorials available for evaluating models, testing different prompting techniques, and utilizing DyVal for evaluation.
  • **Support for Various Components**: Datasets (including GLUE, MMLU), Models (Open-source and Proprietary), Prompt Engineering methods, and Adversarial Attacks.
  • **Acknowledgements**: Credits to contributors and volunteers in the project.
  • **References**: Citations for related research papers and experiments.
  • **Contributions Welcome**: Open project for contributions and suggestions following guidelines.
  • **Trademark Awareness**: Respect for trademarks and logos within the project.
  • **Unified Evaluation Framework**: Aims to provide a single framework for evaluating Large Language Models, with a particular focus on robustness.
  • **Topics Covered**: Benchmark evaluation, prompt robustness, adversarial attacks, prompt engineering.
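
For orientation, here is a condensed quick-start sketch in the spirit of the project's tutorials: load a dataset, load a model, and score a few candidate prompts. The class and helper names (`DatasetLoader`, `LLMModel`, `Prompt`, `InputProcess`, `OutputProcess`, `Eval`) follow the published examples but should be checked against the current documentation; the model name and prompt texts are placeholders.

```python
# Install first: pip install promptbench  (or clone https://github.com/microsoft/promptbench)
import promptbench as pb
from tqdm import tqdm

# Load a supported dataset and an open-source model.
# Available names can be listed via pb.SUPPORTED_DATASETS and pb.SUPPORTED_MODELS.
dataset = pb.DatasetLoader.load_dataset("sst2")
model = pb.LLMModel(model="google/flan-t5-large", max_new_tokens=10, temperature=0.0001)

# Candidate prompts to compare; {content} is filled with each example's text.
prompts = pb.Prompt([
    "Classify the sentence as positive or negative: {content}",
    "Determine the sentiment of the following sentence as positive or negative: {content}",
])

for prompt in prompts:
    preds, labels = [], []
    for data in tqdm(dataset):
        # Fill the prompt template with the current example.
        input_text = pb.InputProcess.basic_format(prompt, data)
        raw_pred = model(input_text)
        # Map the raw generation back to a class label.
        preds.append(pb.OutputProcess.cls(raw_pred, model.model_name))
        labels.append(data["label"])
    # Accuracy of this prompt over the dataset.
    print(f"{pb.Eval.compute_cls_accuracy(preds, labels):.3f}  {prompt}")
```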
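
The adversarial-prompt evaluation can be wired up in a similar way, through an attack object that perturbs a prompt and re-scores it with a user-supplied evaluation callback. The sketch below is a hedged illustration: the `Attack` constructor arguments, the `"stresstest"` attack name, and the `eval_func` shape mirror the project's prompt-attack tutorial, but exact names and signatures should be verified against the current docs.

```python
import promptbench as pb

model = pb.LLMModel(model="google/flan-t5-large", max_new_tokens=10)
dataset = pb.DatasetLoader.load_dataset("sst2")

# The attack perturbs this prompt while trying to degrade task accuracy.
prompt = "As a sentiment classifier, label the following text as positive or negative: {content}"

def eval_func(prompt, dataset, model):
    # Score a (possibly perturbed) prompt: classification accuracy over the dataset.
    preds, labels = [], []
    for data in dataset:
        input_text = pb.InputProcess.basic_format(prompt, data)
        raw_pred = model(input_text)
        preds.append(pb.OutputProcess.cls(raw_pred, model.model_name))
        labels.append(data["label"])
    return pb.Eval.compute_cls_accuracy(preds, labels)

# Words the attack is not allowed to touch (illustrative list, an assumption).
unmodifiable_words = ["positive", "negative", "content"]

# "stresstest" is one of the black-box prompt attacks described in the repo;
# the Attack interface shown here is an assumption based on the tutorial.
attack = pb.Attack(model, "stresstest", dataset, prompt, eval_func, unmodifiable_words, verbose=True)
print(attack.attack())
```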