GitHub - sylinrl/TruthfulQA: TruthfulQA: Measuring How Models Imitate Human Falsehoods
Dive into the world of fact-checking with GitHub's TruthfulQA tool! 🤖🔍 Measure how AI models conform to truthful and informative responses using metrics like GPT-3, BLEURT, and more. Enhance your models' performance now! #AI #MachineLearning #GitHub
- TruthfulQA is a benchmark for evaluating model performance in generating truthful and informative answers to questions.
- It consists of two tasks: Generation (main task) and Multiple-choice.
- The primary objective is overall truthfulness, while the secondary objective is informativeness of the model's answers.
- Metrics used include GPT-3 metrics (GPT-judge, GPT-info), BLEURT, ROUGE, and BLEU for evaluating truthfulness and informativeness.
- Multiple-choice tasks test the model's ability to select true statements.
- GPT-3, GPT-J, GPT-2, and UnifiedQA models are benchmarked with respective performance percentages.
- Instructions for running models on Colab or locally are provided.
- Fine-tuned GPT-3 models are recommended for accurate evaluation of truthfulness and informativeness.
- A new version of the benchmark with additional reference answers has been released to enhance future model performances.