GitHub - sylinrl/TruthfulQA: TruthfulQA: Measuring How Models Imitate Human Falsehoods

Dive into the world of fact-checking with GitHub's TruthfulQA tool! 🤖🔍 Measure how AI models conform to truthful and informative responses using metrics like GPT-3, BLEURT, and more. Enhance your models' performance now! #AI #MachineLearning #GitHub

TruthfulQA is a benchmark for evaluating model performance in generating truthful and informative answers to questions.
It consists of two tasks: Generation (main task) and Multiple-choice.
The primary objective is overall truthfulness, while the secondary objective is informativeness of the model's answers.
Metrics used include GPT-3 metrics (GPT-judge, GPT-info), BLEURT, ROUGE, and BLEU for evaluating truthfulness and informativeness.
Multiple-choice tasks test the model's ability to select true statements.
GPT-3, GPT-J, GPT-2, and UnifiedQA models are benchmarked with respective performance percentages.
Instructions for running models on Colab or locally are provided.
Fine-tuned GPT-3 models are recommended for accurate evaluation of truthfulness and informativeness.
A new version of the benchmark with additional reference answers has been released to enhance future model performances.