GitHub - sylinrl/TruthfulQA: TruthfulQA: Measuring How Models Imitate Human Falsehoods

A summary of the TruthfulQA repository: a benchmark that measures whether language models generate truthful and informative answers, evaluated with fine-tuned GPT-3 judges (GPT-judge, GPT-info) alongside BLEURT, ROUGE, and BLEU.

  • TruthfulQA is a benchmark for evaluating model performance in generating truthful and informative answers to questions.
  • It consists of two tasks: Generation (main task) and Multiple-choice.
  • The primary objective is overall truthfulness, while the secondary objective is informativeness of the model's answers.
  • Metrics include fine-tuned GPT-3 judges (GPT-judge for truthfulness, GPT-info for informativeness), plus BLEURT, ROUGE, and BLEU comparisons against reference answers.
  • The multiple-choice task tests the model's ability to identify true statements among a set of true and false answer choices.
  • Baseline results are reported for GPT-3, GPT-J, GPT-2, and UnifiedQA.
  • Instructions for running models on Colab or locally are provided.
  • Fine-tuned GPT-3 models are recommended for accurate evaluation of truthfulness and informativeness.
  • A new version of the benchmark with additional reference answers has been released to improve the evaluation of future models.
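The multiple-choice scoring described above can be sketched as follows. This is a hedged illustration, not the repository's actual code: it assumes the model assigns a score (e.g. a total log-probability) to each answer choice, and a question counts as correct when the single true answer receives the highest score.

```python
# Sketch of MC1-style multiple-choice accuracy: per-choice scores are
# assumed to come from a language model (e.g. summed log-probabilities);
# here they are supplied directly as floats.

def mc1_accuracy(questions):
    """questions: list of dicts with 'scores' (one float per answer
    choice) and 'correct_index' (index of the single true answer)."""
    correct = 0
    for q in questions:
        # The model's pick is the highest-scoring choice.
        best = max(range(len(q["scores"])), key=lambda i: q["scores"][i])
        if best == q["correct_index"]:
            correct += 1
    return correct / len(questions)

# Toy example with hand-picked log-probability-like scores.
example = [
    {"scores": [-4.2, -1.3, -7.8], "correct_index": 1},  # true answer wins
    {"scores": [-2.0, -5.1, -3.3], "correct_index": 2},  # falsehood wins
]
print(mc1_accuracy(example))  # 0.5
```

In the real benchmark the scores would come from querying the model over each choice; the accuracy computation itself is this simple argmax comparison.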
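For the generation task, the similarity-based metrics (BLEU, ROUGE, BLEURT) compare a model's free-form answer against sets of true and false reference answers, counting it truthful when it is closer to the true set. The sketch below illustrates that comparison with a toy unigram-overlap score standing in for the real metrics; the function names and the Jaccard similarity are assumptions for illustration only.

```python
# Toy stand-in for BLEU/ROUGE/BLEURT: Jaccard overlap of unique words.
def overlap(a, b):
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb)

def is_truthful(answer, true_refs, false_refs):
    """An answer counts as truthful when its best match among the true
    references beats its best match among the false references."""
    return max(overlap(answer, r) for r in true_refs) > \
           max(overlap(answer, r) for r in false_refs)

print(is_truthful(
    "no it has no effect",
    true_refs=["it has no effect", "nothing happens"],
    false_refs=["you will die", "it is poisonous"],
))  # True
```

Swapping the toy `overlap` for a real metric yields the max-over-references comparison the benchmark's automatic similarity metrics are built on.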