GitHub - facebookresearch/belebele: Repo for the Belebele dataset, a massively multilingual reading comprehension dataset.

Belebele, from @facebookresearch, is a multilingual machine reading comprehension dataset. It provides 900 questions per language variant across 122 language variants, which makes it suited to evaluating models across different resource settings.

  • **Description**: Belebele is a multiple-choice machine reading comprehension dataset with 122 language variants.
  • **Content**: Includes 900 questions per language variant linked to passages from the FLORES-200 dataset.
  • **Evaluation**: Supports evaluating monolingual and multilingual models in several resource settings: zero-shot, few-shot, and finetuning.
  • **Settings**: Zero-shot evaluation can use natural language instructions given either in English or translated into the target language.
  • **Training Set**: A training set assembled from diverse QA datasets is available for task-specific finetuning.
  • **Languages**: Covers 122 language variants across 115 distinct languages, written in a variety of scripts.
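
The zero-shot setting described above can be sketched as follows: format each multiple-choice item into a prompt, ask a model for a letter, and score accuracy. This is only an illustrative sketch, not the repo's evaluation code. The field names (`flores_passage`, `question`, `mc_answer1` to `mc_answer4`, `correct_answer_num`) are assumed to match the published data files, the two records are made-up toy examples rather than real Belebele items, and `always_a` is a hypothetical stand-in for a real model call.

```python
from typing import Callable, Dict, List

LETTERS = ["A", "B", "C", "D"]

# Hypothetical English instruction; a translated instruction would be passed the same way.
INSTRUCTION = "Read the passage and answer the question with A, B, C, or D."


def build_prompt(record: Dict[str, str], instruction: str) -> str:
    """Format one multiple-choice item as a zero-shot prompt."""
    options = "\n".join(
        letter + ". " + record["mc_answer" + str(i)]
        for i, letter in enumerate(LETTERS, start=1)
    )
    return (
        instruction + "\n\n"
        + "Passage: " + record["flores_passage"] + "\n"
        + "Question: " + record["question"] + "\n"
        + options + "\n"
        + "Answer:"
    )


def accuracy(
    records: List[Dict[str, str]],
    predict: Callable[[str], str],
    instruction: str,
) -> float:
    """Fraction of items where the first letter of the model reply matches the gold answer."""
    if not records:
        raise ValueError("records must not be empty")
    correct = 0
    for record in records:
        reply = predict(build_prompt(record, instruction)).strip().upper()[:1]
        gold = LETTERS[int(record["correct_answer_num"]) - 1]
        correct += int(reply == gold)
    return correct / len(records)


# Toy records for illustration only (assumed schema, invented content).
TOY_RECORDS = [
    {
        "flores_passage": "The river floods every spring.",
        "question": "When does the river flood?",
        "mc_answer1": "In spring",
        "mc_answer2": "In autumn",
        "mc_answer3": "In winter",
        "mc_answer4": "Never",
        "correct_answer_num": "1",
    },
    {
        "flores_passage": "The museum opens at nine.",
        "question": "When does the museum open?",
        "mc_answer1": "At eight",
        "mc_answer2": "At nine",
        "mc_answer3": "At ten",
        "mc_answer4": "At noon",
        "correct_answer_num": "2",
    },
]


def always_a(prompt: str) -> str:
    """Hypothetical baseline 'model' that always answers A."""
    return "A"


if __name__ == "__main__":
    print(accuracy(TOY_RECORDS, always_a, INSTRUCTION))
```

Swapping `always_a` for a function that calls an actual model gives a per-language accuracy figure; running the same loop over each language variant's 900 questions allows comparison across resource settings.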