https://github.com/facebookresearch/belebele

GitHub - facebookresearch/belebele: Repo for the Belebele dataset, a massively multilingual reading comprehension dataset.

🚀 Dive into the world of multilingual machine reading comprehension with the Belebele dataset by @facebookresearch! 📚🌍 Featuring 900 questions per language variant in 122 languages, it's perfect for evaluating models across different resource settings. Train, evaluate, and advance your AI capabilities with Belebele! #AI #MachineLearning #NLP

**Description**: Belebele is a multiple-choice machine reading comprehension dataset with 122 language variants.
**Content**: Includes 900 questions per language variant linked to passages from the FLORES-200 dataset.
**Evaluation**: Supports evaluating monolingual and multilingual models on high-, medium-, and low-resource languages, in zero-shot, few-shot, and finetuning setups.
**Settings**: Zero-shot evaluations use natural-language instructions, given either in English or translated into the target language.
**Training Set**: Provides a training set assembled from diverse QA datasets for task-specific finetuning.
**Languages**: Spans 122 language variants covering 115 distinct languages, written in a variety of scripts.
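
As a rough illustration of the zero-shot, multiple-choice setup summarized above, here is a minimal Python sketch that formats one record into a prompt and computes accuracy. The field names (`flores_passage`, `question`, `mc_answer1` to `mc_answer4`, `correct_answer_num`), the four-option A-D layout, and the `predict` callable are assumptions made for this example; check them against the repo before relying on them. The sample record is made up and is not taken from the dataset.

```python
from typing import Callable, Dict, List

# Assumed record layout (not confirmed by this note): a passage, a question,
# four answer options, and the 1-based index of the correct option.
Record = Dict[str, str]

LETTERS = "ABCD"


def build_prompt(record: Record, instruction: str) -> str:
    """Format one record as a zero-shot multiple-choice prompt."""
    options = "\n".join(
        letter + ". " + record["mc_answer" + str(i)]
        for i, letter in enumerate(LETTERS, start=1)
    )
    return (
        instruction + "\n\n"
        + "Passage: " + record["flores_passage"] + "\n"
        + "Question: " + record["question"] + "\n"
        + options + "\n"
        + "Answer:"
    )


def accuracy(
    records: List[Record],
    predict: Callable[[str], str],
    instruction: str,
) -> float:
    """Fraction of records where the first letter of the model reply is correct.

    `predict` is a hypothetical stand-in for any model call that maps a
    prompt string to a reply string.
    """
    if not records:
        raise ValueError("records must not be empty")
    correct = 0
    for record in records:
        reply = predict(build_prompt(record, instruction)).strip().upper()[:1]
        gold = LETTERS[int(record["correct_answer_num"]) - 1]
        correct += int(reply == gold)
    return correct / len(records)


# Made-up sample record, for illustration only.
SAMPLE = [
    {
        "flores_passage": "The library opens at nine and closes at five.",
        "question": "When does the library close?",
        "mc_answer1": "At nine",
        "mc_answer2": "At noon",
        "mc_answer3": "At five",
        "mc_answer4": "At midnight",
        "correct_answer_num": "3",
    }
]

# English instruction; a translated instruction could be passed the same way.
INSTRUCTION = "Read the passage and answer the question with A, B, C, or D."

if __name__ == "__main__":
    always_c = lambda prompt: "C"  # trivial stand-in for a real model
    print(accuracy(SAMPLE, always_c, INSTRUCTION))
```

Swapping `INSTRUCTION` for a version translated into the target language gives the translated-instruction variant without changing the scoring code.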