GitHub - yandex/YaLM-100B: Pretrained language model with 100B parameters
🚀 Introducing YaLM-100B, a powerful pretrained language model with a whopping 100 billion parameters for text generation! 📚✨ Developed by the team at @yandex, now available on GitHub under the Apache 2.0 license. Get ready to level up your NLP tasks! #AI #NLProc 🤖📝
- YaLM-100B is a GPT-like neural network with 100 billion parameters for text generation.
- It was trained on a cluster of 800 A100 graphics cards for 65 days using 1.7 TB of online texts in English and Russian.
- The training dataset includes various sources like The Pile, Russian web pages from Yandex Search, news, books, and social media dialogues.
- The model is published under the Apache 2.0 license for both research and commercial use.
- The model is designed to run on multiple GPUs with tensor parallelism, and downloading the weights requires about 200 GB of free disk space (a pre-flight sketch in Python follows this list).
- The checkpoint can be downloaded with the bash script provided in the repository, and a ready-made Docker image is available on Docker Hub.
- Several example scripts are provided for interactive, conditional, and unconditional text generation (see the launcher sketch after this list).
- In total, training consumed 300B tokens.
- The licensing notes point out that the Megatron-LM code used in the repository is covered by Megatron-LM's own license, which differs from YaLM-100B's Apache 2.0 license.
- Contributors include artnitolog (Ruslan Vasilev), nzinov (Nikolay Zinov), and petrovlesha (Alexey Petrov).
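A minimal pre-flight sketch for the setup step described above: it checks for the roughly 200 GB of free disk space the checkpoint needs and counts visible GPUs for tensor parallelism. The download script path in the final comment is an assumption about the repository layout, not a documented entry point.

```python
# Hypothetical pre-flight check before fetching the YaLM-100B weights.
# The 200 GB figure comes from the setup notes above; the download script
# path referenced at the bottom is an assumption about the repo layout.
import shutil
import subprocess

REQUIRED_FREE_BYTES = 200 * 1024**3  # ~200 GB for the checkpoint files


def enough_disk(path: str = ".") -> bool:
    """Return True if the target path has enough free space for the weights."""
    return shutil.disk_usage(path).free >= REQUIRED_FREE_BYTES


def gpu_count() -> int:
    """Count visible NVIDIA GPUs via nvidia-smi; returns 0 if unavailable."""
    try:
        out = subprocess.run(
            ["nvidia-smi", "--list-gpus"],
            capture_output=True, text=True, check=True,
        )
        return len(out.stdout.strip().splitlines())
    except (FileNotFoundError, subprocess.CalledProcessError):
        return 0


if __name__ == "__main__":
    if not enough_disk():
        raise SystemExit("Need at least 200 GB free to download the checkpoint.")
    print(f"Visible GPUs for tensor parallelism: {gpu_count()}")
    # Assumed download entry point from the repository (verify the path):
    # subprocess.run(["bash", "download/download.sh"], check=True)
```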
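And a hedged launcher sketch for the three generation modes mentioned in the list. The script names under examples/ are assumptions based on the README's description, not verified filenames; check the repository before running.

```python
# Hypothetical wrapper around the repository's example generation scripts.
# The exact script names below are assumptions; adjust them to match the repo.
import subprocess

EXAMPLE_SCRIPTS = {
    "interactive": "examples/generate_interactive.sh",
    "conditional": "examples/generate_conditional_sampling.sh",
    "unconditional": "examples/generate_unconditional.sh",
}


def run_example(mode: str) -> None:
    """Launch one of the generation examples in a subprocess."""
    subprocess.run(["bash", EXAMPLE_SCRIPTS[mode]], check=True)


if __name__ == "__main__":
    run_example("conditional")  # continue text from a user-supplied prompt
```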