GitHub - yandex/YaLM-100B: Pretrained language model with 100B parameters
🚀 Introducing YaLM-100B, a powerful pretrained language model with a whopping 100 billion parameters for text generation! 📚✨ Developed by the team at @yandex, now available on GitHub under the Apache 2.0 license. Get ready to level up your NLP tasks! #AI #NLProc 🤖📝
- YaLM-100B is a GPT-like neural network with 100 billion parameters for text generation.
- It was trained on a cluster of 800 A100 graphics cards for 65 days using 1.7 TB of online texts in English and Russian.
- The training dataset includes various sources like The Pile, Russian web pages from Yandex Search, news, books, and social media dialogues.
- The model is published under the Apache 2.0 license for both research and commercial use.
- The model is designed to run on multiple GPUs with tensor parallelism, and downloading the weights requires about 200 GB of free disk space (a pre-flight sketch in Python follows this list).
- The checkpoint can be downloaded with the bash script provided in the repository, and a ready-made Docker image is available on Docker Hub.
- Several example scripts are provided for interactive, conditional, and unconditional text generation (see the launcher sketch after this list).
- In total, training consumed 300B tokens.
- The licensing notes point out that the Megatron-LM code used in the repository is covered by Megatron-LM's own license, which differs from YaLM-100B's Apache 2.0 license.
- Contributors include artnitolog (Ruslan Vasilev), nzinov (Nikolay Zinov), and petrovlesha (Alexey Petrov).
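A minimal pre-flight sketch for the setup step described above: it checks for the roughly 200 GB of free disk space the checkpoint needs and counts visible GPUs for tensor parallelism. The download script path in the final comment is an assumption about the repository layout, not a documented entry point.

```python
# Hypothetical pre-flight check before fetching the YaLM-100B weights.
# The 200 GB figure comes from the setup notes above; the download script
# path referenced at the bottom is an assumption about the repo layout.
import shutil
import subprocess

REQUIRED_FREE_BYTES = 200 * 1024**3  # ~200 GB for the checkpoint files


def enough_disk(path: str = ".") -> bool:
    """Return True if the target path has enough free space for the weights."""
    return shutil.disk_usage(path).free >= REQUIRED_FREE_BYTES


def gpu_count() -> int:
    """Count visible NVIDIA GPUs via nvidia-smi; returns 0 if unavailable."""
    try:
        out = subprocess.run(
            ["nvidia-smi", "--list-gpus"],
            capture_output=True, text=True, check=True,
        )
        return len(out.stdout.strip().splitlines())
    except (FileNotFoundError, subprocess.CalledProcessError):
        return 0


if __name__ == "__main__":
    if not enough_disk():
        raise SystemExit("Need at least 200 GB free to download the checkpoint.")
    print(f"Visible GPUs for tensor parallelism: {gpu_count()}")
    # Assumed download entry point from the repository (verify the path):
    # subprocess.run(["bash", "download/download.sh"], check=True)
```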
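And a hedged launcher sketch for the three generation modes mentioned in the list. The script names under examples/ are assumptions based on the README's description, not verified filenames; check the repository before running.

```python
# Hypothetical wrapper around the repository's example generation scripts.
# The exact script names below are assumptions; adjust them to match the repo.
import subprocess

EXAMPLE_SCRIPTS = {
    "interactive": "examples/generate_interactive.sh",
    "conditional": "examples/generate_conditional_sampling.sh",
    "unconditional": "examples/generate_unconditional.sh",
}


def run_example(mode: str) -> None:
    """Launch one of the generation examples in a subprocess."""
    subprocess.run(["bash", EXAMPLE_SCRIPTS[mode]], check=True)


if __name__ == "__main__":
    run_example("conditional")  # continue text from a user-supplied prompt
```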