GitHub - yandex/YaLM-100B: Pretrained language model with 100B parameters

🚀 Introducing YaLM-100B, a pretrained language model with 100 billion parameters for text generation! 📚✨ Developed by the team at @yandex and now available on GitHub under the Apache 2.0 license. Get ready to level up your NLP tasks! #AI #NLProc 🤖📝

  • YaLM-100B is a GPT-like neural network with 100 billion parameters for text generation.
  • It was trained on a cluster of 800 A100 graphics cards for 65 days using 1.7 TB of online texts in English and Russian.
  • The training dataset includes various sources like The Pile, Russian web pages from Yandex Search, news, books, and social media dialogues.
  • The model is published under the Apache 2.0 license for both research and commercial use.
  • Setup requires 200 GB of free disk space to download the weights, and the model is designed to run on multiple GPUs with tensor parallelism (see the sketches after this list).
  • The checkpoint can be downloaded with the bash script provided in the repository, and a Docker image is available on Docker Hub.
  • Several example scripts are provided for interactive, conditional, and unconditional text generation tasks.
  • Training consumed roughly 300B tokens in total (a back-of-envelope throughput estimate follows the list).
  • The licensing section notes that the Megatron-LM code the project builds on is covered by its own license, distinct from YaLM-100B's Apache 2.0 license.
  • The development team included contributors such as artnitolog (Ruslan Vasilev), nzinov (Nikolay Zinov), and petrovlesha (Alexey Petrov).
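
Tensor parallelism means that individual weight matrices are sharded across GPUs, with each device computing only its slice of a layer. The sketch below illustrates the column-parallel idea on CPU tensors with NumPy; it is a toy illustration only and does not reproduce the repository's Megatron-LM implementation.

```python
import numpy as np

# Toy column-parallel linear layer: the weight matrix is split along its
# output dimension across `world_size` "devices"; each shard produces a
# slice of the output, and the slices are concatenated (an all-gather in
# a real multi-GPU setup).
rng = np.random.default_rng(0)
d_in, d_out, world_size = 8, 12, 4

x = rng.standard_normal((2, d_in))      # a small batch of activations
W = rng.standard_normal((d_in, d_out))  # the full weight matrix

# Shard the weights column-wise: each "GPU" holds d_out / world_size columns.
shards = np.split(W, world_size, axis=1)

# Each device computes its partial output independently.
partial_outputs = [x @ shard for shard in shards]

# Concatenating the partial results reproduces the unsharded computation.
y_parallel = np.concatenate(partial_outputs, axis=1)
y_reference = x @ W
assert np.allclose(y_parallel, y_reference)
print("column-parallel output matches the dense computation:", y_parallel.shape)
```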
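Because the weights need about 200 GB of free disk space, a quick precheck can save a failed download. Here is a minimal Python sketch; the required size comes from the summary above, while the checked path is an assumption to adjust for your setup.

```python
import shutil
import sys

REQUIRED_GB = 200  # free space needed for the YaLM-100B weights (per the repo README)

def enough_space(path: str, required_gb: float) -> bool:
    """Return True if the filesystem holding `path` has at least `required_gb` GB free."""
    free_gb = shutil.disk_usage(path).free / 1024**3
    print(f"Free space at {path}: {free_gb:.1f} GB (need {required_gb} GB)")
    return free_gb >= required_gb

if __name__ == "__main__":
    # Check the current directory's filesystem before running the download script.
    if not enough_space(".", REQUIRED_GB):
        sys.exit("Not enough disk space for the YaLM-100B checkpoint.")
```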
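The headline figures (300B tokens, 65 days, 800 GPUs) imply a rough training throughput. The short calculation below is a back-of-envelope estimate derived only from those numbers, not from any logs in the repository.

```python
# Back-of-envelope throughput from the figures in the summary above.
tokens = 300e9  # ~300B tokens consumed during training
days = 65       # training duration
gpus = 800      # A100 cards in the cluster

seconds = days * 24 * 3600
tokens_per_second = tokens / seconds
tokens_per_gpu_per_second = tokens_per_second / gpus

print(f"~{tokens_per_second:,.0f} tokens/s across the cluster")
print(f"~{tokens_per_gpu_per_second:,.1f} tokens/s per GPU")
```

This works out to roughly 53,000 tokens per second for the whole cluster, or about 67 tokens per second per GPU, before accounting for any downtime or restarts.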