GitHub - jzhang38/TinyLlama: The TinyLlama project is an open endeavor to pretrain a 1.1B Llama model on 3 trillion tokens.

TinyLlama is an open effort to pretrain a 1.1B-parameter Llama model on 3 trillion tokens, targeting completion in roughly 90 days on 16 A100-40G GPUs. Its small footprint makes it a good fit for speculative decoding, edge deployment, and real-time dialogue generation.

  • TinyLlama project: aims to pretrain a 1.1B-parameter Llama model on 3 trillion tokens using 16 A100-40G GPUs within roughly 90 days.
  • Architecture: identical to Llama 2 (same architecture and tokenizer), so TinyLlama plugs into open-source projects built on Llama; a loading sketch follows this list.
  • Key dates: training started on 2023-09-01 with regular updates and optimizations.
  • Release schedule: intermediate checkpoints are rolled out as training progresses, labeled by tokens seen and training steps.
  • Use cases: speculative decoding for larger models, deployment on edge devices, and real-time dialogue generation in games; see the assisted-decoding sketch after this list.
  • Training details: parameters, attention variant, sequence length, batch size, learning rate, training data sources, hardware setup.
  • Codebase features: supports multi-GPU and multi-node training, with optimizations for speed and memory efficiency.
  • Throughput: reaches 24k tokens per second per A100-40G GPU (a back-of-the-envelope schedule check follows this list).
  • Training comparison: TinyLlama's training throughput compared against similarly sized Pythia and MPT models on A100 GPUs.
  • Finetuning: includes full-parameter finetuning scripts for building chat models.
  • Development: ongoing plans to enhance pretraining scripts, evaluate model performance, and explore new applications.
  • Acknowledgements: built upon lit-gpt and flash-attention; contributors from the StatNLP Research Group.
  • Citation: how to cite the TinyLlama project.
  • FAQs: answers questions about the rationale behind pretraining, model saturation, and scaling laws.
  • Project status: open endeavor for pretraining a small but powerful language model on a large corpus of data.
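
Because TinyLlama reuses the Llama 2 architecture and tokenizer, its checkpoints load through the standard Llama code path in Hugging Face Transformers. The minimal sketch below assumes a released chat checkpoint under the TinyLlama organization on the Hub; the exact repo id is illustrative and any released TinyLlama checkpoint can be substituted.

```python
# Minimal loading sketch (assumed Hub repo id; substitute any released checkpoint).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"  # assumed checkpoint name
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("The TinyLlama project is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```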
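
One listed use case is speculative decoding, where a small model drafts tokens that a larger model then verifies. The sketch below uses the assisted-generation feature in Hugging Face Transformers (the assistant_model argument to generate); both model ids are illustrative assumptions, and the larger target must share TinyLlama's Llama 2 tokenizer for the draft tokens to be usable.

```python
# Hedged sketch: TinyLlama as the draft model for assisted (speculative) decoding.
# Model ids are illustrative; the target must use the same tokenizer as the draft.
from transformers import AutoModelForCausalLM, AutoTokenizer

target_id = "meta-llama/Llama-2-7b-hf"            # assumed larger target model
draft_id = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"   # assumed TinyLlama checkpoint

tokenizer = AutoTokenizer.from_pretrained(target_id)
target = AutoModelForCausalLM.from_pretrained(target_id)
draft = AutoModelForCausalLM.from_pretrained(draft_id)

prompt = tokenizer("Speculative decoding works by", return_tensors="pt")
output = target.generate(**prompt, assistant_model=draft, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```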
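
The quoted throughput and hardware figures are consistent with the 90-day target. Taking the summary's numbers at face value:

```python
# Back-of-the-envelope check of the training schedule using the figures above.
total_tokens = 3e12              # 3 trillion tokens
tokens_per_sec_per_gpu = 24_000  # reported throughput per A100-40G
num_gpus = 16

seconds = total_tokens / (tokens_per_sec_per_gpu * num_gpus)
print(f"~{seconds / 86_400:.0f} days of continuous training")  # ~90 days
```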