GitHub - jzhang38/TinyLlama: The TinyLlama project is an open endeavor to pretrain a 1.1B Llama model on 3 trillion tokens.

TinyLlama is an open effort to pretrain a 1.1B-parameter Llama model on 3 trillion tokens, targeting completion in roughly 90 days on 16 A100-40G GPUs. Its small footprint makes it a good fit for speculative decoding, edge deployment, and real-time dialogue generation.

  • TinyLlama project: aims to pretrain a 1.1B-parameter Llama model on 3 trillion tokens using 16 A100-40G GPUs within roughly 90 days.
  • Architecture: identical to Llama 2 (same architecture and tokenizer), so TinyLlama plugs into open-source projects built on Llama; a loading sketch follows this list.
  • Key dates: training started on 2023-09-01 with regular updates and optimizations.
  • Release schedule: intermediate checkpoints are rolled out as training progresses, labeled by tokens seen and training steps.
  • Use cases: speculative decoding for larger models, deployment on edge devices, and real-time dialogue generation in games; see the assisted-decoding sketch after this list.
  • Training details: parameters, attention variant, sequence length, batch size, learning rate, training data sources, hardware setup.
  • Codebase features: supports multi-GPU and multi-node training, with optimizations for speed and memory efficiency.
  • Throughput: reaches 24k tokens per second per A100-40G GPU (a back-of-the-envelope schedule check follows this list).
  • Training comparison: TinyLlama's training throughput compared against similarly sized Pythia and MPT models on A100 GPUs.
  • Finetuning: includes full-parameter finetuning scripts for building chat models.
  • Development: ongoing plans to enhance pretraining scripts, evaluate model performance, and explore new applications.
  • Acknowledgements: built upon lit-gpt and flash-attention; contributors from the StatNLP Research Group.
  • Citation: how to cite the TinyLlama project.
  • FAQs: answers questions about the rationale behind pretraining, model saturation, and scaling laws.
  • Project status: open endeavor for pretraining a small but powerful language model on a large corpus of data.
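
Because TinyLlama reuses the Llama 2 architecture and tokenizer, its checkpoints load through the standard Llama code path in Hugging Face Transformers. The minimal sketch below assumes a released chat checkpoint under the TinyLlama organization on the Hub; the exact repo id is illustrative and any released TinyLlama checkpoint can be substituted.

```python
# Minimal loading sketch (assumed Hub repo id; substitute any released checkpoint).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"  # assumed checkpoint name
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("The TinyLlama project is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```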
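
One listed use case is speculative decoding, where a small model drafts tokens that a larger model then verifies. The sketch below uses the assisted-generation feature in Hugging Face Transformers (the assistant_model argument to generate); both model ids are illustrative assumptions, and the larger target must share TinyLlama's Llama 2 tokenizer for the draft tokens to be usable.

```python
# Hedged sketch: TinyLlama as the draft model for assisted (speculative) decoding.
# Model ids are illustrative; the target must use the same tokenizer as the draft.
from transformers import AutoModelForCausalLM, AutoTokenizer

target_id = "meta-llama/Llama-2-7b-hf"            # assumed larger target model
draft_id = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"   # assumed TinyLlama checkpoint

tokenizer = AutoTokenizer.from_pretrained(target_id)
target = AutoModelForCausalLM.from_pretrained(target_id)
draft = AutoModelForCausalLM.from_pretrained(draft_id)

prompt = tokenizer("Speculative decoding works by", return_tensors="pt")
output = target.generate(**prompt, assistant_model=draft, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```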
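
The quoted throughput and hardware figures are consistent with the 90-day target. Taking the summary's numbers at face value:

```python
# Back-of-the-envelope check of the training schedule using the figures above.
total_tokens = 3e12              # 3 trillion tokens
tokens_per_sec_per_gpu = 24_000  # reported throughput per A100-40G
num_gpus = 16

seconds = total_tokens / (tokens_per_sec_per_gpu * num_gpus)
print(f"~{seconds / 86_400:.0f} days of continuous training")  # ~90 days
```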