THUDM/GLM-130B: GLM-130B: An Open Bilingual Pre-Trained Model (ICLR 2023)
🚀 Introducing GLM-130B: A revolutionary bilingual pre-trained model with 130 billion parameters! 🤖📚 Outperforming GPT-3 175B in multiple tasks, supporting fast inference, hardware flexibility, and reproducible results. Get ready for an AI powerhouse! 🔥 #AI #GLM130B #ICLR2023
- GLM-130B is a bilingual model with 130 billion parameters trained on over 400 billion text tokens in English and Chinese.
- It outperforms GPT-3 175B on several benchmarks and supports fast inference on a single A100 (40G × 8) server.
- Reproducibility of the reported results and cross-platform support (NVIDIA GPUs, Hygon DCU, Ascend 910, and Sunway) are highlighted features.
- GLM-130B can be run on a range of hardware setups, and INT8/INT4 quantization further reduces the memory required (see the generic quantization sketch after this list).
- The model code is built on SAT (SwissArmyTransformer) and requires a specific environment configuration (recent Python and PyTorch releases plus the repository's pinned dependencies) for optimal performance.
- Model weights are distributed as split archives that must be downloaded and merged before use, with recommendations for efficient storage and loading (a merge sketch follows this list).
- Task evaluation is configured through YAML files, and a sample dataset is provided for testing (a toy evaluation sketch also follows this list).
- Multi-node evaluation and task customization are supported.
- The model is optimized for up to 2.5X faster inference using FasterTransformer.
- Licensing information is provided, and citation is encouraged.
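
Below is a minimal sketch of reassembling a checkpoint that ships as split archive parts, as referenced in the weights bullet above. The directory layout and file-name pattern (`checkpoints/`, `glm-130b-sat.tar.part_*`) are assumptions for illustration; the repository's download instructions give the exact names and commands.

```python
# Sketch: reassemble split checkpoint archives and unpack them.
# File and directory names here are assumptions; follow the repository's
# download instructions for the actual layout.
import shutil
import tarfile
from pathlib import Path

PARTS_DIR = Path("checkpoints")                 # where the downloaded parts live (assumed)
MERGED_TAR = PARTS_DIR / "glm-130b-sat.tar"     # reassembled archive
EXTRACT_DIR = PARTS_DIR / "glm-130b-sat"        # directory the inference scripts would point at

def merge_and_extract() -> None:
    # Split archives only reassemble correctly when every part is present
    # and concatenated in lexical order.
    parts = sorted(PARTS_DIR.glob("glm-130b-sat.tar.part_*"))
    if not parts:
        raise FileNotFoundError(f"no checkpoint parts found in {PARTS_DIR}")
    with MERGED_TAR.open("wb") as out:
        for part in parts:
            with part.open("rb") as src:
                shutil.copyfileobj(src, out)    # stream in chunks; each part is multi-GB
    with tarfile.open(MERGED_TAR) as tar:
        tar.extractall(EXTRACT_DIR)             # unpack the merged weights

if __name__ == "__main__":
    merge_and_extract()
```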
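The INT8/INT4 support mentioned above quantizes weights to save GPU memory. The snippet below is a generic illustration of symmetric (absmax) weight quantization with per-row scales, not GLM-130B's actual implementation; it only shows the idea behind storing low-bit weights alongside a scale factor.

```python
# Generic absmax (symmetric) INT8 weight quantization, for illustration only.
# GLM-130B's own quantization code lives in the repository; this sketch just
# shows the per-row scale idea behind low-bit weight storage.
import torch

def quantize_int8(weight: torch.Tensor):
    # One scale per output row: map each row's largest magnitude to 127.
    scale = weight.abs().max(dim=1, keepdim=True).values.clamp_min(1e-8) / 127.0
    q = torch.round(weight / scale).clamp(-127, 127).to(torch.int8)
    return q, scale

def dequantize(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    # Recover an approximate FP32 weight for use in matmuls at inference time.
    return q.to(torch.float32) * scale

if __name__ == "__main__":
    w = torch.randn(4, 8)                       # stand-in for a linear layer's weight
    q, s = quantize_int8(w)
    print("max abs error:", (w - dequantize(q, s)).abs().max().item())
```

Quantizing only the weights in this fashion (while keeping activations in higher precision) is what allows the reduced hardware requirements noted in the list above.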
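Finally, a toy sketch of the YAML-driven evaluation pattern referenced above. The field names (`name`, `file-pattern`, `metric`) and the JSONL record layout are hypothetical, chosen only to show the shape of a task description plus a scoring loop; the repository documents its actual task schema.

```python
# Toy sketch of a YAML-described evaluation task scored over JSONL records.
# Field names and record layout are hypothetical; see the repository's
# evaluation docs for the real task schema.
import json
import yaml  # PyYAML

TASK_YAML = """
name: demo-task           # hypothetical task name
file-pattern: demo.jsonl  # hypothetical pointer to the evaluation records
metric: accuracy
"""

def evaluate(records, predict):
    # Accuracy of a predict(text) callable over {"text": ..., "label": ...} records.
    correct = sum(predict(r["text"]) == r["label"] for r in records)
    return correct / max(len(records), 1)

if __name__ == "__main__":
    task = yaml.safe_load(TASK_YAML)
    # Inline stand-ins for the dataset and the model; in practice these come
    # from the task's file pattern and the loaded GLM-130B checkpoint.
    records = [json.loads(line) for line in (
        '{"text": "2 + 2 =", "label": "4"}',
        '{"text": "Capital of France?", "label": "Paris"}',
    )]

    def predict(text: str) -> str:
        # Trivial stand-in model for the demo records above.
        return "4" if "2 + 2" in text else "Paris"

    print(task["name"], "accuracy:", evaluate(records, predict))
```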