GitHub - microsoft/DeBERTa: The implementation of DeBERTa

🚀 Dive into the world of cutting-edge NLP models with DeBERTa! 🤖✨ 🔍 DeBERTa V3: Enhanced with disentangled attention & ELECTRA-Style Pre-Training 💡 Achieve top-notch results on NLU tasks & benchmarks with DeBERTa 🔗 Check out the implementation on GitHub: microsoft/DeBERTa #AI #NLP

  • DeBERTa (Decoding-enhanced BERT with disentangled Attention) improves on BERT and RoBERTa using disentangled attention and an enhanced mask decoder; a simplified sketch of the disentangled attention scoring follows this list.
  • DeBERTa V3 introduces ELECTRA-style pre-training with gradient-disentangled embedding sharing.
  • The V3 models are built on the DeBERTa-V2 architecture, replacing masked language modeling (MLM) with an ELECTRA-style replaced token detection objective for better efficiency.
  • DeBERTa-V2 uses a new 128K-token vocabulary built from the training data, tokenized with SentencePiece.
  • V2 also adds nGiE (nGram Induced Input Encoding, an extra convolution layer) to better learn local dependencies, and shares the position projection matrix with the content projection matrix to save parameters.
  • DeBERTa scales the model size to 900M and 1.5B parameters, significantly improving performance on downstream tasks.
  • The largest DeBERTa model outperforms T5 and surpasses human performance on the SuperGLUE benchmark.
  • Pre-trained DeBERTa models are available in several sizes and configurations for different tasks; see the loading example after this list.
  • Running DeBERTa requires a Linux system with CUDA, PyTorch, Python, and a bash shell.
  • Experiment scripts and fine-tuning methods are provided for various NLU tasks and benchmarks.
  • Performance results show DeBERTa models outperform BERT, RoBERTa, and XLNet on various tasks.
  • Contacts for DeBERTa include Pengcheng He, Xiaodong Liu, Jianfeng Gao, and Weizhu Chen.
  • Citations for DeBERTaV3 and DeBERTa are provided for reference.
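To make the disentangled attention idea above more concrete, here is a minimal, single-head sketch of the three scoring terms described in the DeBERTa paper: content-to-content, content-to-position, and position-to-content. All names (the function, weight arguments, and `rel_idx`) are illustrative and are not the repository's API.

```python
import torch
import torch.nn.functional as F

def disentangled_attention_scores(h, rel_emb, w_qc, w_kc, w_qr, w_kr, rel_idx):
    """Single-head sketch of DeBERTa-style disentangled attention scoring.

    h:       (seq, d)   content hidden states
    rel_emb: (2k, d)    relative-position embeddings (k = max relative distance)
    rel_idx: (seq, seq) LongTensor of bucketed relative distances delta(i, j) in [0, 2k)
    w_*:     (d, d)     content / relative-position projection matrices
    """
    q_c, k_c = h @ w_qc, h @ w_kc              # content query / key
    q_r, k_r = rel_emb @ w_qr, rel_emb @ w_kr  # relative-position query / key

    c2c = q_c @ k_c.T                              # content-to-content
    c2p = torch.gather(q_c @ k_r.T, 1, rel_idx)    # content-to-position
    p2c = torch.gather(k_c @ q_r.T, 1, rel_idx).T  # position-to-content

    d = h.size(-1)
    scores = (c2c + c2p + p2c) / (3 * d) ** 0.5    # scale by sqrt(3d)
    return F.softmax(scores, dim=-1)
```

In the actual model this scoring runs per attention head, with the relative-position embeddings shared across layers and distances clamped to a maximum span; the sketch only shows the scoring arithmetic.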
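The pre-trained checkpoints are also distributed through the Hugging Face Hub, so one convenient way to try them is via the transformers library. This is a usage sketch rather than part of this repository; it assumes transformers (and, for the V3 tokenizer, sentencepiece) is installed and uses the microsoft/deberta-v3-base checkpoint as an example.

```python
from transformers import AutoTokenizer, AutoModel

# Load a published DeBERTa checkpoint from the Hugging Face Hub.
tokenizer = AutoTokenizer.from_pretrained("microsoft/deberta-v3-base")
model = AutoModel.from_pretrained("microsoft/deberta-v3-base")

# Encode a sentence and run it through the encoder.
inputs = tokenizer("DeBERTa improves BERT with disentangled attention.",
                   return_tensors="pt")
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # (batch, seq_len, hidden_size)
```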