GitHub - microsoft/DeBERTa: The implementation of DeBERTa

🚀 Dive into the world of cutting-edge NLP models with DeBERTa! 🤖✨ 🔍 DeBERTa V3: Enhanced with disentangled attention & ELECTRA-Style Pre-Training 💡 Achieve top-notch results on NLU tasks & benchmarks with DeBERTa 🔗 Check out the implementation on GitHub: microsoft/DeBERTa #AI #NLP

  • DeBERTa (Decoding-enhanced BERT with disentangled Attention) improves on BERT and RoBERTa using disentangled attention and an enhanced mask decoder; a simplified sketch of the disentangled attention scoring follows this list.
  • DeBERTa V3 introduces ELECTRA-style pre-training with gradient-disentangled embedding sharing.
  • The V3 models are built on the DeBERTa-V2 architecture, replacing masked language modeling (MLM) with an ELECTRA-style replaced token detection objective for better efficiency.
  • DeBERTa-V2 uses a new 128K-token vocabulary built from the training data, tokenized with SentencePiece.
  • V2 also adds nGiE (nGram Induced Input Encoding, an extra convolution layer) to better learn local dependencies, and shares the position projection matrix with the content projection matrix to save parameters.
  • DeBERTa scales the model size to 900M and 1.5B parameters, significantly improving performance on downstream tasks.
  • The largest DeBERTa model outperforms T5 and surpasses human performance on the SuperGLUE benchmark.
  • Pre-trained DeBERTa models are available in several sizes and configurations for different tasks; see the loading example after this list.
  • Running DeBERTa requires a Linux system with CUDA, PyTorch, Python, and a bash shell.
  • Experiment scripts and fine-tuning methods are provided for various NLU tasks and benchmarks.
  • Performance results show DeBERTa models outperform BERT, RoBERTa, and XLNet on various tasks.
  • Contacts for DeBERTa include Pengcheng He, Xiaodong Liu, Jianfeng Gao, and Weizhu Chen.
  • Citations for DeBERTaV3 and DeBERTa are provided for reference.
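To make the disentangled attention idea above more concrete, here is a minimal, single-head sketch of the three scoring terms described in the DeBERTa paper: content-to-content, content-to-position, and position-to-content. All names (the function, weight arguments, and `rel_idx`) are illustrative and are not the repository's API.

```python
import torch
import torch.nn.functional as F

def disentangled_attention_scores(h, rel_emb, w_qc, w_kc, w_qr, w_kr, rel_idx):
    """Single-head sketch of DeBERTa-style disentangled attention scoring.

    h:       (seq, d)   content hidden states
    rel_emb: (2k, d)    relative-position embeddings (k = max relative distance)
    rel_idx: (seq, seq) LongTensor of bucketed relative distances delta(i, j) in [0, 2k)
    w_*:     (d, d)     content / relative-position projection matrices
    """
    q_c, k_c = h @ w_qc, h @ w_kc              # content query / key
    q_r, k_r = rel_emb @ w_qr, rel_emb @ w_kr  # relative-position query / key

    c2c = q_c @ k_c.T                              # content-to-content
    c2p = torch.gather(q_c @ k_r.T, 1, rel_idx)    # content-to-position
    p2c = torch.gather(k_c @ q_r.T, 1, rel_idx).T  # position-to-content

    d = h.size(-1)
    scores = (c2c + c2p + p2c) / (3 * d) ** 0.5    # scale by sqrt(3d)
    return F.softmax(scores, dim=-1)
```

In the actual model this scoring runs per attention head, with the relative-position embeddings shared across layers and distances clamped to a maximum span; the sketch only shows the scoring arithmetic.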
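The pre-trained checkpoints are also distributed through the Hugging Face Hub, so one convenient way to try them is via the transformers library. This is a usage sketch rather than part of this repository; it assumes transformers (and, for the V3 tokenizer, sentencepiece) is installed and uses the microsoft/deberta-v3-base checkpoint as an example.

```python
from transformers import AutoTokenizer, AutoModel

# Load a published DeBERTa checkpoint from the Hugging Face Hub.
tokenizer = AutoTokenizer.from_pretrained("microsoft/deberta-v3-base")
model = AutoModel.from_pretrained("microsoft/deberta-v3-base")

# Encode a sentence and run it through the encoder.
inputs = tokenizer("DeBERTa improves BERT with disentangled attention.",
                   return_tensors="pt")
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # (batch, seq_len, hidden_size)
```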