Top 15 Pre-trained NLP Language Models
Compared to RoBERTa-Large, a DeBERTa model trained on half of the training data performs consistently better on a wide range of NLP tasks, achieving improvements on MNLI by +0.9% (90.2% vs. 91.1%), on SQuAD v2.0 by +2.3% (88.4% vs. 90.7%), and on RACE by +3.6% (83.2% vs. 86.8%). Notably, we scale up DeBERTa by training a…