Three different GBNs, namely uniform GBN, language-model GBN and coaching GBN, are proposed to penalize
BERT使用了MLM(masked language-model)和NSP(Next Sentence Prediction)两个预训练任务来进行训练,这两个任务可能并不足以让BERT学到那么多复杂的语言知识 * * *说明:masked language-model(MLM)是指在训练的时候随即从输入预料上mask掉一些单词,然后通过的上下文预测这些单词,该任务非常像我们在中学时期经常做的完形填空。
Classification (2018), https://www.aclweb.org/anthology/P18-1031.pdf What Kind of Language Is Hard to Language-Model