I want to do Chinese text similarity with Hugging Face:
tokenizer = BertTokenizer.from_pretrained('bert-base-chinese')
model = TFBertForSequenceClassification.from_pretrained('bert-base-chinese')
It does not work; the system reports:
Some weights of the model checkpoint at bert-base-chinese were not used when initializing TFBertForSequenceClassification: ['nsp___cls', 'mlm___cls']
- This IS expected if you are initializing TFBertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPretraining model).
- This IS NOT expected if you are initializing TFBertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of TFBertForSequenceClassification were not initialized from the model checkpoint at bert-base-chinese and are newly initialized: ['classifier', 'dropout_37']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
However, I can use Hugging Face for named-entity recognition:
tokenizer = BertTokenizer.from_pretrained('bert-base-chinese')
model = TFBertForTokenClassification.from_pretrained("bert-base-chinese")
Does this mean Hugging Face does not support Chinese sequence classification? If my judgment is correct, how can I solve this on Colab with only 12 GB of memory?
Posted on 2021-08-29 20:14:39
The reason is simple. The 'bert-base-chinese' checkpoint has not been fine-tuned for a sequence-classification task. When you load it into TFBertForSequenceClassification, the pre-training heads ('nsp___cls', 'mlm___cls') are discarded, and the new layers ('classifier', 'dropout_37') are randomly initialized. That is what the warning means: because the final layer starts from random weights, the model will give random predictions until it is trained.
BTW @andy, didn't you also get output for the token-classification case? It should show a similar warning, with the 'classifier' layer randomly initialized there as well.
Be sure to use a model that has already been fine-tuned, or else fine-tune this loaded model yourself.
https://stackoverflow.com/questions/62869640