Q: Fine-tuning a model's classifier layer with a new label

Stack Overflow user

Asked 2021-04-19 08:32:28

Answers: 1 · Views: 884 · Followers: 0 · Votes: 2

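I want to further fine-tune an already fine-tuned BertForSequenceClassification model on a new dataset that contains just one additional label, which the model has never seen before.

In this way, I want to add one new label to the set of labels that the model can currently classify correctly.

Moreover, I don't want the classifier weights to be randomly initialized. I'd like to keep them intact and only update them according to the dataset examples, while increasing the size of the classifier layer by 1.

The dataset used for further fine-tuning could look like this: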

sentence,label
intent example 1,new_label
intent example 2,new_label
...
intent example 10,new_label
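
To train on a file like this, the string `new_label` has to be mapped to an integer class id. Since the existing classifier has 135 outputs (ids 0 to 134), the natural choice is to give the new class id 135 so that the old ids stay unchanged. A minimal sketch, assuming hypothetical placeholder names (`label_0` ... `label_134`) for the existing labels, which are not given in the question:

```python
# Hypothetical names for the 135 labels the model already knows;
# the real names are not shown in the question.
existing_labels = [f"label_{i}" for i in range(135)]
label2id = {name: idx for idx, name in enumerate(existing_labels)}

# The new class takes the next free index, so every old id keeps its meaning.
label2id["new_label"] = len(label2id)
id2label = {idx: name for name, idx in label2id.items()}

# Encode a few rows of the fine-tuning dataset as (text, class id) pairs.
rows = [
    ("intent example 1", "new_label"),
    ("intent example 2", "new_label"),
]
encoded = [(text, label2id[label]) for text, label in rows]
print(encoded)
```

Keeping the old indices fixed matters because each row of the classifier's weight matrix corresponds to one class id.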

The current classifier layer of my model looks like this:

Linear(in_features=768, out_features=135, bias=True)

How can I achieve this?

Is this a good approach?

pytorch

huggingface-transformers

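1 Answer

Stack Overflow user

Accepted answer

Answered 2021-04-21 00:19:14

You can extend the model's weights and bias with new values. Please have a look at the commented example below: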

# This section loads your model.
# I will just use a pretrained model for this example.
import torch
from torch import nn
from transformers import AutoModelForSequenceClassification, AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("jpcorb20/toxic-detector-distilroberta")
model = AutoModelForSequenceClassification.from_pretrained("jpcorb20/toxic-detector-distilroberta")
# Check the output of one sample so we can compare it later with the extended layer
# and verify that we kept the previously learned "knowledge".
f = tokenizer.encode_plus("This is an example", return_tensors='pt')
print(model(**f).logits)

# Now we need to find the name of the linear layer you want to extend.
# The layers on top of distilroberta are wrapped inside a classifier section.
# This name can differ for your model because the model's author can choose it freely;
# in that case, inspect model.named_parameters() to find the classification layer.
print(model.classifier)

# The output shows us that the classification layer is called `out_proj`.
# We can now extend the weights by creating a new tensor that consists of the
# old weights and a randomly initialized row for the new label.
model.classifier.out_proj.weight = nn.Parameter(torch.cat((model.classifier.out_proj.weight, torch.randn(1,768)),0))

# We do the same for the bias:
model.classifier.out_proj.bias = nn.Parameter(torch.cat((model.classifier.out_proj.bias, torch.randn(1)),0))

# Finally, compare the output with our expectation: the first six logits
# should be unchanged and a seventh logit should appear for the new label.
print(model(**f).logits)

Output:

tensor([[-7.3604, -9.4899, -8.4170, -9.7688, -8.4067, -9.3895]],
       grad_fn=<AddmmBackward>)
RobertaClassificationHead(
  (dense): Linear(in_features=768, out_features=768, bias=True)
  (dropout): Dropout(p=0.1, inplace=False)
  (out_proj): Linear(in_features=768, out_features=6, bias=True)
)
tensor([[-7.3604, -9.4899, -8.4170, -9.7688, -8.4067, -9.3895,  2.2124]],
       grad_fn=<AddmmBackward>)
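
The same idea can be packaged as a small helper that builds a fresh `nn.Linear` instead of swapping the parameters in place. This is only a sketch under some assumptions: `extend_linear` is a hypothetical helper name, the new row is initialized with a small normal distribution (`std=0.02`, a value chosen here for illustration), and a plain `nn.Linear(768, 135)` stands in for the asker's classifier so the example runs without downloading a model.

```python
import torch
from torch import nn


def extend_linear(old: nn.Linear, extra: int = 1, std: float = 0.02) -> nn.Linear:
    """Return a new Linear layer with `extra` more outputs that reuses `old`'s weights."""
    new = nn.Linear(old.in_features, old.out_features + extra,
                    bias=old.bias is not None)
    new = new.to(device=old.weight.device, dtype=old.weight.dtype)
    with torch.no_grad():
        # Small random values everywhere, then overwrite the rows of the old classes.
        new.weight.normal_(mean=0.0, std=std)
        new.weight[: old.out_features] = old.weight
        if old.bias is not None:
            new.bias.zero_()
            new.bias[: old.out_features] = old.bias
    return new


torch.manual_seed(0)
old_head = nn.Linear(768, 135)      # stand-in for the existing classifier layer
new_head = extend_linear(old_head)  # one extra output for the new label

# The logits of the 135 old classes should be preserved.
x = torch.randn(2, 768)
old_logits = old_head(x)
new_logits = new_head(x)
print(new_head)
print(torch.allclose(old_logits, new_logits[:, :135], atol=1e-5))
```

Building a new layer also keeps the layer's `out_features` attribute in sync with the weight shape, which replacing only the `weight` and `bias` parameters does not. For a `BertForSequenceClassification` model, where `model.classifier` is itself an `nn.Linear`, the result could presumably be assigned back with `model.classifier = extend_linear(model.classifier)`. Before training with labels, `model.num_labels` and `model.config.num_labels` most likely need to be raised to the new class count as well, since the loss computation relies on them; check this against the library version you use.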

Votes: 3

Original page content provided by Stack Overflow. Translation support provided by Tencent Cloud's Xiaowei IT-domain engine.

Original link:

https://stackoverflow.com/questions/67158554
