Q: How do I train a model on top of a transformer to output a sequence?

Data Science user

Asked 2020-10-30 11:37:22

Answers: 1, Views: 278, Followers: 0, Votes: 2

I am using Hugging Face to build a model that can identify errors in a given sentence. Suppose I have a sentence and a corresponding label, as follows:

correct_sentence = "we used to play together."
correct_label = [1, 1, 1, 1, 1]

changed_sentence = "we use play to together."
changed_label = [1, 2, 2, 2, 1]

The labels are then padded with 0 to a length of 512. The sentence is also tokenized and padded (or truncated) to this length. The model is as follows:

class Camembert(torch.nn.Module):
    """
    The definition of the custom model, last 15 layers of Camembert will be retrained
    and then a fcn to 512 (the size of every label).
    """
    def __init__(self, cam_model):
        super(Camembert, self).__init__()
        self.l1 = cam_model
        total_layers = 199
        for i, param in enumerate(cam_model.parameters()):
            if total_layers - i > hparams["retrain_layers"]:
                param.requires_grad = False
            else:
                pass
        self.l2 = torch.nn.Dropout(hparams["dropout_rate"])
        self.l3 = torch.nn.Linear(768, 512)

    def forward(self, ids, mask):
        _, output = self.l1(ids, attention_mask=mask)
        output = self.l2(output)
        output = self.l3(output)
        return output

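For example, with batch_size=2 the output has shape (2, 512), the same as target_label. As far as I understand, this approach amounts to classifying over 512 classes, which is not what I want. The problem shows up when I try to compute the loss with torch.nn.CrossEntropyLoss(), which gives me the following error (truncated):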

  File "D:\Anaconda\lib\site-packages\torch\nn\functional.py", line 1838, in nll_loss
    ret = torch._C._nn.nll_loss(input, target, weight, _Reduction.get_enum(reduction), ignore_index)
RuntimeError: multi-target not supported at C:/w/1/s/tmp_conda_3.7_100118/conda/conda-bld/pytorch_1579082551706/work/aten/src\THCUNN/generic/ClassNLLCriterion.cu:15

How should I solve this, and is there a tutorial for a similar type of model?
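For reference, the shape mismatch behind this error can be reproduced in isolation. The following is a minimal sketch, assuming a recent PyTorch install; the tensor names and values are made up for illustration, and the exact wording of the error message may differ between PyTorch versions.

```python
import torch

criterion = torch.nn.CrossEntropyLoss()

# Logits of shape (batch, num_classes): 2 examples, 512 "classes".
logits = torch.randn(2, 512)

# Accepted: one integer class index per example, shape (2,).
class_indices = torch.tensor([3, 7])
loss = criterion(logits, class_indices)

# Rejected: one integer label per position, shape (2, 512).
# With integer targets the loss expects a single class index per example.
per_token_labels = torch.zeros(2, 512, dtype=torch.long)
try:
    criterion(logits, per_token_labels)
    raised = False
except RuntimeError as err:
    raised = True
    print(err)
```

In other words, with integer targets CrossEntropyLoss treats the 512 outputs as scores for 512 mutually exclusive classes, which is why a 512-long label vector per example does not fit.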

Tags: sequence-to-sequence, pytorch, transformer, sequence

1 Answer

Data Science user

Accepted answer

Answered 2020-10-30 18:14:46

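I think you should treat this as a binary classification problem. For each word in the changed sentence, you will have a binary label: correct or incorrect. I would recommend relabeling so that "correct" words have label 0 and "incorrect" words have label 1. In your example you would have: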

correct_sentence = "we used to play together"
changed_sentence = "we use play to together"
labels = [0, 1, 1, 1, 0]

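Instead of padding with some special value, pad with the "correct" label (which is 0 if you use my suggestion above).

Class labels conventionally start at index 0, so this labeling scheme matches what PyTorch expects for binary classification problems.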
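The relabeling and padding described above might look like the following. This is only a sketch: the helper name relabel_and_pad and its parameters are hypothetical, and it assumes the original scheme from the question (1 = correct, 2 = incorrect) and a fixed length of 512.

```python
def relabel_and_pad(labels, max_len=512, pad_label=0):
    """Map the question's labels (1 = correct, 2 = incorrect) to 0/1,
    then pad with the "correct" label (or truncate) to max_len."""
    mapping = {1: 0, 2: 1}  # assumption: only labels 1 and 2 occur
    binary = [mapping[label] for label in labels][:max_len]
    return binary + [pad_label] * (max_len - len(binary))


# The changed sentence from the question: "we use play to together."
padded = relabel_and_pad([1, 2, 2, 2, 1])
print(padded[:8], len(padded))
```

Padding with 0 means padded positions are treated as "correct" tokens by the loss, which is the choice suggested above.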

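Next, you need to change the activation function of your final Linear layer. Right now your model ends with just a Linear layer, which means the output is unbounded. That does not make sense for a classification problem, because you know the output should always lie in the range [0, C-1], where C is the number of classes.

Instead, you should apply an activation function that makes your outputs behave more like class labels. For a binary classification problem, a good choice for the final activation is torch.nn.Sigmoid. You would modify your model definition as follows: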

class Camembert(torch.nn.Module):
    """
    The definition of the custom model, last 15 layers of Camembert will be retrained
    and then a fcn to 512 (the size of every label).
    """
    def __init__(self, cam_model):
        super(Camembert, self).__init__()
        self.l1 = cam_model
        total_layers = 199
        for i, param in enumerate(cam_model.parameters()):
            if total_layers - i > hparams["retrain_layers"]:
                param.requires_grad = False
            else:
                pass
        self.l2 = torch.nn.Dropout(hparams["dropout_rate"])
        self.l3 = torch.nn.Linear(768, 512)
        self.activation = torch.nn.Sigmoid()

    def forward(self, ids, mask):
        _, output = self.l1(ids, attention_mask=mask)
        output = self.l2(output)
        output = self.l3(output)
        output = self.activation(output)
        return output

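Your output will now have shape (batch_size, 512). Each of the 512 outputs is a number between 0 and 1, which you can treat as the probability that the corresponding token is "incorrect". If the output is greater than 0.5, the label becomes "incorrect"; otherwise the label is "correct".

Finally, since you are treating this as a binary classification problem, you will want to use the binary cross-entropy loss (torch.nn.BCELoss). Note that BCELoss expects float targets with the same shape as the output, so a single unbatched label vector needs a batch dimension added with unsqueeze(0).

model = Camembert(cam_model)
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)
criterion = torch.nn.BCELoss()

input = <tokenized, padded input sequence>   # shape (1, 512)
mask = <attention mask for input>            # shape (1, 512)
labels = torch.tensor([0, 1, 1, 1, 0, . . .  , 0], dtype=torch.float)
output = model(input, mask)                  # shape (1, 512)
loss = criterion(output, labels.unsqueeze(0))

optimizer.zero_grad()
loss.backward()
optimizer.step()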
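To check the shapes end to end without downloading a pretrained model, the training step can be sketched with a small stand-in encoder. Everything named StandInEncoder and Head here is hypothetical: the stand-in only mimics an encoder that returns a (sequence_output, pooled_output) tuple, the token ids are made up, and dropout and layer freezing are left out for brevity.

```python
import torch

torch.manual_seed(0)


class StandInEncoder(torch.nn.Module):
    """Hypothetical replacement for the pretrained model.
    Returns (sequence_output, pooled_output), pooled by masked mean."""

    def __init__(self, vocab_size=100, hidden=768):
        super().__init__()
        self.embed = torch.nn.Embedding(vocab_size, hidden)

    def forward(self, ids, attention_mask=None):
        seq = self.embed(ids)                                  # (batch, seq_len, hidden)
        m = attention_mask.unsqueeze(-1).float()               # (batch, seq_len, 1)
        pooled = (seq * m).sum(1) / m.sum(1).clamp(min=1.0)    # (batch, hidden)
        return seq, pooled


class Head(torch.nn.Module):
    """Linear + Sigmoid head on the pooled output, as in the answer."""

    def __init__(self, encoder, seq_len=512):
        super().__init__()
        self.l1 = encoder
        self.l3 = torch.nn.Linear(768, seq_len)
        self.activation = torch.nn.Sigmoid()

    def forward(self, ids, mask):
        _, output = self.l1(ids, attention_mask=mask)
        return self.activation(self.l3(output))                # (batch, seq_len)


seq_len = 512
model = Head(StandInEncoder(), seq_len)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = torch.nn.BCELoss()

# One padded example with five real tokens (ids are made up; 0 is padding).
ids = torch.zeros(1, seq_len, dtype=torch.long)
ids[0, :5] = torch.tensor([11, 12, 13, 14, 15])
mask = (ids != 0).long()

# Float labels, padded with the "correct" label 0.
labels = torch.zeros(seq_len)
labels[:5] = torch.tensor([0.0, 1.0, 1.0, 1.0, 0.0])

output = model(ids, mask)                       # (1, 512), values in (0, 1)
loss = criterion(output, labels.unsqueeze(0))   # target shape matches output

optimizer.zero_grad()
loss.backward()
optimizer.step()

# Threshold at 0.5 to turn probabilities into 0/1 predictions.
predicted = (output > 0.5).long()
print(output.shape, loss.item())
```

With a real pretrained encoder the only parts that change are the encoder itself and how ids and mask are produced by the tokenizer; the loss and label shapes stay the same.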

Votes: 1

Original page content provided by Data Science. Translation support provided by Tencent Cloud's Xiaowei IT-domain engine.

Original link:

https://datascience.stackexchange.com/questions/84703
