Search - Tencent Cloud Developer Community - Tencent Cloud


2 answers

TensorFlow error when running a seq2seq model

/Linear/Bias, embedding_attention_seq2seq/RNN/MultiRNNCell/Cell0/GRUCell/Candidate/Linear/Matrix, embedding_attention_seq2se/Attention_0/Linear/Bias, embedding_attention_seq2seq

Views: 3 · Edited 2015-11-18 · Votes: 0

1 answer

How to share gradients and variables in the Adam optimizer when using bucketing in TensorFlow?

/Gates/Linear/Matrix:0 (1056, 1600)embedding_attention_seq2seq/RNN/MultiRNNCell/Cell0/GRUCell/Gates&

Views: 2 · Edited 2016-11-22 · Votes: 1

1 answer

Why does the output of the attention decoder need to be combined with the attention?

x = linear([inp] + attns, input_size, True)cell_output, state = cell(x, state)if i == 0 and initial_state_attention: attns = attent

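Views: 2 · Asked 2017-08-11 · Votes: 0

Answer accepted

1 answer

Computing two losses in one model and backpropagating twice

In init(): In forward(), I output a predicted start layer and a predicted end layer: outputs = self.bert(input_ids, attention_mask) # input = bert tokenize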

Views: 2 · Asked 2020-12-17 · Votes: 0

Answer accepted

1 answer

Do I need to load the weights of another class that I use inside my NN class?

self.multihead_attn.forward(x, x, x)class ActualModel(nn.Module): self.inp_layer = nn.Linear(arg1, arg2) self.out_layer = nn.Linearclass ActualModel(nn.Module): de

Views: 4 · Asked 2021-08-11 · Votes: 0

Answer accepted

1 answer

Combining CamemBERT and CRF for token classification

=False) (encoder): RobertaEncoder( (0): RobertaLayer( ) (1): RobertaLayer( ) (2): R

Views: 3 · Asked 2022-04-01 · Votes: 1

Answer accepted

1 answer

Printing the input and output sizes of all layers of a pretrained model

ModuleList( (norm1): LayerNorm((768,), eps=1e-06, elementwise_affine=True) (temporal_norm1): LayerNorm((768,), eps=1e-06, elementwise_affine=True) ) (norm1): LayerNorm((768,), eps=1e-

Views: 18 · Edited 2022-07-26 · Votes: 0

Answer accepted

2 answers

RuntimeError: Error(s) in loading state_dict for DataParallel: Unexpected key(s) in state_dict: "module.scibert_layer.embeddings.position_ids"

) (layer): ModuleList( (attentiondropout): Dropout(p=0.1, inplace=False) ) (attentiondropout): Dropout(p=0.1, inplace=False)

Views: 9 · Asked 2021-07-20 · Votes: 0

Answer accepted

2 answers

Making the initial layers of transformers BertForSequenceClassification non-trainable for PyTorch training

) (layer): ModuleList( (attentiondropout): Dropout(p=0.1, inplace=False) ) (attentiondropout): Dropout(p=0.1, inplace=False)

Views: 5 · Asked 2020-04-23 · Votes: 0

1 answer

Printing a BERT model summary with PyTorch

False) (encoder): BertEncoder( (0): BertLayer( ) (1): BertLayer( )

Views: 2 · Asked 2022-02-24 · Votes: 0

Answer accepted

2 answers

How to train a Torch model with multiple GPUs?

# for param in self.bert.parameters(): self.linear= nn.Linear(2048, 4) def forward(self, input_ids, attention_mask): batch = input_ids

Views: 5 · Edited 2022-08-08 · Votes: 3

Answer accepted

1 answer

How to print the values of a KerasTensor

time_step, dim))print(lstm_out)attention_flatten = Flatten()(attention_mul) output = Dense(1, activation='linear')(attention_

Views: 2 · Asked 2022-03-10 · Votes: 0

1 answer

Accessing modules inside a block in a pretrained model

ModuleList( (norm1): LayerNorm((768,), eps=1e-06, elementwise_affine=True) (temporal_norm1): LayerNorm((768,), eps=1e-06, elementwise_affine=True) ) (norm1): LayerNorm((768,), eps=1e-

Views: 18 · Asked 2022-07-25 · Votes: 1

Answer accepted

1 answer

How to get attention values from the attention decoder of seq2seq_model to plot against BLEU score

I want to visualize data as mentioned in [http://www.wildml.com/2016/01/attention-and-memory-in-deep-learning-and-nlp/](http://www.wildml.com/2016/01/attention-and-memory-in-deep-learning-and-nlp/) using bleu score. 2.for

Views: 3 · Asked 2016-08-22 · Votes: 0

Answer accepted

1 answer

Accessing a specific layer in a pretrained model in PyTorch

timesformer.models.vit import TimeSformer (0): Block( (attn): Attention) (temporal_norm1): LayerNorm((768,), eps=1e-06

Views: 11 · Edited 2022-08-10 · Votes: 3

Answer accepted

1 answer

Setting Dropout to a non-zero value in a Vision Transformer model

blocks): Sequential((norm1): LayerNorm((768,), eps=1e-06, elementwise_affine=True))(1): Block((attn): Attentioninplace=False))(norm1): LayerNorm((768,),

Views: 4 · Asked 2021-12-21 · Votes: 0

1 answer

Ensemble modeling with five transformers

init() self.pre_classifier = torch.nn.Linearinit() self.pre_classifier = torch.nn.LinearXLNetForSequenceClassification.from_pretrain

Views: 9 · Edited 2022-02-01 · Votes: 0

1 answer

Node "Merge/MergeSummary" has inputs from different frames: what does this mean?

tf.reset_default_graph()LENGTH = 4ATT_SIZE = 3 """Linear projection."""def __init__(self, attention_states, <

Views: 3 · Asked 2017-02-13 · Votes: 2

1 answer

TypeError: tuple indices must be integers or slices, not tuple

inplace=False) (encoder): BertEncoder( (0): BertLayer( ) (1): BertLayer( ) (2): Ber

Views: 10 · Edited 2022-07-03 · Votes: 2

Answer accepted

1 answer

TypeError: 'int' object is not callable when calling a BERT method to generate embeddings

data to tensor formatprint(type(input_ids)) #<class 'torch.Tensor'> print(type(attention_mask)) #<class 'numpy.ndarray'> hidden_state= flaubert(input_ids,

Views: 60 · Edited 2021-06-15 · Votes: 1


