当基于sloBERTa的CamemBERT微调多类分类任务的SimpleTransformers模型时,我想使用模型参数"max_seq_length":512,因为以前的工作表明它提供了比128个更好的结果,但是包含这个参数会引发下面的错误。这个错误在Kaggle和Google环境中是一样的,终止执行和重新运行没有帮助。无论培训时代的数量有多小,都会触发错误,而且数据集只包含600个实例(文本作为字符串,标签为整数)。我已经尝试将max_seq_length降低到509、500和128,但是错误仍然存在。
没有这个参数的设置正常工作,允许90个时期的训练,否则我就有足够的内存。
from simpletransformers.classification import ClassificationModel
# define hyperparameter
model_args ={"overwrite_output_dir": True,
"num_train_epochs": 90,
"labels_list": LABELS_NUM,
"learning_rate": 1e-5,
"train_batch_size": 32,
"no_cache": True,
"no_save": True,
#"max_seq_length": 512,
"save_steps": -1,
}
model = ClassificationModel(
"camembert", "EMBEDDIA/sloberta",
use_cuda = device,
num_labels = NUM_LABELS,
args = model_args)
model.train_model(train_df)这是一个错误:
RuntimeError Traceback (most recent call last)
/tmp/ipykernel_34/2529369927.py in <module>
19 args = model_args)
20
---> 21 model.train_model(train_df)
/opt/conda/lib/python3.7/site-packages/simpletransformers/classification/classification_model.py in train_model(self, train_df, multi_label, output_dir, show_running_loss, args, eval_df, verbose, **kwargs)
610 eval_df=eval_df,
611 verbose=verbose,
--> 612 **kwargs,
613 )
614
/opt/conda/lib/python3.7/site-packages/simpletransformers/classification/classification_model.py in train(self, train_dataloader, output_dir, multi_label, show_running_loss, eval_df, test_df, verbose, **kwargs)
883 loss_fct=self.loss_fct,
884 num_labels=self.num_labels,
--> 885 args=self.args,
886 )
887 else:
/opt/conda/lib/python3.7/site-packages/simpletransformers/classification/classification_model.py in _calculate_loss(self, model, inputs, loss_fct, num_labels, args)
2256
2257 def _calculate_loss(self, model, inputs, loss_fct, num_labels, args):
-> 2258 outputs = model(**inputs)
2259 # model outputs are always tuple in pytorch-transformers (see doc)
2260 loss = outputs[0]
/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
720 result = self._slow_forward(*input, **kwargs)
721 else:
--> 722 result = self.forward(*input, **kwargs)
723 for hook in itertools.chain(
724 _global_forward_hooks.values(),
/opt/conda/lib/python3.7/site-packages/transformers/models/roberta/modeling_roberta.py in forward(self, input_ids, attention_mask, token_type_ids, position_ids, head_mask, inputs_embeds, labels, output_attentions, output_hidden_states, return_dict)
1210 output_attentions=output_attentions,
1211 output_hidden_states=output_hidden_states,
-> 1212 return_dict=return_dict,
1213 )
1214 sequence_output = outputs[0]
/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
720 result = self._slow_forward(*input, **kwargs)
721 else:
--> 722 result = self.forward(*input, **kwargs)
723 for hook in itertools.chain(
724 _global_forward_hooks.values(),
/opt/conda/lib/python3.7/site-packages/transformers/models/roberta/modeling_roberta.py in forward(self, input_ids, attention_mask, token_type_ids, position_ids, head_mask, inputs_embeds, encoder_hidden_states, encoder_attention_mask, past_key_values, use_cache, output_attentions, output_hidden_states, return_dict)
859 output_attentions=output_attentions,
860 output_hidden_states=output_hidden_states,
--> 861 return_dict=return_dict,
862 )
863 sequence_output = encoder_outputs[0]
/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
720 result = self._slow_forward(*input, **kwargs)
721 else:
--> 722 result = self.forward(*input, **kwargs)
723 for hook in itertools.chain(
724 _global_forward_hooks.values(),
/opt/conda/lib/python3.7/site-packages/transformers/models/roberta/modeling_roberta.py in forward(self, hidden_states, attention_mask, head_mask, encoder_hidden_states, encoder_attention_mask, past_key_values, use_cache, output_attentions, output_hidden_states, return_dict)
531 encoder_attention_mask,
532 past_key_value,
--> 533 output_attentions,
534 )
535
/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
720 result = self._slow_forward(*input, **kwargs)
721 else:
--> 722 result = self.forward(*input, **kwargs)
723 for hook in itertools.chain(
724 _global_forward_hooks.values(),
/opt/conda/lib/python3.7/site-packages/transformers/models/roberta/modeling_roberta.py in forward(self, hidden_states, attention_mask, head_mask, encoder_hidden_states, encoder_attention_mask, past_key_value, output_attentions)
415 head_mask,
416 output_attentions=output_attentions,
--> 417 past_key_value=self_attn_past_key_value,
418 )
419 attention_output = self_attention_outputs[0]
/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
720 result = self._slow_forward(*input, **kwargs)
721 else:
--> 722 result = self.forward(*input, **kwargs)
723 for hook in itertools.chain(
724 _global_forward_hooks.values(),
/opt/conda/lib/python3.7/site-packages/transformers/models/roberta/modeling_roberta.py in forward(self, hidden_states, attention_mask, head_mask, encoder_hidden_states, encoder_attention_mask, past_key_value, output_attentions)
344 encoder_attention_mask,
345 past_key_value,
--> 346 output_attentions,
347 )
348 attention_output = self.output(self_outputs[0], hidden_states)
/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
720 result = self._slow_forward(*input, **kwargs)
721 else:
--> 722 result = self.forward(*input, **kwargs)
723 for hook in itertools.chain(
724 _global_forward_hooks.values(),
/opt/conda/lib/python3.7/site-packages/transformers/models/roberta/modeling_roberta.py in forward(self, hidden_states, attention_mask, head_mask, encoder_hidden_states, encoder_attention_mask, past_key_value, output_attentions)
273 attention_probs = attention_probs * head_mask
274
--> 275 context_layer = torch.matmul(attention_probs, value_layer)
276
277 context_layer = context_layer.permute(0, 2, 1, 3).contiguous()
RuntimeError: CUDA out of memory. Tried to allocate 192.00 MiB (GPU 0; 15.90 GiB total capacity; 15.04 GiB already allocated; 15.75 MiB free; 15.12 GiB reserved in total by PyTorch)另外的代码(如果有帮助的话--我已经尝试了所有关于我在网络上找到的pytorch的东西--完整的代码可以在https://www.kaggle.com/tajakuz/0-sloberta-example-max-seq-length-error上访问):
!conda install --yes pytorch>=1.6 cudatoolkit=11.0 -c pytorch
# install simpletransformers
!pip install -q transformers
!pip install --upgrade transformers
!pip install -q simpletransformers
# check installed version
!pip freeze | grep simpletransformers
!pip uninstall -q torch -y
!pip install -q torch==1.6.0+cu101 -f https://download.pytorch.org/whl/torch_stable.html
# pytorch libraries
import torch # the main pytorch library
import torch.nn as nn # the sub-library containing Softmax, Module and other useful functions
import torch.optim as optim # the sub-library containing the common optimizers (SGD, Adam, etc.)
from torch.utils.data import Dataset, DataLoader, RandomSampler, SequentialSampler
from torch import cuda
device = 'cuda' if cuda.is_available() else 'cpu'
#importing other necessary packages and ClassificationModel for bert
from tqdm import tqdm
import warnings
warnings.simplefilter('ignore')
from scipy.special import softmax非常感谢你的帮助,我真的很感激!
发布于 2022-01-02 13:52:20
这是因为max_seq_length定义了模型的输入神经元数量,从而增加了可训练参数的数量,这将要求它分配更多的内存,这可能会超过您在这些平台上的内存限制。
大多数情况下,max_seq_length在数据集中,有时添加太多可能会浪费培训时间和模型大小。
您可以做的是在您的培训数据集中找到每个样本的最大单词数,并将其用作您的max_seq_length。
https://stackoverflow.com/questions/70556326
复制相似问题