
Q: SimpleTransformers "max_seq_length" argument results in CUDA out of memory error in Kaggle and Google Colab

Stack Overflow user

Asked on 2022-01-02 13:23:04

1 answer · 495 views · 0 followers · 0 votes

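When fine-tuning the sloBERTa model (based on CamemBERT) with SimpleTransformers for a multiclass classification task, I want to use the model argument "max_seq_length": 512, because previous work shows that it gives better results than 128. However, including this argument triggers the error below. The error is the same in Kaggle and Google Colab, and terminating the execution and rerunning does not help. The error is triggered no matter how small the number of training epochs is, and the dataset contains only 600 instances (texts as strings, labels as integers). I have tried lowering max_seq_length to 509, 500 and 128, but the error persists.

Without this argument the setup works fine and allows training for 90 epochs, so otherwise I have enough memory.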

from simpletransformers.classification import ClassificationModel

# define hyperparameters
model_args = {"overwrite_output_dir": True,
              "num_train_epochs": 90,
              "labels_list": LABELS_NUM,
              "learning_rate": 1e-5,
              "train_batch_size": 32,
              "no_cache": True,
              "no_save": True,
              #"max_seq_length": 512,  # uncommenting this line triggers the CUDA out of memory error
              "save_steps": -1,
              }

model = ClassificationModel(
    "camembert", "EMBEDDIA/sloberta",
    use_cuda=(device == "cuda"),  # use_cuda expects a bool; device is the string 'cuda' or 'cpu'
    num_labels=NUM_LABELS,
    args=model_args)

model.train_model(train_df)

This is the error:

RuntimeError                              Traceback (most recent call last)
/tmp/ipykernel_34/2529369927.py in <module>
    19     args = model_args)
    20 
---> 21 model.train_model(train_df)

/opt/conda/lib/python3.7/site-packages/simpletransformers/classification/classification_model.py in train_model(self, train_df, multi_label, output_dir, show_running_loss, args, eval_df, verbose, **kwargs)
   610             eval_df=eval_df,
   611             verbose=verbose,
--> 612             **kwargs,
   613         )
   614 

/opt/conda/lib/python3.7/site-packages/simpletransformers/classification/classification_model.py in train(self, train_dataloader, output_dir, multi_label, show_running_loss, eval_df, test_df, verbose, **kwargs)
   883                             loss_fct=self.loss_fct,
   884                             num_labels=self.num_labels,
--> 885                             args=self.args,
   886                         )
   887                 else:

/opt/conda/lib/python3.7/site-packages/simpletransformers/classification/classification_model.py in _calculate_loss(self, model, inputs, loss_fct, num_labels, args)
  2256 
  2257     def _calculate_loss(self, model, inputs, loss_fct, num_labels, args):
-> 2258         outputs = model(**inputs)
  2259         # model outputs are always tuple in pytorch-transformers (see doc)
  2260         loss = outputs[0]

/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
   720             result = self._slow_forward(*input, **kwargs)
   721         else:
--> 722             result = self.forward(*input, **kwargs)
   723         for hook in itertools.chain(
   724                 _global_forward_hooks.values(),

/opt/conda/lib/python3.7/site-packages/transformers/models/roberta/modeling_roberta.py in forward(self, input_ids, attention_mask, token_type_ids, position_ids, head_mask, inputs_embeds, labels, output_attentions, output_hidden_states, return_dict)
  1210             output_attentions=output_attentions,
  1211             output_hidden_states=output_hidden_states,
-> 1212             return_dict=return_dict,
  1213         )
  1214         sequence_output = outputs[0]

/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
   720             result = self._slow_forward(*input, **kwargs)
   721         else:
--> 722             result = self.forward(*input, **kwargs)
   723         for hook in itertools.chain(
   724                 _global_forward_hooks.values(),

/opt/conda/lib/python3.7/site-packages/transformers/models/roberta/modeling_roberta.py in forward(self, input_ids, attention_mask, token_type_ids, position_ids, head_mask, inputs_embeds, encoder_hidden_states, encoder_attention_mask, past_key_values, use_cache, output_attentions, output_hidden_states, return_dict)
   859             output_attentions=output_attentions,
   860             output_hidden_states=output_hidden_states,
--> 861             return_dict=return_dict,
   862         )
   863         sequence_output = encoder_outputs[0]

/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
   720             result = self._slow_forward(*input, **kwargs)
   721         else:
--> 722             result = self.forward(*input, **kwargs)
   723         for hook in itertools.chain(
   724                 _global_forward_hooks.values(),

/opt/conda/lib/python3.7/site-packages/transformers/models/roberta/modeling_roberta.py in forward(self, hidden_states, attention_mask, head_mask, encoder_hidden_states, encoder_attention_mask, past_key_values, use_cache, output_attentions, output_hidden_states, return_dict)
   531                     encoder_attention_mask,
   532                     past_key_value,
--> 533                     output_attentions,
   534                 )
   535 

/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
   720             result = self._slow_forward(*input, **kwargs)
   721         else:
--> 722             result = self.forward(*input, **kwargs)
   723         for hook in itertools.chain(
   724                 _global_forward_hooks.values(),

/opt/conda/lib/python3.7/site-packages/transformers/models/roberta/modeling_roberta.py in forward(self, hidden_states, attention_mask, head_mask, encoder_hidden_states, encoder_attention_mask, past_key_value, output_attentions)
   415             head_mask,
   416             output_attentions=output_attentions,
--> 417             past_key_value=self_attn_past_key_value,
   418         )
   419         attention_output = self_attention_outputs[0]

/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
   720             result = self._slow_forward(*input, **kwargs)
   721         else:
--> 722             result = self.forward(*input, **kwargs)
   723         for hook in itertools.chain(
   724                 _global_forward_hooks.values(),

/opt/conda/lib/python3.7/site-packages/transformers/models/roberta/modeling_roberta.py in forward(self, hidden_states, attention_mask, head_mask, encoder_hidden_states, encoder_attention_mask, past_key_value, output_attentions)
   344             encoder_attention_mask,
   345             past_key_value,
--> 346             output_attentions,
   347         )
   348         attention_output = self.output(self_outputs[0], hidden_states)

/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
   720             result = self._slow_forward(*input, **kwargs)
   721         else:
--> 722             result = self.forward(*input, **kwargs)
   723         for hook in itertools.chain(
   724                 _global_forward_hooks.values(),

/opt/conda/lib/python3.7/site-packages/transformers/models/roberta/modeling_roberta.py in forward(self, hidden_states, attention_mask, head_mask, encoder_hidden_states, encoder_attention_mask, past_key_value, output_attentions)
   273             attention_probs = attention_probs * head_mask
   274 
--> 275         context_layer = torch.matmul(attention_probs, value_layer)
   276 
   277         context_layer = context_layer.permute(0, 2, 1, 3).contiguous()

RuntimeError: CUDA out of memory. Tried to allocate 192.00 MiB (GPU 0; 15.90 GiB total capacity; 15.04 GiB already allocated; 15.75 MiB free; 15.12 GiB reserved in total by PyTorch)

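Additional code, in case it helps (I have tried everything PyTorch-related that I found online); the full code is available at https://www.kaggle.com/tajakuz/0-sloberta-example-max-seq-length-error:

# quote the version spec, otherwise the shell treats ">" as output redirection
!conda install --yes "pytorch>=1.6" cudatoolkit=11.0 -c pytorch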

# install simpletransformers
!pip install -q transformers
!pip install --upgrade transformers
!pip install -q simpletransformers

# check installed version
!pip freeze | grep simpletransformers

!pip uninstall -q torch -y
!pip install -q torch==1.6.0+cu101 -f https://download.pytorch.org/whl/torch_stable.html

# pytorch libraries
import torch # the main pytorch library
import torch.nn as nn # the sub-library containing Softmax, Module and other useful functions
import torch.optim as optim # the sub-library containing the common optimizers (SGD, Adam, etc.)
from torch.utils.data import Dataset, DataLoader, RandomSampler, SequentialSampler

from torch import cuda
device = 'cuda' if cuda.is_available() else 'cpu'

# import other necessary packages
from tqdm import tqdm
import warnings
warnings.simplefilter('ignore')

from scipy.special import softmax

Thank you very much for your help, I really appreciate it!

Tags: pytorch, kaggle, transformer-model, simpletransformers

1 answer

Stack Overflow user

Accepted answer

Answered on 2022-01-02 13:52:20

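This happens because max_seq_length sets the length that every input sequence is truncated and padded to. The number of trainable parameters does not change, but the activations the model keeps for backpropagation grow with the sequence length (the self-attention score tensors grow with its square), so a longer max_seq_length needs much more GPU memory, which can exceed the memory limits on these platforms.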
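To make the scaling concrete, here is a small sketch that estimates the size of a single self-attention score tensor of shape [batch, heads, seq_len, seq_len]. It assumes fp32 activations, the asker's train_batch_size of 32, and 12 attention heads (the usual base-size RoBERTa/CamemBERT configuration, assumed here for sloBERTa). It ignores everything else the model stores, so real usage is several times higher: there is one such tensor per layer, plus the softmax outputs and dropout masks kept for the backward pass.

```python
def attention_scores_mib(batch_size: int, num_heads: int, seq_len: int,
                         bytes_per_elem: int = 4) -> float:
    """Size in MiB of one [batch, heads, seq_len, seq_len] attention tensor.

    bytes_per_elem=4 assumes fp32 activations; use 2 for fp16.
    """
    return batch_size * num_heads * seq_len * seq_len * bytes_per_elem / 2**20


# Assumed configuration: batch size 32 (from the question), 12 heads (base-size model).
for seq_len in (128, 512):
    print(seq_len, attention_scores_mib(32, 12, seq_len))
# prints:
# 128 24.0
# 512 384.0
```

Going from 128 to 512 tokens multiplies each of these tensors by 16 (while activations that are linear in the length grow 4x), which is why a batch size that fits comfortably at 128 tokens can run out of memory at 512.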

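In most cases, a suitable max_seq_length depends on the dataset; setting it much higher than your texts need mostly wastes training time and memory on padding.

What you can do is find the maximum number of tokens per sample in your training dataset (counted with the model's tokenizer, which usually produces more tokens than there are words) and use that as your max_seq_length.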
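A minimal sketch of that advice, assuming you pass in a tokenize function. With Hugging Face transformers this could be the tokenize method of the model's tokenizer (for example AutoTokenizer.from_pretrained("EMBEDDIA/sloberta").tokenize), but any callable that returns a list of tokens works. The helper name suggest_max_seq_length, the two-token allowance for the start/end special tokens, and the 512 cap (assumed to be the model's maximum input length) are illustrative assumptions, not part of SimpleTransformers.

```python
from typing import Callable, Iterable, List


def suggest_max_seq_length(texts: Iterable[str],
                           tokenize: Callable[[str], List[str]],
                           num_special_tokens: int = 2,
                           cap: int = 512) -> int:
    """Return the longest tokenized text length (plus special tokens), capped at `cap`.

    Hypothetical helper. num_special_tokens=2 assumes a RoBERTa/CamemBERT-style
    <s> ... </s> wrapper; cap=512 assumes the model's maximum input length.
    """
    longest = max((len(tokenize(text)) for text in texts), default=0)
    return min(longest + num_special_tokens, cap)


# Toy example: whitespace splitting stands in for the real subword tokenizer.
texts = ["a short text", "a somewhat longer example text here"]
print(suggest_max_seq_length(texts, str.split))  # prints 8
```

With the real tokenizer, something like suggest_max_seq_length(train_df["text"], tokenizer.tokenize) (assuming train_df has the "text" column SimpleTransformers expects) could then be stored in model_args["max_seq_length"]. If a few outliers are much longer than the rest, a high percentile of the lengths may be a better choice than the maximum, at the cost of truncating those few texts.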

Votes: 1

Original page content provided by Stack Overflow. Translation support provided by Tencent Cloud Xiaowei's IT-domain engine.

Original link:

https://stackoverflow.com/questions/70556326
