I'm currently using the Hugging Face pipeline for sentiment analysis, like so:
from transformers import pipeline
classifier = pipeline('sentiment-analysis', device=0)

The problem is that when I pass a text longer than 512 tokens, it simply crashes with an error saying the input is too long. Is there any way to pass the tokenizer's max_length and truncation parameters directly to the pipeline?
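(As I understand it, truncation=True with max_length=512 just keeps the first 512 token positions; a rough sketch of the idea, not the real tokenizer code:)

```python
# Rough sketch of what truncation does: keep only the first max_length
# token ids (the real tokenizer also reserves room for special tokens
# such as [CLS] and [SEP]).
def truncate_ids(token_ids, max_length=512):
    return token_ids[:max_length]

ids = list(range(1000))          # pretend these are token ids
print(len(truncate_ids(ids)))    # 512
```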
My workaround is:
from transformers import AutoTokenizer, AutoModelForSequenceClassification
model_name = "nlptown/bert-base-multilingual-uncased-sentiment"
model = AutoModelForSequenceClassification.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)
classifier = pipeline('sentiment-analysis', model=model, tokenizer=tokenizer, device=0)

and then call the tokenizer like this:
pt_batch = tokenizer(text, padding=True, truncation=True, max_length=512, return_tensors="pt")

But it would be much nicer to be able to call the pipeline directly, like this:
classifier(text, padding=True, truncation=True, max_length=512)

Posted on 2021-08-01 11:00:02
This should work:
classifier(text, padding=True, truncation=True)

If it doesn't, try loading the tokenizer as:
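Why setting model_max_length helps: with truncation=True but no explicit max_length, the tokenizer falls back to its own model_max_length. A toy stand-in for illustration only (not the real transformers internals):

```python
# Toy illustration (not the real transformers code): a tokenizer that,
# when asked to truncate without an explicit max_length, falls back to
# the model_max_length it was constructed with.
class ToyTokenizer:
    def __init__(self, model_max_length):
        self.model_max_length = model_max_length

    def __call__(self, token_ids, truncation=False, max_length=None):
        if truncation:
            limit = max_length if max_length is not None else self.model_max_length
            return token_ids[:limit]
        return token_ids

tok = ToyTokenizer(model_max_length=512)
print(len(tok(list(range(1000)), truncation=True)))  # 512
```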
tokenizer = AutoTokenizer.from_pretrained(model_name, model_max_length=512)

Posted on 2022-01-16 12:03:01
You can use tokenizer_kwargs at inference time:
model_pipeline = pipeline("text-classification", model=model, tokenizer=tokenizer, device=0, return_all_scores=True)
tokenizer_kwargs = {'padding': True, 'truncation': True, 'max_length': 512, 'return_tensors': 'pt'}
prediction = model_pipeline('sample text to predict', **tokenizer_kwargs)

For more details, see this link:
https://stackoverflow.com/questions/67849833
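The **tokenizer_kwargs part is plain Python keyword-argument unpacking: the pipeline receives the dict entries as if they had been typed out individually. A minimal stand-in (classify here is a hypothetical function, not part of transformers):

```python
# Minimal stand-in showing how **tokenizer_kwargs unpacks a dict into
# keyword arguments; `classify` is a hypothetical function, not transformers.
def classify(text, **kwargs):
    return kwargs

tokenizer_kwargs = {'padding': True, 'truncation': True, 'max_length': 512}
print(classify('sample text', **tokenizer_kwargs))
# {'padding': True, 'truncation': True, 'max_length': 512}
```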