I am looking into HuggingFace's transfer-learning capabilities (specifically for named entity recognition). I am fairly new to the transformer architecture, so I briefly walked through the example on their website:
from transformers import pipeline
nlp = pipeline("ner")
sequence = "Hugging Face Inc. is a company based in New York City. Its headquarters are in DUMBO, therefore very " \
"close to the Manhattan Bridge which is visible from the window."
print(nlp(sequence))

What I want to do is save and run this model locally, without having to download the "ner" model (which is over 1 GB in size) every time. In their documentation I saw that the pipeline can be saved to a local folder with pipeline.save_pretrained(); the result is a set of files that I store in a specific folder.
My question is: how do I reload this model into a script after saving it, so that I can continue classifying as in the example above? The output of pipeline.save_pretrained() is multiple files.
Here is what I have tried so far:
1: Following the documentation on pipelines:
pipe = transformers.TokenClassificationPipeline(model="pytorch_model.bin", tokenizer="tokenizer_config.json")

The error I get is: 'str' object has no attribute 'config'.
2: Following the HuggingFace example on NER:
from transformers import AutoModelForTokenClassification, AutoTokenizer
import torch
model = AutoModelForTokenClassification.from_pretrained("path to folder following .save_pretrained()")
tokenizer = AutoTokenizer.from_pretrained("path to folder following .save_pretrained()")
label_list = [
"O", # Outside of a named entity
"B-MISC", # Beginning of a miscellaneous entity right after another miscellaneous entity
"I-MISC", # Miscellaneous entity
"B-PER", # Beginning of a person's name right after another person's name
"I-PER", # Person's name
"B-ORG", # Beginning of an organisation right after another organisation
"I-ORG", # Organisation
"B-LOC", # Beginning of a location right after another location
"I-LOC" # Location
]
sequence = "Hugging Face Inc. is a company based in New York City. Its headquarters are in DUMBO, therefore very " \
"close to the Manhattan Bridge."
# Bit of a hack to get the tokens with the special tokens
tokens = tokenizer.tokenize(tokenizer.decode(tokenizer.encode(sequence)))
inputs = tokenizer.encode(sequence, return_tensors="pt")
outputs = model(inputs)[0]
predictions = torch.argmax(outputs, dim=2)
print([(token, label_list[prediction]) for token, prediction in zip(tokens, predictions[0].tolist())])

This produces an error: list index out of range.
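To make the mapping step concrete, here is the token-to-label lookup I am attempting, sketched in plain Python with dummy stand-ins for the real tokenizer/model outputs (the tokens, logits, and id2label dict below are made up for illustration; in a real run the mapping comes from model.config.id2label):

```python
# Stand-in for model.config.id2label (the checkpoint ships its own mapping).
id2label = {0: "O", 1: "B-MISC", 2: "I-MISC", 3: "B-PER", 4: "I-PER",
            5: "B-ORG", 6: "I-ORG", 7: "B-LOC", 8: "I-LOC"}

tokens = ["Hugging", "Face", "Inc", "."]   # stand-in for the tokenizer output
logits = [                                 # stand-in for model(inputs)[0][0]
    [0.1, 0.0, 0.0, 0.0, 0.0, 0.0, 3.2, 0.0, 0.0],  # peak at index 6 -> I-ORG
    [0.1, 0.0, 0.0, 0.0, 0.0, 0.0, 2.9, 0.0, 0.0],  # peak at index 6 -> I-ORG
    [0.1, 0.0, 0.0, 0.0, 0.0, 0.0, 2.5, 0.0, 0.0],  # peak at index 6 -> I-ORG
    [4.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],  # peak at index 0 -> O
]

def argmax(row):
    # Index of the largest value in a list (plain-Python torch.argmax stand-in).
    return max(range(len(row)), key=row.__getitem__)

predictions = [argmax(row) for row in logits]
tagged = [(tok, id2label[i]) for tok, i in zip(tokens, predictions)]
print(tagged)  # [('Hugging', 'I-ORG'), ('Face', 'I-ORG'), ('Inc', 'I-ORG'), ('.', 'O')]
```

If the checkpoint's label set differs in size or order from a hand-written label_list, indexing with the raw argmax can go out of range, which may be what is happening above; reading model.config.id2label instead of hard-coding the list seems safer.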
I also tried printing just the predictions, but that does not return the tokens and their entities in text form.
Any help would be greatly appreciated!
Posted on 2022-10-04 09:23:08
Loading the model like this has always worked for me:
from transformers import pipeline
pipe = pipeline('token-classification', model=model_folder, tokenizer=model_folder)

For further examples of how to use pipelines, take a look here.
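A minimal sketch of the full save/reload round trip, assuming the pipeline was previously saved with save_pretrained() into a local folder (the folder name "my_ner_model" and the helper load_local_ner are my own illustrations, not part of the transformers API):

```python
def load_local_ner(model_folder: str):
    """Reload a NER pipeline saved earlier with nlp.save_pretrained(model_folder).

    Pointing both model and tokenizer at the same folder is enough: the
    config.json stored in that folder tells transformers which model and
    tokenizer classes to instantiate.
    """
    from transformers import pipeline  # imported lazily; requires transformers installed
    return pipeline("token-classification", model=model_folder, tokenizer=model_folder)

# Usage, assuming the model was saved once with:
#   nlp = pipeline("ner")
#   nlp.save_pretrained("my_ner_model")
# Then on every later run, no re-download is needed:
#   nlp = load_local_ner("my_ner_model")
#   print(nlp("Hugging Face Inc. is a company based in New York City."))
```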
https://stackoverflow.com/questions/64106747