Q: Transformers pipeline for NER returns partial words with ##s

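Stack Overflow user

Asked 2020-04-09 02:18:07

Answers: 1, Views: 986, Followers: 0, Votes: 0

How should I interpret the partial words containing "##" that the Transformers NER pipeline returns? Other tools such as Flair and SpaCy return whole words with their tags. I have worked with the CoNLL dataset before and never noticed anything like this. Also, why are the words split up this way?

Example from HuggingFace: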

from transformers import pipeline

nlp = pipeline("ner")

sequence = "Hugging Face Inc. is a company based in New York City. Its headquarters are in DUMBO, therefore very " \
           "close to the Manhattan Bridge which is visible from the window."

print(nlp(sequence))

Output:

[
    {'word': 'Hu', 'score': 0.9995632767677307, 'entity': 'I-ORG'},
    {'word': '##gging', 'score': 0.9915938973426819, 'entity': 'I-ORG'},
    {'word': 'Face', 'score': 0.9982671737670898, 'entity': 'I-ORG'},
    {'word': 'Inc', 'score': 0.9994403719902039, 'entity': 'I-ORG'},
    {'word': 'New', 'score': 0.9994346499443054, 'entity': 'I-LOC'},
    {'word': 'York', 'score': 0.9993270635604858, 'entity': 'I-LOC'},
    {'word': 'City', 'score': 0.9993864893913269, 'entity': 'I-LOC'},
    {'word': 'D', 'score': 0.9825621843338013, 'entity': 'I-LOC'},
    {'word': '##UM', 'score': 0.936983048915863, 'entity': 'I-LOC'},
    {'word': '##BO', 'score': 0.8987102508544922, 'entity': 'I-LOC'},
    {'word': 'Manhattan', 'score': 0.9758241176605225, 'entity': 'I-LOC'},
    {'word': 'Bridge', 'score': 0.990249514579773, 'entity': 'I-LOC'}
]

Tags: python, pytorch, named-entity-recognition, huggingface-transformers

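1 Answer

Stack Overflow user

Accepted answer

Answered 2020-04-09 02:35:43

BERT, and the tokenizers that PyTorch Transformers ships for it, use WordPiece tokenization, which produces two kinds of tokens: whole words and sub-word pieces. A word that is in the tokenizer's vocabulary stays a single token. A word that is not is split into smaller pieces that are, and every piece after the first is prefixed with "##" to mark it as a continuation of the previous token. The split is driven by the vocabulary, so it often, but not always, looks like a base form plus a suffix. The NER pipeline labels tokens rather than whitespace-separated words, which is why its output contains entries such as "##gging".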

Suppose you have the phrase: I like hugging animals

Split into whole words, the tokens are:

["I", "like", "hugging", "animals"]

With sub-words, an illustrative split (the actual pieces depend on the model's vocabulary) would be:

["I", "like", "hug", "##ging", "animal", "##s"]
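As a rough illustration of why the split comes out this way, here is a minimal sketch of greedy longest-match-first splitting, which is how WordPiece breaks up a word at tokenization time. The function name `wordpiece_tokenize` and the tiny `toy_vocab` are made up for this example; a real BERT vocabulary has tens of thousands of entries and may split the same words differently.

```python
def wordpiece_tokenize(word, vocab, unk_token="[UNK]"):
    """Split one word into WordPiece-style tokens by greedy longest match."""
    pieces = []
    start = 0
    while start < len(word):
        end = len(word)
        match = None
        # Try the longest remaining substring first, then shrink it.
        while start < end:
            candidate = word[start:end]
            if start > 0:
                # Pieces that do not start the word carry the "##" prefix.
                candidate = "##" + candidate
            if candidate in vocab:
                match = candidate
                break
            end -= 1
        if match is None:
            # No piece fits: the whole word becomes the unknown token.
            return [unk_token]
        pieces.append(match)
        start = end
    return pieces


# Hypothetical toy vocabulary, chosen only to reproduce the example phrase.
toy_vocab = {"I", "like", "hug", "##ging", "animal", "##s"}

tokens = []
for word in "I like hugging animals".split():
    tokens.extend(wordpiece_tokenize(word, toy_vocab))

print(tokens)
```

Because "hugging" and "animals" are not in the toy vocabulary, each falls back to the longest prefix that is, followed by a "##" continuation piece.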

You can learn more here: https://www.kaggle.com/funtowiczmo/hugging-face-tutorials-training-tokenizer
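To get whole words back from output like the one in the question, one option is to glue each "##" piece onto the token before it. The following is a minimal sketch, not part of the Transformers API: `merge_wordpieces` is a hypothetical helper, it assumes every continuation piece directly follows its head piece in the list, it keeps the label of the first piece, and averaging the scores is an arbitrary choice.

```python
def merge_wordpieces(entities):
    """Glue '##' continuation pieces onto the preceding token."""
    merged = []
    for ent in entities:
        word = ent["word"]
        if word.startswith("##") and merged:
            # Continuation piece: append its text (minus "##") to the last word.
            merged[-1]["word"] += word[2:]
            merged[-1]["scores"].append(ent["score"])
        else:
            # Start of a new word: copy the fields so the input is not mutated.
            merged.append(
                {"word": word, "entity": ent["entity"], "scores": [ent["score"]]}
            )
    # Assumption: report the mean score of the pieces as the word's score.
    return [
        {
            "word": m["word"],
            "entity": m["entity"],
            "score": sum(m["scores"]) / len(m["scores"]),
        }
        for m in merged
    ]


# A few entries copied from the pipeline output shown in the question.
sample = [
    {"word": "Hu", "score": 0.9995632767677307, "entity": "I-ORG"},
    {"word": "##gging", "score": 0.9915938973426819, "entity": "I-ORG"},
    {"word": "Face", "score": 0.9982671737670898, "entity": "I-ORG"},
    {"word": "D", "score": 0.9825621843338013, "entity": "I-LOC"},
    {"word": "##UM", "score": 0.936983048915863, "entity": "I-LOC"},
    {"word": "##BO", "score": 0.8987102508544922, "entity": "I-LOC"},
]

merged = merge_wordpieces(sample)
print([m["word"] for m in merged])
```

Later releases of transformers can also do this grouping inside the pipeline itself (for example through the `aggregation_strategy` argument of the token-classification pipeline), so it is worth checking the documentation for the installed version before writing a helper like this.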

Votes: 2

Original content provided by Stack Overflow.

Original link:

https://stackoverflow.com/questions/61107371
