Q: Named Entity Recognition with Huggingface transformers, mapping back to complete entities

Stack Overflow user

Asked 2020-08-02 23:20:14

2 answers · 5K views · 0 followers · 10 votes

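I'm looking at the documentation for the Huggingface pipeline for Named Entity Recognition, and it's not clear to me how these results are meant to be used in an actual entity recognition model.

For instance, given the example in the documentation: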

>>> from transformers import pipeline

>>> nlp = pipeline("ner")

>>> sequence = ("Hugging Face Inc. is a company based in New York City. Its headquarters are in DUMBO, therefore very "
...             "close to the Manhattan Bridge which is visible from the window.")

This outputs a list of all words that have been identified as an entity from the 9 classes defined above. Here are the expected results:

print(nlp(sequence))

[
{'word': 'Hu', 'score': 0.9995632767677307, 'entity': 'I-ORG'},
{'word': '##gging', 'score': 0.9915938973426819, 'entity': 'I-ORG'},
{'word': 'Face', 'score': 0.9982671737670898, 'entity': 'I-ORG'},
{'word': 'Inc', 'score': 0.9994403719902039, 'entity': 'I-ORG'},
{'word': 'New', 'score': 0.9994346499443054, 'entity': 'I-LOC'},
{'word': 'York', 'score': 0.9993270635604858, 'entity': 'I-LOC'},
{'word': 'City', 'score': 0.9993864893913269, 'entity': 'I-LOC'},
{'word': 'D', 'score': 0.9825621843338013, 'entity': 'I-LOC'},
{'word': '##UM', 'score': 0.936983048915863, 'entity': 'I-LOC'},
{'word': '##BO', 'score': 0.8987102508544922, 'entity': 'I-LOC'},
{'word': 'Manhattan', 'score': 0.9758241176605225, 'entity': 'I-LOC'},
{'word': 'Bridge', 'score': 0.990249514579773, 'entity': 'I-LOC'}
]

While this alone is impressive, it's unclear to me what the correct way is to get "DUMBO" from:

{'word': 'D', 'score': 0.9825621843338013, 'entity': 'I-LOC'},
{'word': '##UM', 'score': 0.936983048915863, 'entity': 'I-LOC'},
{'word': '##BO', 'score': 0.8987102508544922, 'entity': 'I-LOC'},

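or even for cleaner multi-token matches, like distinguishing "New York City" from simply the city of "York".

While I can imagine heuristic methods, what is the intended way to join these tokens back into the correct labels, given the input?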

huggingface-transformers

2 Answers

Stack Overflow user

Accepted answer

Answered 2020-08-03 15:26:54

The pipeline object can do that for you when you set the following parameter:

transformers < 4.7.0: grouped_entities to True.
transformers >= 4.7.0: aggregation_strategy to simple.

from transformers import pipeline

#transformers < 4.7.0
#ner = pipeline("ner", grouped_entities=True)

ner = pipeline("ner", aggregation_strategy='simple')

sequence = "Hugging Face Inc. is a company based in New York City. Its headquarters are in DUMBO, therefore very close to the Manhattan Bridge which is visible from the window."

output = ner(sequence)

print(output)

Output:

[{'entity_group': 'I-ORG', 'score': 0.9970663785934448, 'word': 'Hugging Face Inc'}
, {'entity_group': 'I-LOC', 'score': 0.9993778467178345, 'word': 'New York City'}
, {'entity_group': 'I-LOC', 'score': 0.9571147759755453, 'word': 'DUMBO'}
, {'entity_group': 'I-LOC', 'score': 0.9838141202926636, 'word': 'Manhattan Bridge'}
, {'entity_group': 'I-LOC', 'score': 0.9838141202926636, 'word': 'Manhattan Bridge'}]
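To make the grouping idea concrete, here is a minimal heuristic sketch of how the "##" sub-word pieces from the question could be glued back together by hand. It is not the pipeline's own aggregation code: the helper name merge_wordpieces is hypothetical, and averaging the piece scores is an arbitrary choice for illustration, so the merged score will not necessarily match what the pipeline reports. It also only rejoins sub-word pieces; merging separate words such as "New", "York", "City" into one entity would need position information that the ungrouped output above does not carry.

```python
def merge_wordpieces(tokens):
    """Merge '##'-prefixed WordPiece continuations into the preceding token.

    Hypothetical helper for illustration; not part of transformers.
    """
    merged = []
    for tok in tokens:
        word = tok["word"]
        if word.startswith("##") and merged:
            # Continuation piece: append its text (minus '##') to the previous word.
            merged[-1]["word"] += word[2:]
            merged[-1]["scores"].append(tok["score"])
        else:
            # Start of a new word: keep the label of its first piece.
            merged.append(
                {"word": word, "entity": tok["entity"], "scores": [tok["score"]]}
            )
    # Assumption: summarise each word's score as the mean of its piece scores.
    return [
        {
            "word": m["word"],
            "entity": m["entity"],
            "score": sum(m["scores"]) / len(m["scores"]),
        }
        for m in merged
    ]


# The three pieces quoted in the question.
tokens = [
    {'word': 'D', 'score': 0.9825621843338013, 'entity': 'I-LOC'},
    {'word': '##UM', 'score': 0.936983048915863, 'entity': 'I-LOC'},
    {'word': '##BO', 'score': 0.8987102508544922, 'entity': 'I-LOC'},
]

for item in merge_wordpieces(tokens):
    print(item["word"], item["entity"])
```

In practice the pipeline parameter shown above is the simpler route; the sketch is only meant to show what kind of joining is involved.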

Votes 13

Stack Overflow user

Answered 2022-04-01 15:06:38

Quick update: grouped_entities has been deprecated.

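UserWarning: grouped_entities is deprecated and will be removed in version v5.0.0, defaulted to aggregation_strategy="AggregationStrategy.SIMPLE" instead. f'grouped_entities is deprecated and will be removed in version v5.0.0, defaulted to aggregation_strategy="{aggregation_strategy}" instead.'

You have to change your code to: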

ner = pipeline("ner", aggregation_strategy="simple")
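If the same script has to run on both older and newer installs, one option is to choose the keyword argument from the installed version. The sketch below assumes the 4.7.0 cut-off given in the accepted answer; ner_grouping_kwargs is a hypothetical helper name, and the version parsing is deliberately simple (it reads only the leading digits of the first three dot-separated parts).

```python
import re


def ner_grouping_kwargs(transformers_version):
    """Pick the grouping argument for pipeline("ner", ...) from a version string.

    Hypothetical helper; the 4.7.0 threshold is taken from the accepted answer.
    """
    numbers = []
    for piece in transformers_version.split(".")[:3]:
        match = re.match(r"\d+", piece)  # leading digits only, e.g. "0rc1" -> 0
        numbers.append(int(match.group()) if match else 0)
    while len(numbers) < 3:
        numbers.append(0)  # pad short versions such as "3.5"

    if tuple(numbers) >= (4, 7, 0):
        return {"aggregation_strategy": "simple"}
    return {"grouped_entities": True}


print(ner_grouping_kwargs("4.6.1"))
print(ner_grouping_kwargs("4.7.0"))

# Intended use (requires transformers to be installed):
#   import transformers
#   from transformers import pipeline
#   ner = pipeline("ner", **ner_grouping_kwargs(transformers.__version__))
```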

Votes 2

Original page content provided by Stack Overflow. Translation support provided by Tencent Cloud's Xiaowei IT-domain engine.

Original link:

https://stackoverflow.com/questions/63221913
