
How to remove an entity from a sentence with spaCy?

Stack Overflow user
Asked 2020-11-11 02:21:49
1 answer, 530 views, 0 followers, score 1

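How can I remove an entity from a sentence using spaCy? I want to remove a random entity of type NORP, GPE, MONEY, ORDINAL, or PERCENT. For example:

Donald John Trump [PERSON] (born June 14, 1946 [DATE]) is the 45th [ORDINAL] and current president of the United States [GPE]. Before entering politics, he was a businessman and television personality.

Now, how can I remove one particular entity from this sentence? In this example, the function chooses to remove 45th, an ORDINAL entity.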

>>> sentence = 'Donald John Trump (born June 14, 1946) is the 45th and current president of the United States. Before entering politics, he was a businessman and television personality.'
>>> remove(sentence)
45th

1 Answer

Stack Overflow user

Answered 2020-11-11 05:52:50

Try spaCy together with np.random.choice:

import numpy as np
import spacy

nlp = spacy.load("en_core_web_md")

sentence = 'Donald John Trump (born June 14, 1946) is the 45th and current president of the United States. Before entering politics, he was a businessman and television personality.'
doc = nlp(sentence)

# Keep only the entity types of interest
ents = [e.text for e in doc.ents if e.label_ in ("NORP", "GPE", "MONEY", "ORDINAL", "PERCENT")]
# Pick one entity text at random
remove = lambda x: str(np.random.choice(x))

# Example output (the choice is random)
remove(ents)
'45th'
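
One caveat with the snippet above: np.random.choice raises a ValueError on an empty list, which is what happens when a sentence contains no entity of the wanted types. Below is a minimal sketch of a guarded, seedable picker. The names pick_entity and TARGET_LABELS are hypothetical, and the hand-written (text, label) pairs stand in for (e.text, e.label_) taken from doc.ents.

```python
import random

# Hypothetical constant: the entity types the question wants to remove
TARGET_LABELS = {"NORP", "GPE", "MONEY", "ORDINAL", "PERCENT"}

def pick_entity(ents, labels=TARGET_LABELS, rng=None):
    """Return the text of one random entity with a wanted label, or None."""
    rng = rng or random.Random()
    candidates = [text for text, label in ents if label in labels]
    if not candidates:
        return None  # nothing to remove, so avoid calling choice() on an empty list
    return rng.choice(candidates)

# Hand-written stand-in for [(e.text, e.label_) for e in doc.ents]
ents = [
    ("Donald John Trump", "PERSON"),
    ("June 14, 1946", "DATE"),
    ("45th", "ORDINAL"),
    ("the United States", "GPE"),
]

# Passing a seeded Random makes the pick reproducible between runs
picked = pick_entity(ents, rng=random.Random(0))
```

Returning None lets the caller decide whether to leave the sentence unchanged when there is nothing to remove.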

If you want to remove a random entity from the sentence text:

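def remove_from_sentence(sentence):
    doc = nlp(sentence)
    # Merge each multi-token entity into a single token so it can be dropped as a unit
    with doc.retokenize() as retokenizer:
        for e in doc.ents:
            retokenizer.merge(doc[e.start:e.end])
    # Pair each token's text with its trailing whitespace
    tok_pairs = [(tok.text, tok.whitespace_) for tok in doc]
    ents = [e.text for e in doc.ents if e.label_ in ("NORP", "GPE", "MONEY", "ORDINAL", "PERCENT")]
    ent_to_remove = remove(ents)
    print(ent_to_remove)
    # Drop every token whose text equals the chosen entity
    tok_pairs_out = [pair for pair in tok_pairs if pair[0] != ent_to_remove]
    # Rebuild the sentence from the remaining text/whitespace pairs
    return "".join(text + ws for text, ws in tok_pairs_out)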

remove_from_sentence(sentence)

the United States
'Donald John Trump (born June 14, 1946) is the 45th and current president of . Before entering politics, he was a businessman and television personality.'
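
Matching on token text, as above, drops every token whose text equals the chosen entity and keeps the space that preceded it (hence "of ." in the output). As an alternative sketch, not the answerer's method, the entity can be cut out by character offsets instead. Here remove_span is a hypothetical helper, and the offsets are computed by hand as stand-ins for spaCy's ent.start_char and ent.end_char.

```python
import re

def remove_span(text, start, end):
    """Cut text[start:end] and tidy the whitespace left behind."""
    out = text[:start] + text[end:]
    out = re.sub(r"\s{2,}", " ", out)           # collapse runs of whitespace
    out = re.sub(r"\s+([.,;:!?])", r"\1", out)  # drop whitespace before punctuation
    return out.strip()

sentence = (
    "Donald John Trump (born June 14, 1946) is the 45th and current "
    "president of the United States."
)

# Stand-in for ent.start_char / ent.end_char of the chosen entity
target = "the United States"
start = sentence.index(target)
end = start + len(target)

result = remove_span(sentence, start, end)
```

Note that the whitespace cleanup runs over the whole string, so it would also collapse any deliberate double spaces or line breaks elsewhere in the text.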

Please ask if anything is unclear.

Score: 4
Original page content provided by Stack Overflow; translation support provided by the Tencent Cloud Xiaowei IT-domain engine.
Original link:

https://stackoverflow.com/questions/64779564
