Q: spaCy ent.label_ fails to identify organizations

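Stack Overflow user

Asked 2020-03-09 08:15:40

Answers: 2 · Views: 199 · Followers: 0 · Votes: 0

I am using spaCy to analyze a terrorism dataset, and strangely spaCy cannot find organizations such as fatah. The code is as follows: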

import spacy
from collections import defaultdict, Counter  # needed for location_entity_dict below
nlp = spacy.load('en')
def read_file_to_list(file_name):
    with open(file_name, 'r') as file:
        return file.readlines()
terrorism_articles = read_file_to_list('data/rand-terrorism-dataset.txt')
terrorism_articles_nlp = [nlp(art) for art in terrorism_articles]
common_terrorist_groups = [
    'taliban', 
    'al - qaeda', 
    'hamas',  
    'fatah', 
    'plo', 
    'bilad al - rafidayn'
]

common_locations = [
    'iraq',
    'baghdad', 
    'kirkuk', 
    'mosul', 
    'afghanistan', 
    'kabul',
    'basra', 
    'palestine', 
    'gaza', 
    'israel', 
    'istanbul', 
    'beirut', 
    'pakistan'
]
location_entity_dict = defaultdict(Counter)

for article in terrorism_articles_nlp:
    
    article_terrorist_groups = [ent.lemma_ for ent in article.ents if ent.label_=='PERSON' or ent.label_ =='ORG']  # person or organization
    article_locations = [ent.lemma_ for ent in article.ents if ent.label_=='GPE']
    terrorist_common = [ent for ent in article_terrorist_groups if ent in common_terrorist_groups]
    locations_common = [ent for ent in article_locations if ent in common_locations]
    
    for found_entity in terrorist_common:
        for found_location in locations_common:
            location_entity_dict[found_entity][found_location] += 1
location_entity_dict

I get nothing at all from the file. Here is the link to the text data.

Thanks!

nlp

spacy

2 Answers

Stack Overflow user

Answered 2020-03-09 10:31:08

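I reproduced your example, and it looks like you get empty lists for article_terrorist_groups and terrorist_common, so you do not get the output you want (I assume). I changed the model to en_core_web_sm (for my machine), and I noticed that ent.label differs from what you specify in the if statement of your list comprehension. I am almost certain this is the case whether you use spacy.load('en') or spacy.load('en_core_web_sm').

You are using if ent.label_=='PERSON' or ent.label_ =='ORG', which results in empty lists. You need to change this for it to work. Essentially, in the list comprehensions for article_terrorist_groups and terrorist_common, the for loop ends up iterating over an empty list.

If you look at the output I posted, you will see that ent.label is neither 'PERSON' nor 'ORG'.

Note: I recommend adding print statements (or using a debugger) to your code so you can check intermediate values along the way.

My code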

import spacy
from collections import defaultdict, Counter
nlp = spacy.load('en_core_web_sm') # I changed this
def read_file_to_list(file_name):
    with open(file_name, 'r') as file:
        return file.readlines()

terrorism_articles = read_file_to_list('rand-terrorism-dataset.txt')
terrorism_articles_nlp = [nlp(art) for art in terrorism_articles]
common_terrorist_groups = [
    'taliban', 
    'al - qaeda', 
    'hamas',  
    'fatah', 
    'plo', 
    'bilad al - rafidayn'
]

common_locations = [
    'iraq',
    'baghdad', 
    'kirkuk', 
    'mosul', 
    'afghanistan', 
    'kabul',
    'basra', 
    'palestine', 
    'gaza', 
    'israel', 
    'istanbul', 
    'beirut', 
    'pakistan'
]
location_entity_dict = defaultdict(Counter)


for article in terrorism_articles_nlp:
    print([(ent.lemma_, ent.label) for ent in article.ents])

Output

[('CHILE', 383), ('the Santiago Binational Center', 383), ('21,000', 394)]
[('ISRAEL', 384), ('palestinian', 381), ('five', 397), ('Masada', 384)]
[('GUATEMALA', 383), ('U.S. Marines', 381), ('Guatemala City', 384)]

Output truncated to keep this answer short.

Votes: 1
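One thing worth checking when reading the output above: in spaCy, `ent.label` is an integer ID while `ent.label_` is the corresponding string, so seeing integers in a printout of `ent.label` does not by itself show that a filter on `ent.label_ == 'ORG'` fails. Below is a minimal sketch of the distinction. It uses a hypothetical `FakeEnt` stand-in rather than real spaCy objects, and the pairing of 383 with 'ORG' is an assumption for illustration; in a real session it could be checked with `nlp.vocab.strings[383]`.

```python
from dataclasses import dataclass


@dataclass
class FakeEnt:
    """Hypothetical stand-in for a spaCy entity span (not a spaCy class)."""
    lemma_: str
    label: int   # integer ID, like the numbers in the printed output
    label_: str  # string form of the same label


# Values modeled on the first output row; pairing 383 with 'ORG' is an assumption.
ents = [
    FakeEnt('CHILE', 383, 'ORG'),
    FakeEnt('the Santiago Binational Center', 383, 'ORG'),
]

# Comparing the integer attribute with a string never matches.
by_int = [e.lemma_ for e in ents if e.label == 'ORG']

# Comparing the string attribute behaves as intended.
by_str = [e.lemma_ for e in ents if e.label_ == 'ORG']

print(by_int)
print(by_str)
```

If that assumption holds, printing `(ent.lemma_, ent.label_)` instead of `(ent.lemma_, ent.label)` would be the more direct way to see which entities pass the label filter.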

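Stack Overflow user

Answered 2020-07-29 10:40:57

This happens because the groups and locations in common_terrorist_groups and common_locations are lowercase, whereas the entities found in the data (the candidates for terrorist_common and locations_common) are uppercase. So simply change if ent in common_terrorist_groups to if ent.lower() in common_terrorist_groups: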

common_terrorist_groups = [
    'taliban', 
    'al - qaeda', 
    'hamas',  
    'fatah', 
    'plo', 
    'bilad al - rafidayn'
]

common_locations = [
    'iraq',
    'baghdad', 
    'kirkuk', 
    'mosul', 
    'afghanistan', 
    'kabul',
    'basra', 
    'palestine', 
    'gaza', 
    'israel', 
    'istanbul', 
    'beirut', 
    'pakistan'
]
location_entity_dict = defaultdict(Counter)

for article in terrorism_articles_nlp:

    article_terrorist_cands = [ent.lemma_ for ent in article.ents if ent.label_ == 'PERSON' or ent.label_ == 'ORG']
    article_location_cands = [ent.lemma_ for ent in article.ents if ent.label_ == 'GPE']

    terrorist_candidates = [ent for ent in article_terrorist_cands if ent.lower() in common_terrorist_groups]
    location_candidates = [loc for loc in article_location_cands if loc.lower() in common_locations]
    for found_entity in terrorist_candidates:
        for found_location in location_candidates:
            location_entity_dict[found_entity][found_location] += 1
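The case-insensitive matching can also be tried without loading a model. The following is a minimal sketch, assuming hypothetical `(lemma, label)` pairs in place of a real `article.ents`; the sample entities and the shortened lookup sets are made up for illustration. It differs from the code above in one respect: it stores the lowercased form as the dictionary key, so that 'HAMAS' and 'Hamas' would be counted under the same entry.

```python
from collections import Counter, defaultdict

# Hypothetical (lemma, label) pairs standing in for one article's article.ents.
article_ents = [
    ('HAMAS', 'ORG'),
    ('Gaza', 'GPE'),
    ('ISRAEL', 'GPE'),
    ('five', 'CARDINAL'),
]

# Shortened versions of the lookup lists, stored as sets for fast membership tests.
common_terrorist_groups = {'taliban', 'hamas', 'fatah', 'plo'}
common_locations = {'gaza', 'israel', 'iraq'}

location_entity_dict = defaultdict(Counter)

# Lowercase each lemma, then keep only those present in the lookup sets.
groups = [lemma.lower() for lemma, label in article_ents
          if label in ('PERSON', 'ORG') and lemma.lower() in common_terrorist_groups]
places = [lemma.lower() for lemma, label in article_ents
          if label == 'GPE' and lemma.lower() in common_locations]

# Count every (group, location) co-occurrence within the article.
for group in groups:
    for place in places:
        location_entity_dict[group][place] += 1

print(dict(location_entity_dict))
```

Whether to key by the original or the normalized string is a design choice; normalized keys merge spelling variants that differ only in case.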

Votes: 0

Original page content provided by Stack Overflow. Translation support provided by Tencent Cloud Xiaowei's IT-domain engine.

Original link: https://stackoverflow.com/questions/60593307
