
Q: Stanza (Stanford NLP) does not work when processing DataFrame rows in parallel

Stack Overflow user

Asked 2022-04-21 07:52:51

Answers: 1 | Views: 198 | Followers: 0 | Votes: 0

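I have a dataframe with 800,000 rows, and for each row I want to find the person mentioned in the comment (row.comment). I want to use Stanza because it has higher accuracy, and I parallelized the loop over df.iterrows() with multiprocessing to speed up execution. When I use Stanza to find the person names without multiprocessing, it works. When I do the same thing with multiprocessing but using spaCy, it also works, which suggests the problem is related to the Stanza package.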

import multiprocessing as mp

import stanza

# Initialize the English neural pipeline (tokenization + NER)
nlp = stanza.Pipeline(lang='en', processors='tokenize,ner')

def stanza_function(arg):
    idx, row = arg
    person_name = ''  # default, so the return below is always defined
    try:
        # preprocess_comment is the asker's own helper (not shown in the question)
        comment = preprocess_comment(str(row['comment']))  # Retrieve body of the comment
        doc = nlp(str(comment))
        persons_mentioned = [ent.text for ent in doc.ents if ent.type == 'PERSON']
        if len(persons_mentioned) == 1:
            person_name = persons_mentioned[0]
    except Exception as exc:
        print(f"Error: {exc}")

    return person_name

def spacy_function(arg):
    idx,row = arg
    comment = preprocess_comment(str(row['comment'])) # Retrieve body of the comment
    person_name = ''
    comment_NER = NER(str(comment)) # Implement NER
    persons_mentioned = [word.text for word in comment_NER.ents if word.label_ == 'PERSON']
    print(persons_mentioned)
    if (len(persons_mentioned) == 1):
        person_name = persons_mentioned[0]
    return person_name

with mp.Pool(processes=mp.cpu_count()) as pool:
    persons = pool.map(stanza_function, [(idx, row) for idx, row in df.iterrows()])
df['person_name'] = persons

Tags: python, dataframe, parallel-processing, stanford-nlp, named-entity-recognition

1 Answer

Stack Overflow user

Answered 2022-06-07 19:09:16

https://github.com/stanfordnlp/stanza/issues/1007

As mentioned there, multiprocessing is not going to help with Stanza anyway, especially if you are using a GPU.
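If multiprocessing is off the table, one alternative worth trying is to stay in a single process and hand Stanza many comments per call, so the pipeline can batch them internally. The sketch below is only an illustration under stated assumptions: the helper names (chunked, single_person, extract_persons) and the chunk size of 256 are hypothetical and not from the question, and it relies on Stanza accepting a list of stanza.Document objects as pipeline input. It also skips the asker's preprocess_comment step, which would need to be applied to the texts first.

```python
from typing import Iterable, Iterator, List, Sequence


def chunked(items: Sequence[str], size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of at most `size` items (hypothetical helper)."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def single_person(entities: Iterable) -> str:
    """Return the entity text if exactly one PERSON entity is present, else ''.

    Mirrors the rule in the question: a name is kept only when it is unambiguous.
    """
    persons = [ent.text for ent in entities if ent.type == "PERSON"]
    return persons[0] if len(persons) == 1 else ""


def extract_persons(comments: List[str], chunk_size: int = 256) -> List[str]:
    """Run Stanza NER over all comments in one process, chunk by chunk."""
    # Imported here so the helpers above can be used without Stanza installed.
    import stanza

    # Build the pipeline once; it is reused for every chunk.
    nlp = stanza.Pipeline(lang="en", processors="tokenize,ner")

    results: List[str] = []
    for chunk in chunked(comments, chunk_size):
        # Wrap raw texts in Document objects so the pipeline treats the
        # list as a batch of separate documents.
        docs = [stanza.Document([], text=text) for text in chunk]
        for doc in nlp(docs):
            results.append(single_person(doc.ents))
    return results
```

With the question's dataframe this could be called as df['person_name'] = extract_persons(df['comment'].astype(str).tolist()). Whether this is fast enough for 800,000 rows depends on the hardware, and the chunk size is a guess that would need tuning; empty comments may also need to be filtered out beforehand.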

Votes: 1

Original page content provided by Stack Overflow. Translation support provided by the Tencent Cloud Xiaowei IT-domain engine.

Original link: https://stackoverflow.com/questions/71950766
