
A more efficient implementation of textacy / spaCy 'subject_verb_object_triples'

Stack Overflow user
Asked 2018-12-27 13:11:42
1 answer, 1.9K views, 0 followers, 3 votes

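I am trying to apply textacy's 'extract.subject_verb_object_triples' function to my dataset. However, the code I have written is very slow and memory-intensive. Is there a more efficient implementation?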

import spacy
import textacy

def extract_SVO(text):

    nlp = spacy.load('en_core_web_sm')
    doc = nlp(text)
    tuples = textacy.extract.subject_verb_object_triples(doc)
    tuples_to_list = list(tuples)
    if tuples_to_list != []:
        tuples_list.append(tuples_to_list)

tuples_list = []          
sp500news['title'].apply(extract_SVO)
print(tuples_list)

Sample data (sp500news)

    date_publish  \
0       2013-05-14 17:17:05   
1       2014-05-09 20:15:57   
4       2018-07-19 10:29:54   
6       2012-04-17 21:02:54   
8       2012-12-12 20:17:56   
9       2018-11-08 10:51:49   
11      2013-08-25 07:13:31   
12      2015-01-09 00:54:17   

 title  
0       Italy will not dismantle Montis labour reform  minister                            
1       Exclusive US agency FinCEN rejected veterans in bid to hire lawyers                
4       Xis campaign to draw people back to graying rural China faces uphill battle        
6       Romney begins to win over conservatives                                            
8       Oregon mall shooting survivor in serious condition                                 
9       Polands PGNiG to sign another deal for LNG supplies from US CEO                    
11      Australias opposition leader pledges stronger economy if elected PM                
12      New York shifts into Code Blue to get homeless off frigid streets                  

1 Answer

Stack Overflow user

Accepted answer

Answered 2018-12-27 13:22:09

This should speed it up somewhat:

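import spacy
import textacy

# Load the model once, outside the function, so it is not reloaded per row.
nlp = spacy.load('en_core_web_sm')

def extract_SVO(doc):
    # subject_verb_object_triples returns a generator, which is always truthy,
    # so convert it to a list before testing for emptiness.
    tuples_to_list = list(textacy.extract.subject_verb_object_triples(doc))
    if tuples_to_list:
        tuples_list.append(tuples_to_list)

tuples_list = []
# Parse every title once, then extract triples from the parsed Docs.
sp500news['title'] = sp500news['title'].apply(nlp)
_ = sp500news['title'].apply(extract_SVO)
print(tuples_list)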

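Explanation

In the OP's implementation, nlp = spacy.load('en_core_web_sm') sits inside the function, so the model is reloaded on every call. I think this is the biggest bottleneck. Loading the model once, outside the function, removes that cost and should speed things up.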
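Beyond loading the model once, a further step that may help is to let spaCy parse the titles in batches with nlp.pipe rather than one call per row. The sketch below is not part of the original answer: extract_svo_batch is a hypothetical helper, and it takes the pipeline and the extractor as arguments so that it does not depend on a particular spaCy or textacy version.

```python
def extract_svo_batch(texts, nlp, extract_triples, batch_size=256):
    """Parse texts in batches and collect the non-empty triple lists.

    nlp             -- a loaded spaCy pipeline (anything with a .pipe method)
    extract_triples -- a callable mapping a parsed Doc to an iterable of
                       triples, e.g. textacy.extract.subject_verb_object_triples
    batch_size      -- assumed default; tune for your data and memory budget
    """
    results = []
    # nlp.pipe streams Docs one at a time, so the parsed Docs do not all
    # have to be held in memory at once.
    for doc in nlp.pipe(texts, batch_size=batch_size):
        triples = list(extract_triples(doc))
        if triples:
            results.append(triples)
    return results


# Intended usage (requires spaCy, textacy and the model to be installed):
#
#   import spacy, textacy
#   nlp = spacy.load('en_core_web_sm')
#   tuples_list = extract_svo_batch(
#       sp500news['title'], nlp,
#       textacy.extract.subject_verb_object_triples)
```

Because the Docs are consumed as they are produced instead of being stored back into the DataFrame column, this variant should also reduce memory use, though the actual gain depends on the data and is worth measuring.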

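In addition, only non-empty results should be appended to tuples_list. Because subject_verb_object_triples returns a generator, that check has to be made after converting the result to a list.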
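One caveat about the emptiness check, assuming subject_verb_object_triples returns a generator: a generator object is truthy even when it yields nothing, so testing it directly does not filter out empty results. The standalone sketch below uses a stand-in generator (no_triples and collect_non_empty are hypothetical names, not textacy functions) to show the difference.

```python
def no_triples():
    """Stand-in for an extractor that finds nothing: yields no items."""
    return
    yield  # unreachable, but its presence makes this function a generator


def collect_non_empty(results, triples):
    """Append triples to results only if at least one was produced."""
    as_list = list(triples)
    if as_list:
        results.append(as_list)
    return results


gen = no_triples()
print(bool(gen))                            # True: truthy even though it is empty
print(collect_non_empty([], no_triples()))  # []: the empty result is skipped
```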

5 votes

Original page content provided by Stack Overflow.
Original link:

https://stackoverflow.com/questions/53945672
