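I am trying to apply the 'extract.subject_verb_object_triples' function from textacy to my dataset. However, the code I wrote is very slow and very memory-intensive. Is there a more efficient implementation?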
import spacy
import textacy

def extract_SVO(text):
    nlp = spacy.load('en_core_web_sm')
    doc = nlp(text)
    tuples = textacy.extract.subject_verb_object_triples(doc)
    tuples_to_list = list(tuples)
    if tuples_to_list != []:
        tuples_list.append(tuples_to_list)

tuples_list = []
sp500news['title'].apply(extract_SVO)
print(tuples_list)

Sample data (sp500news):
    date_publish         title
0   2013-05-14 17:17:05  Italy will not dismantle Montis labour reform minister
1   2014-05-09 20:15:57  Exclusive US agency FinCEN rejected veterans in bid to hire lawyers
4   2018-07-19 10:29:54  Xis campaign to draw people back to graying rural China faces uphill battle
6   2012-04-17 21:02:54  Romney begins to win over conservatives
8   2012-12-12 20:17:56  Oregon mall shooting survivor in serious condition
9   2018-11-08 10:51:49  Polands PGNiG to sign another deal for LNG supplies from US CEO
11  2013-08-25 07:13:31  Australias opposition leader pledges stronger economy if elected PM
12  2015-01-09 00:54:17  New York shifts into Code Blue to get homeless off frigid streets

Posted on 2018-12-27 13:22:09
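This should speed it up somewhat:

import spacy
import textacy

# Load the model once, at module level, rather than on every call.
nlp = spacy.load('en_core_web_sm')

def extract_SVO(doc):
    # The extractor returns a generator, which is always truthy,
    # so materialize it before testing for emptiness.
    tuples_to_list = list(textacy.extract.subject_verb_object_triples(doc))
    if tuples_to_list:
        tuples_list.append(tuples_to_list)

tuples_list = []
# Parse every title once; the column now holds spaCy Doc objects.
sp500news['title'] = sp500news['title'].apply(nlp)
_ = sp500news['title'].apply(extract_SVO)
print(tuples_list)

Explanation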
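In the OP's implementation, nlp = spacy.load('en_core_web_sm') is called inside the function, so the model is reloaded for every row. I believe this is the biggest bottleneck. Loading the model once, outside the function, removes it and should speed things up.
Also, the triples are appended to the result only when there are any. Note that subject_verb_object_triples returns a generator, which is always truthy, so the emptiness check has to be made on the resulting list, not on the generator itself.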
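If that is still too slow, it may also help to let spaCy batch the texts instead of calling nlp on one title at a time. Below is a minimal sketch of that idea using nlp.pipe. The helper name extract_svo_batched, its parameters, and the batch_size value are my own assumptions for illustration and are not from the question. The pipeline and the extractor are passed in as arguments so the helper itself does not depend on a loaded model.

```python
from typing import Any, Callable, Iterable, List


def extract_svo_batched(
    texts: Iterable[str],
    nlp: Any,
    extract_triples: Callable[[Any], Iterable[Any]],
    batch_size: int = 256,  # assumed value; tune for your data and memory
) -> List[list]:
    """Collect the non-empty triple lists for a sequence of texts.

    `nlp` is expected to expose spaCy's `pipe(texts, batch_size=...)`,
    which parses texts in batches and yields one Doc per text.
    """
    results = []
    for doc in nlp.pipe(texts, batch_size=batch_size):
        # Materialize the generator first; a generator is always truthy.
        triples = list(extract_triples(doc))
        if triples:
            results.append(triples)
    return results


def main() -> None:
    # Imports are kept local so the helper above can be reused and
    # tested without spaCy or textacy installed.
    import spacy
    import textacy

    nlp = spacy.load('en_core_web_sm')  # loaded exactly once
    titles = [
        'Romney begins to win over conservatives',
        'Oregon mall shooting survivor in serious condition',
    ]
    print(extract_svo_batched(
        titles, nlp, textacy.extract.subject_verb_object_triples))
```

Call main() to try it against the real model. With the DataFrame from the question, sp500news['title'] can be passed as texts directly. This also leaves the title column as plain strings instead of overwriting it with Doc objects, which should help with memory.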
https://stackoverflow.com/questions/53945672