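I want to use spaCy's dependency parser to determine the scope of negation in a document. See here for the dependency visualizer applied to the following string:
RT @trader $AAPL 2012 is ooopen to Talk about patents with GOOG definitely not the treatment Samsung got heh someURL
I can detect the negation cues with
negation_tokens = [tok for tok in doc if tok.dep_ == 'neg']
So I see that, in my string, not is a negation modifier of got. Now I want to define the scope of the negation as follows: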
negation_head_tokens = [token.head for token in negation_tokens]
for token in negation_head_tokens:
    end = token.i
    start = token.head.i + 1
    negated_tokens = doc[start:end]
    print(negated_tokens)
This gives the following output:
ooopen to Talk about patents with GOOG definitely not the treatment Samsung
Now that I have defined the scope, I want to prepend "not" to certain words, depending on their POS tag.
list = ['ADJ', 'ADV', 'AUX', 'VERB']
for token in negated_tokens:
    for i in list:
        if token.pos_ == i:
            print('not'+token.text)
This prints the following:
notooopen, notTalk, notdefinitely, notnot
I would like to exclude notnot from the output and return
RT @trader $AAPL 2012 is notooopen to notTalk about patents with GOOG notdefinitely the treatment Samsung got heh someurl
How can I do this? Is there anything in my script that could be improved for speed?
Full script:
import spacy
nlp = spacy.load("en_core_web_sm")
doc = nlp(u'RT @trader $AAPL 2012 is ooopen to Talk about patents with GOOG definitely not the treatment Samsung got heh someURL')
list = ['ADJ', 'ADV', 'AUX', 'VERB']
negation_tokens = [tok for tok in doc if tok.dep_ == 'neg']
negation_head_tokens = [token.head for token in negation_tokens]
for token in negation_head_tokens:
    end = token.i
    start = token.head.i + 1
    negated_tokens = doc[start:end]
for token in negated_tokens:
    for i in list:
        if token.pos_ == i:
            print('not'+token.text)
Posted on 2019-03-03 16:05:23
Shadowing a Python built-in such as list is bad practice, so I renamed it to pos_list. Code:
doc = nlp(u'RT @trader $AAPL 2012 is ooopen to Talk about patents with GOOG definitely not the treatment Samsung got heh someURL')
pos_list = ['ADJ', 'ADV', 'AUX', 'VERB']
negation_tokens = [tok for tok in doc if tok.dep_ == 'neg']
blacklist = [token.text for token in negation_tokens]
negation_head_tokens = [token.head for token in negation_tokens]
new_doc = []
for token in negation_head_tokens:
    end = token.i
    start = token.head.i + 1
    left, right = doc[:start], doc[:end]
    negated_tokens = doc[start:end]
for token in doc:
    if token in negated_tokens:
        if token.pos_ in pos_list and token.text not in blacklist:
        # or you can leave out the blacklist and put it here directly
        # if token.pos_ in pos_list and token.text not in [token.text for token in negation_tokens]:
            new_doc.append('not'+token.text)
            continue
        else:
            pass
    new_doc.append(token.text)
print(' '.join(new_doc))
> RT @trader $ AAPL 2012 is notooopen to notTalk about patents with GOOG notdefinitely not the treatment Samsung got heh someURL
Source: https://stackoverflow.com/questions/54970683
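The output above still contains the original "not", whereas the question's target sentence drops it. On the speed question, one possible tweak is to compare integer positions instead of testing each token against the span, and to keep the POS tags in a set. The sketch below illustrates both ideas on plain Python lists so that it runs without a model. The helper name mark_negated, its parameters, and the toy POS tags are assumptions made for illustration; they are not part of spaCy and not real tagger output.

```python
# Hypothetical helper, not part of spaCy: works on plain lists of strings.
TARGET_POS = frozenset({"ADJ", "ADV", "AUX", "VERB"})


def mark_negated(texts, pos_tags, start, end, cue_indices=(), prefix="not"):
    """Prefix in-scope tokens with `prefix` and drop the negation cues.

    texts, pos_tags: parallel lists, one entry per token.
    start, end: half-open scope [start, end), as in doc[start:end].
    cue_indices: positions of the negation cue tokens to leave out.
    """
    cues = set(cue_indices)
    out = []
    for i, (text, pos) in enumerate(zip(texts, pos_tags)):
        if i in cues:
            continue  # drop the cue itself, e.g. the original "not"
        if start <= i < end and pos in TARGET_POS:
            out.append(prefix + text)
        else:
            out.append(text)
    return out


# Toy data with hand-written tags (an assumption, not tagger output).
texts = ["is", "ooopen", "to", "Talk", "definitely", "not",
         "the", "treatment", "Samsung", "got"]
pos_tags = ["AUX", "ADJ", "PART", "VERB", "ADV", "ADV",
            "DET", "NOUN", "PROPN", "VERB"]

# Scope runs from the token after "is" (index 0) up to, but excluding, "got".
result = mark_negated(texts, pos_tags, start=1, end=9, cue_indices={5})
print(" ".join(result))
```

With a spaCy doc, the inputs could be built as [t.text for t in doc], [t.pos_ for t in doc], and {t.i for t in negation_tokens}, with start and end computed as in the code above. Like that code, the sketch handles a single scope; several negation cues would need one call per scope or a merged set of in-scope positions.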