Removing negation token and returning negated sentence in spaCy

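Stack Overflow user

Asked 2019-03-03 15:53:42

Answers 1 · Views 1.1K · Followers 0 · Votes 1

I want to use spaCy's dependency parser to determine the scope of negation in a document. See here for the dependency visualizer applied to the following string: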

RT @trader $AAPL 2012 is ooopen to Talk about patents with GOOG definitely not the treatment Samsung got heh someURL

I can detect the negation cues with:

 negation_tokens = [tok for tok in doc if tok.dep_ == 'neg']

So I can see that not is a negation modifier of got in my string. Now I want to define the scope of the negation as follows:

negation_head_tokens = [token.head for token in negation_tokens]   
for token in negation_head_tokens:
    end = token.i
    start = token.head.i + 1
    negated_tokens = doc[start:end]
    print(negated_tokens)

This gives the following output:

 ooopen to Talk about patents with GOOG definitely not the treatment Samsung
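The scope rule above (from one past the head's own head, up to but not including the head) can be wrapped in a small helper that also handles more than one negation cue. This is only a minimal sketch: `negation_scope_indices` is a hypothetical name, and the toy tokens are hand-built stand-ins with made-up dependency labels, not real parser output. They carry the same `i`, `dep_`, and `head` attributes as spaCy tokens, so a spaCy `Doc` could be passed in their place.

```python
from types import SimpleNamespace


def negation_scope_indices(tokens):
    """Collect the indices covered by the scope of every 'neg' token.

    Scope rule taken from the question: for the head H of a negation cue,
    cover indices H.head.i + 1 up to (but not including) H.i.
    """
    covered = set()
    for tok in tokens:
        if tok.dep_ != "neg":
            continue
        head = tok.head
        start, end = head.head.i + 1, head.i
        covered.update(range(start, end))  # empty range when start >= end
    return sorted(covered)


# Hand-built stand-in tokens (illustrative labels, NOT real spaCy output).
words = ["is", "open", "not", "Samsung", "got"]
deps = ["ROOT", "acomp", "neg", "nsubj", "relcl"]
head_ids = [0, 0, 4, 4, 0]  # the root points at itself, as in spaCy
toy = [SimpleNamespace(i=i, text=w, dep_=d)
       for i, (w, d) in enumerate(zip(words, deps))]
for tok, h in zip(toy, head_ids):
    tok.head = toy[h]

print(negation_scope_indices(toy))  # prints: [1, 2, 3]
```

Returning plain indices rather than a single slice keeps every scope when a sentence contains several negations, instead of only the last one.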

Now that I have defined the scope, I want to prepend "not" to certain words, depending on their POS tag.

list = ['ADJ', 'ADV', 'AUX', 'VERB']
for token in negated_tokens:
    for i in list:
        if token.pos_ == i:
            print('not'+token.text)

This prints the following:

 notooopen, notTalk, notdefinitely, notnot

I would like to exclude notnot from my output and return:

RT @trader $AAPL 2012 is notooopen to notTalk about patents with GOOG notdefinitely the treatment Samsung got heh someurl

How can I achieve this? And could my script be improved in terms of speed?

Full script:

import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp(u'RT @trader $AAPL 2012 is ooopen to Talk about patents with GOOG definitely not the treatment Samsung got heh someURL')
list = ['ADJ', 'ADV', 'AUX', 'VERB']

negation_tokens = [tok for tok in doc if tok.dep_ == 'neg']
negation_head_tokens = [token.head for token in negation_tokens]

for token in negation_head_tokens:
   end = token.i
   start = token.head.i + 1
   negated_tokens = doc[start:end]
   for token in negated_tokens:
      for i in list:
         if token.pos_ == i:
            print('not'+token.text)

python

spacy

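1 Answer

Stack Overflow user

Accepted answer

Posted 2019-03-03 16:05:23

Overwriting Python built-ins such as list is bad form, so I renamed it to pos_list.
Since "not" is just an ordinary adverb, the simplest way to skip it seems to be an explicit blacklist. There may be a more "linguistic" way to do this.
I sped up your inner loop slightly.

Code: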

import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp(u'RT @trader $AAPL 2012 is ooopen to Talk about patents with GOOG definitely not the treatment Samsung got heh someURL')

pos_list = ['ADJ', 'ADV', 'AUX', 'VERB']
negation_tokens = [tok for tok in doc if tok.dep_ == 'neg']
# Texts of the negation cues themselves, so "not" never gets a "not" prefix
blacklist = [token.text for token in negation_tokens]
negation_head_tokens = [token.head for token in negation_tokens]
new_doc = []

# Scope: from just after the head's own head up to (not including) the head.
# With several negation cues, only the last scope is kept.
for token in negation_head_tokens:
    end = token.i
    start = token.head.i + 1
    negated_tokens = doc[start:end]

for token in doc:
    # or you can leave out the blacklist and build it inline:
    # if token.pos_ in pos_list and token.text not in [token.text for token in negation_tokens]:
    if token in negated_tokens and token.pos_ in pos_list and token.text not in blacklist:
        new_doc.append('not'+token.text)
    else:
        new_doc.append(token.text)
print(' '.join(new_doc))

> RT @trader $ AAPL 2012 is notooopen to notTalk about patents with GOOG notdefinitely not the treatment Samsung got heh someURL
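As one possible take on the more "linguistic" route mentioned above, the negation cue can be recognised by its dependency label instead of by its text, and dropped from the result so that it matches the sentence the question asks for. The following is a minimal sketch, not the code from this answer: `negate`, `NEGATABLE_POS`, and `drop_cue` are hypothetical names, and the toy tokens are hand-tagged stand-ins rather than real parser output. They expose the same `i`, `text`, `pos_`, and `dep_` attributes as spaCy tokens, so a spaCy `Doc` and `range(start, end)` could be passed instead.

```python
from types import SimpleNamespace

# A frozenset gives constant-time membership tests for the POS check.
NEGATABLE_POS = frozenset({"ADJ", "ADV", "AUX", "VERB"})


def negate(tokens, scope, drop_cue=True):
    """Prefix 'not' to in-scope tokens with a negatable POS tag.

    tokens   -- iterable of objects with .i, .text, .pos_ and .dep_
    scope    -- iterable of token indices inside the negation scope
    drop_cue -- if True, leave the 'neg' token itself out of the result
    """
    scope = set(scope)
    out = []
    for tok in tokens:
        if tok.dep_ == "neg":  # the cue is found by label, not by text
            if not drop_cue:
                out.append(tok.text)
            continue
        if tok.i in scope and tok.pos_ in NEGATABLE_POS:
            out.append("not" + tok.text)
        else:
            out.append(tok.text)
    return " ".join(out)


# Hand-tagged stand-in tokens (illustrative tags, NOT real spaCy output).
rows = [("is", "AUX", "ROOT"), ("open", "ADJ", "acomp"),
        ("not", "PART", "neg"), ("Samsung", "PROPN", "nsubj"),
        ("got", "VERB", "relcl")]
toy = [SimpleNamespace(i=i, text=t, pos_=p, dep_=d)
       for i, (t, p, d) in enumerate(rows)]

print(negate(toy, scope=range(1, 4)))  # prints: is notopen Samsung got
```

Checking `dep_` rather than the token text means cues with other surface forms that the parser labels `neg` would be handled the same way, without having to list them by hand.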

Votes 2

Original page content provided by Stack Overflow. Translation support provided by Tencent Cloud's Xiaowei IT-domain engine.

Original link:

https://stackoverflow.com/questions/54970683
