Q: spaCy Matcher with regular expressions

Stack Overflow user

Asked on 2021-11-05 15:27:01

Answers: 1 | Views: 190 | Followers: 0 | Votes: 1

I have the following phrases:

phrases = ['children externalize their emotions through outward behavior',
         'children externalize hidden emotions.',
         'children externalize internalized emotions.',
         'a child might externalize a hidden emotion through misbehavior',
         'a kid might externalize some emotions through behavior',
         'traumatized children externalize their hidden trauma through bad behavior.',
         'The kid is externalizing internal traumas',
         'A child might externalize emotions though his outward behavior',
         'The kid externalized a lot of his emotions through misbehavior.']

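I want to capture any noun that appears after the verb "externalize" in any of its forms: externalizing, externalized, and so on.

In this case, we should get: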

externalize their emotions
externalize hidden emotions
externalize internalized emotions
externalize a hidden emotion
externalize some emotions
externalize their hidden trauma
externalizing internal traumas
externalized a lot of his emotions

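So far, I can only capture the noun when it comes immediately after the verb "externalize".

I would also like to capture the noun when it appears fewer than 15 characters after the verb. For example, "externalized a lot of his emotions" should match, because "a lot of his" is only 14 characters, counting spaces.

Here is my attempt, which is far from perfect.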

import spacy
from spacy.matcher import Matcher
nlp = spacy.load("en_core_web_sm")
matcher =  Matcher(vocab = nlp.vocab)
verb_noun = [{'POS':'VERB'}, {'POS':'NOUN'}]
# spaCy v3 signature: patterns are passed as a list (v2 used add(key, None, pattern))
matcher.add('verb_noun', [verb_noun])

list_result = []
for phrase in phrases:
    doc = nlp(phrase)
    doc_match = matcher(doc)
    if doc_match:
        for match in doc_match:
            start = match[1]
            end = match[2]
            result = doc[start:end]
            result = [i.lemma_ for i in result]
            if 'externaliz' in result[0].lower():
                result = ' '.join(result)
                list_result.append(result)

Tags: python, nlp, spacy, matcher

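1 Answer

Stack Overflow user

Answered on 2021-11-07 04:55:24

"I would also like to capture the noun when it appears fewer than 15 characters after the verb. For example, 'externalized a lot of his emotions' should match, because 'a lot of his' is only 14 characters, counting spaces."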

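You can do this, though I wouldn't recommend it. What you would do is write a regular expression to match against the string and use Doc.char_span to create a span from the match. Because the Matcher works on tokens, a heuristic like "14 characters, including spaces" can't reasonably be implemented with it. That kind of heuristic is also a hack and will behave erratically.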
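As a rough illustration of the regular-expression half of that approach, here is a minimal sketch that uses only Python's `re` module. A regex cannot tell nouns from other words, so the `NOUNS` tuple is a hypothetical stand-in for whatever nouns you care about; `MAX_GAP` and `find_externalized` are likewise names made up for this example, not part of spaCy.

```python
import re

# Assumption: a hand-picked noun list, since a regex has no part-of-speech information.
NOUNS = ("emotion", "trauma")
# "Fewer than 15 characters" between verb and noun, counting spaces.
MAX_GAP = 14

NOUN_ALTERNATION = "|".join(map(re.escape, NOUNS))
PATTERN = re.compile(
    rf"\bexternaliz\w*\b"              # any form of the verb: externalize, externalized, ...
    rf".{{1,{MAX_GAP}}}?"              # the gap: 1 to MAX_GAP characters, as few as possible
    rf"\b(?:{NOUN_ALTERNATION})s?\b",  # one of the listed nouns, optionally plural
    re.IGNORECASE,
)


def find_externalized(text):
    """Return (start, end, matched_text) character offsets for each hit in text."""
    return [(m.start(), m.end(), m.group(0)) for m in PATTERN.finditer(text)]


if __name__ == "__main__":
    sample = "The kid externalized a lot of his emotions through misbehavior."
    for start, end, matched in find_externalized(sample):
        print(start, end, matched)
```

Each `(start, end)` pair can then be handed to `doc.char_span(start, end)` on a spaCy `Doc` built from the same text. With the default alignment mode, `char_span` returns `None` when the offsets do not line up with token boundaries, so that case needs checking.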

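I suspect what you actually want is to find out what is being externalized, that is, to find the object of the verb. In that case you should use the DependencyMatcher. Here is an example that uses a simple rule and merges noun chunks: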

import spacy

from spacy.matcher import DependencyMatcher
nlp = spacy.load("en_core_web_sm")

texts = ['children externalize their emotions through outward behavior',
         'children externalize hidden emotions.',
         'children externalize internalized emotions.',
         'a child might externalize a hidden emotion through misbehavior',
         'a kid might externalize some emotions through behavior',
         'traumatized children externalize their hidden trauma through bad behavior.',
         'The kid is externalizing internal traumas',
         'A child might externalize emotions though his outward behavior',
         'The kid externalized a lot of his emotions through misbehavior.']

pattern = [
  {
    "RIGHT_ID": "externalize",
    "RIGHT_ATTRS": {"LEMMA": "externalize"}
  },
  {
    "LEFT_ID": "externalize",
    "REL_OP": ">",
    "RIGHT_ID": "object",
    "RIGHT_ATTRS": {"DEP": "dobj"}
  },
]

matcher = DependencyMatcher(nlp.vocab)
matcher.add("EXTERNALIZE", [pattern])

# what was externalized?

# this is optional: merge noun phrases
nlp.add_pipe("merge_noun_chunks")

for doc in nlp.pipe(texts):
    for match_id, tokens in matcher(doc):
        # tokens[0] is the verb (a form of "externalize"), tokens[1] is its direct object
        print(doc[tokens[1]])

Output:

their emotions
hidden emotions
internalized emotions
a hidden emotion
some emotions
their hidden trauma
internal traumas
emotions
his outward behavior
a lot

Votes: 1

Original page content provided by Stack Overflow. Translation support provided by the Tencent Cloud Xiaowei IT-domain engine.

Original link:

https://stackoverflow.com/questions/69855625
