
spaCy Matcher with regular expressions

Stack Overflow user
Asked 2021-11-05 15:27:01
1 answer, 190 views, 0 followers, 1 vote

I have the following phrases:

phrases = ['children externalize their emotions through outward behavior',
         'children externalize hidden emotions.',
         'children externalize internalized emotions.',
         'a child might externalize a hidden emotion through misbehavior',
         'a kid might externalize some emotions through behavior',
         'traumatized children externalize their hidden trauma through bad behavior.',
         'The kid is externalizing internal traumas',
         'A child might externalize emotions though his outward behavior',
         'The kid externalized a lot of his emotions through misbehavior.']

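I want to capture any noun that appears after the verb externalize in any of its forms: externalize, externalizing, externalized, and so on.

In this case, we should get: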

externalize their emotions
externalize hidden emotions
externalize internalized emotions
externalize a hidden emotion
externalize some emotions
externalize their hidden trauma
externalizing internal traumas
externalized a lot of his emotions

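So far, I can only capture the noun if it comes immediately after the verb externalize.

I would like to capture the noun if it happens to start fewer than 15 characters after the verb. For example, "externalized a lot of his emotions" should match, because "a lot of his" is only 14 characters, counting the surrounding spaces.

This is my attempt so far, which is far from perfect.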

import spacy
from spacy.matcher import Matcher

nlp = spacy.load("en_core_web_sm")
matcher = Matcher(nlp.vocab)

# A verb immediately followed by a noun
verb_noun = [{'POS': 'VERB'}, {'POS': 'NOUN'}]
# spaCy v3 signature: patterns are passed as a list
matcher.add('verb_noun', [verb_noun])

list_result = []
for phrase in phrases:
    doc = nlp(phrase)
    for match_id, start, end in matcher(doc):
        # Lemmatize the matched tokens
        lemmas = [token.lemma_ for token in doc[start:end]]
        # Keep only matches whose verb is a form of "externalize"
        if 'externaliz' in lemmas[0].lower():
            list_result.append(' '.join(lemmas))

1 Answer

Stack Overflow user

Answered 2021-11-07 04:55:24

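I would like to capture the noun if it happens to start fewer than 15 characters after the verb. For example, "externalized a lot of his emotions" should match, because "a lot of his" is only 14 characters, counting the surrounding spaces.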

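You can do this, though I wouldn't recommend it. What you would do is write a regular expression to match the string and use Doc.char_span to create a match. Because the Matcher works on tokens, a heuristic like "14 characters, including spaces" can't be implemented reasonably with it. That kind of heuristic is also a hack and will perform erratically.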
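For illustration only, here is a minimal sketch of that regex plus Doc.char_span route. The names `VERB_WINDOW`, `window_offsets`, and `window_spans` are hypothetical, and the 15-character window is simply the heuristic from the question. The regex cannot tell which word is a noun, so the resulting spans would still need a part-of-speech check afterwards.

```python
import re

# The verb in any form, followed by up to 15 more characters.
# The 15-character limit is the asker's heuristic, not a spaCy feature.
VERB_WINDOW = re.compile(r"\bexternaliz\w*.{0,15}", re.IGNORECASE)


def window_offsets(text):
    """Return (start, end) character offsets of each verb-plus-window match."""
    return [(m.start(), m.end()) for m in VERB_WINDOW.finditer(text)]


def window_spans(doc):
    """Convert the character offsets into spaCy Spans with Doc.char_span."""
    spans = []
    for start, end in window_offsets(doc.text):
        # "expand" snaps offsets that fall inside a token outward to whole tokens
        span = doc.char_span(start, end, alignment_mode="expand")
        if span is not None:
            spans.append(span)
    return spans
```

With a loaded pipeline, `window_spans(nlp(text))` should return spans that start at the verb, and keeping only the tokens with `token.pos_ == "NOUN"` would then pick out the candidate nouns. The window is measured in raw characters, so small changes in wording can move a noun in or out of range.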

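I suspect what you really want to do is find out what is being externalized, that is, find the object of the verb. In that case you should use the DependencyMatcher. Here is an example that uses a simple rule and merges noun chunks: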

import spacy

from spacy.matcher import DependencyMatcher
nlp = spacy.load("en_core_web_sm")

texts = ['children externalize their emotions through outward behavior',
         'children externalize hidden emotions.',
         'children externalize internalized emotions.',
         'a child might externalize a hidden emotion through misbehavior',
         'a kid might externalize some emotions through behavior',
         'traumatized children externalize their hidden trauma through bad behavior.',
         'The kid is externalizing internal traumas',
         'A child might externalize emotions though his outward behavior',
         'The kid externalized a lot of his emotions through misbehavior.']

pattern = [
  {
    "RIGHT_ID": "externalize",
    "RIGHT_ATTRS": {"LEMMA": "externalize"}
  },
  {
    "LEFT_ID": "externalize",
    "REL_OP": ">",
    "RIGHT_ID": "object",
    "RIGHT_ATTRS": {"DEP": "dobj"}
  },
]

matcher = DependencyMatcher(nlp.vocab)
matcher.add("EXTERNALIZE", [pattern])

# this is optional: merge noun phrases into single tokens
nlp.add_pipe("merge_noun_chunks")

# what was externalized?
for doc in nlp.pipe(texts):
    for match_id, tokens in matcher(doc):
        # tokens[0] is the verb, like "externalize"; tokens[1] is its object
        print(doc[tokens[1]])

Output:

their emotions
hidden emotions
internalized emotions
a hidden emotion
some emotions
their hidden trauma
internal traumas
emotions
his outward behavior
a lot
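
To get output closer to the shape the question asks for (verb plus object), one option is to slice from the verb token through the object token instead of printing only the object. The helper below is a sketch: `verb_to_object` is a hypothetical name, and it assumes `token_ids` follows the pattern order (verb first, object second), as in the matcher loop above.

```python
def verb_to_object(doc, token_ids):
    """Slice from the verb through its object, inclusive.

    `doc` can be a spaCy Doc (the result is then a Span) or any other
    sliceable sequence of tokens.
    """
    verb_i, obj_i = token_ids
    # The object usually follows the verb, but sort to be safe
    start, end = sorted((verb_i, obj_i))
    return doc[start:end + 1]


# Inside the matcher loop this would replace the print call:
#     print(verb_to_object(doc, tokens).text)
```

Given the output above, the last sentence would still yield only the verb followed by "a lot", because the parser treats "a lot" as the direct object and "of his emotions" as a modifier attached to it.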
Votes: 1
The original content of this page is provided by Stack Overflow.
Original link:

https://stackoverflow.com/questions/69855625
