I have the following phrases:
phrases = ['children externalize their emotions through outward behavior',
'children externalize hidden emotions.',
'children externalize internalized emotions.',
'a child might externalize a hidden emotion through misbehavior',
'a kid might externalize some emotions through behavior',
'traumatized children externalize their hidden trauma through bad behavior.',
'The kid is externalizing internal traumas',
'A child might externalize emotions though his outward behavior',
'The kid externalized a lot of his emotions through misbehavior.']
I want to capture any noun that appears after the verb externalize (externalizing, externalized, and so on).
In this case, we should get:
externalize their emotions
externalize hidden emotions
externalize internalized emotions
externalize a hidden emotion
externalize some emotions
externalize their hidden trauma
externalizing internal traumas
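externalized a lot of his emotions
So far, I can only capture the noun if it comes immediately after the verb externalize.
I would like to capture the noun if it happens to come less than 15 characters after the verb. For example, "externalized a lot of his emotions" should match, because (a lot of his) is only 14 characters, counting spaces.
Here is my attempt, which is far from perfect.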
import spacy
from spacy.matcher import Matcher
nlp = spacy.load("en_core_web_sm")
matcher = Matcher(vocab = nlp.vocab)
verb_noun = [{'POS':'VERB'}, {'POS':'NOUN'}]
matcher.add('verb_noun', [verb_noun])  # spaCy v3 takes a list of patterns
list_result = []
for phrase in phrases:
    doc = nlp(phrase)
    doc_match = matcher(doc)
    if doc_match:
        for match in doc_match:
            # each match is a (match_id, start, end) tuple
            start = match[1]
            end = match[2]
            result = doc[start:end]
            result = [i.lemma_ for i in result]
            # keep only matches whose verb is a form of "externalize"
            if 'externaliz' in result[0].lower():
                result = ' '.join(result)
                list_result.append(result)

Posted on 2021-11-07 04:55:24
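"I would like to capture the noun if it happens to come less than 15 characters after the verb. For example, "externalized a lot of his emotions" should match, because (a lot of his) is only 14 characters, counting spaces."
You can do this, though I wouldn't recommend it. What you would do is write a regular expression that matches against the string and use Doc.char_span to create a match. Since the Matcher works on tokens, a heuristic like "14 characters, including spaces" can't reasonably be implemented with it. That kind of heuristic is also a hack and will perform erratically.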
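For what it's worth, here is a minimal sketch of that regex plus Doc.char_span route. It assumes a blank English pipeline (tokenizer only) and a hand-written noun alternation, since a regex cannot tell which words are nouns. The names NOUN_RE, PATTERN and find_spans are illustrative and not part of spaCy.

```python
import re
import spacy

# Tokenizer-only pipeline: no trained model is required for char_span.
nlp = spacy.blank("en")

# Assumption: the nouns of interest are listed by hand, because a regex
# has no access to part-of-speech tags.
NOUN_RE = r"(?:emotions?|traumas?)"

# A form of "externalize", then a gap of 1 to 14 characters (spaces
# included), then one of the listed nouns.
PATTERN = re.compile(r"\bexternaliz\w*\s.{0,13}?\b" + NOUN_RE + r"\b", re.IGNORECASE)


def find_spans(text):
    """Return spaCy Span objects for every regex hit in text."""
    doc = nlp(text)
    spans = []
    for m in PATTERN.finditer(doc.text):
        # char_span returns None when the offsets do not fall on token boundaries.
        span = doc.char_span(m.start(), m.end())
        if span is not None:
            spans.append(span)
    return spans


for text in [
    "The kid externalized a lot of his emotions through misbehavior.",
    "The kid is externalizing internal traumas",
]:
    print([span.text for span in find_spans(text)])
```

The hard-coded noun list is exactly the kind of brittleness mentioned above: any noun not in the list, or any phrase a character too long, is silently missed.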
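I suspect what you actually want is to find what is being externalized, that is, to find the object of the verb. In that case you should use the DependencyMatcher. Here is an example that uses a simple rule and merges noun chunks: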
import spacy
from spacy.matcher import DependencyMatcher
nlp = spacy.load("en_core_web_sm")
texts = ['children externalize their emotions through outward behavior',
'children externalize hidden emotions.',
'children externalize internalized emotions.',
'a child might externalize a hidden emotion through misbehavior',
'a kid might externalize some emotions through behavior',
'traumatized children externalize their hidden trauma through bad behavior.',
'The kid is externalizing internal traumas',
'A child might externalize emotions though his outward behavior',
'The kid externalized a lot of his emotions through misbehavior.']
pattern = [
    {
        "RIGHT_ID": "externalize",
        "RIGHT_ATTRS": {"LEMMA": "externalize"}
    },
    {
        "LEFT_ID": "externalize",
        "REL_OP": ">",
        "RIGHT_ID": "object",
        "RIGHT_ATTRS": {"DEP": "dobj"}
    },
]
matcher = DependencyMatcher(nlp.vocab)
matcher.add("EXTERNALIZE", [pattern])
# what was externalized?
# this is optional: merge noun phrases
nlp.add_pipe("merge_noun_chunks")
for doc in nlp.pipe(texts):
    for match_id, tokens in matcher(doc):
        # tokens[0] is like "externalize"
        print(doc[tokens[1]])

Output:
their emotions
hidden emotions
internalized emotions
a hidden emotion
some emotions
their hidden trauma
internal traumas
emotions
his outward behavior
a lot
https://stackoverflow.com/questions/69855625
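In the output above, the last sentence yields only "a lot", because the match is the direct-object token (or merged chunk) itself rather than everything attached to it. One possible way to recover the whole phrase is to expand the object to its dependency subtree via Token.left_edge and Token.right_edge. The sketch below is only illustrative. To stay runnable without a trained model it builds the parse by hand: the heads, deps and lemmas lists are my assumption of a plausible parse, not actual en_core_web_sm output, so results with a real pipeline may differ.

```python
import spacy
from spacy.matcher import DependencyMatcher
from spacy.tokens import Doc

# Only the vocab of a blank pipeline is needed; the parse is supplied by hand.
nlp = spacy.blank("en")

# Assumption: a hand-annotated parse standing in for a trained parser's output.
words = ["The", "kid", "externalized", "a", "lot", "of", "his", "emotions"]
heads = [1, 2, 2, 4, 2, 4, 7, 5]  # absolute index of each token's head
deps = ["det", "nsubj", "ROOT", "det", "dobj", "prep", "poss", "pobj"]
lemmas = ["the", "kid", "externalize", "a", "lot", "of", "his", "emotion"]
doc = Doc(nlp.vocab, words=words, heads=heads, deps=deps, lemmas=lemmas)

# Same rule as in the answer: a direct object whose head is "externalize".
pattern = [
    {"RIGHT_ID": "externalize", "RIGHT_ATTRS": {"LEMMA": "externalize"}},
    {
        "LEFT_ID": "externalize",
        "REL_OP": ">",
        "RIGHT_ID": "object",
        "RIGHT_ATTRS": {"DEP": "dobj"},
    },
]
matcher = DependencyMatcher(nlp.vocab)
matcher.add("EXTERNALIZE", [pattern])

object_phrases = []
for match_id, tokens in matcher(doc):
    obj = doc[tokens[1]]
    # Expand from the object token to the full span covered by its subtree.
    phrase = doc[obj.left_edge.i : obj.right_edge.i + 1]
    object_phrases.append(phrase.text)

print(object_phrases)
```

With a real pipeline you would drop the hand-built Doc and run this loop over nlp.pipe(texts) instead. Whether the subtree is the phrase you want depends on how the parser attaches the prepositional phrases.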