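Using Python, I am trying to extract entities from passive-voice sentences that have more than one subject.
"John and Jenny were accused of crimes by David"
My intention is to extract both "John" and "Jenny" from the sentence, as nsubjpass and as entities (.ents).
However, I am only able to extract "John" as nsubjpass.
How can I extract both of them?
Note that although John is found as an entity in .ents, Jenny is labeled conj rather than nsubjpass. How can this be improved?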
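Code:
import spacy

# The model name is an assumption; the original post does not show how nlp was loaded.
nlp = spacy.load("en_core_web_sm")

each_sentence3 = "John and Jenny were accused of crimes by David"
doc = nlp(each_sentence3)
passive_toks = [tok for tok in doc if tok.dep_ == "nsubjpass"]
if passive_toks:
    print(passive_toks)
Result:
[John]
The entity list shows: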
Code:
print(list(doc.ents))
Result:
[John, Jenny, David]
Now, if we inspect the whole sentence, we see the following:
Code:
for tok in doc:
    print(tok, tok.dep_)
Result:
John nsubjpass
and cc
Jenny conj
were auxpass
accused ROOT
of prep
crimes pobj
by agent
David pobj
Note that the second passive subject, Jenny, is labeled conj by spaCy rather than nsubjpass.
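Posted on 2017-02-14 08:02:34
Here is an example that uses POS tagging and dependency parsing to extract the subject and all of its conjuncts.
There is also a Token.conjuncts attribute, but it only returns tokens that are directly conjoined to the token. See https://github.com/explosion/spaCy/issues/795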
each_sentence3 = "John and Jenny were accused of crimes by David"
sent = nlp(each_sentence3)
result = []
subj = None
for word in sent:
    # Any subject relation (nsubj, nsubjpass, ...) starts the list.
    if 'subj' in word.dep_:
        subj = word
        result.append(word)
    # A conjunct attached directly to that subject belongs to it as well.
    elif word.dep_ == 'conj' and word.head == subj:
        result.append(word)
print(result)
Result:
[John, Jenny]
https://stackoverflow.com/questions/41208346
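The loop in the answer only accepts a conj token whose head is the subject itself. If the parser chains a longer coordination (for example, a third name attached to the second rather than to the first), the later names would be missed. Below is a minimal sketch of one way to follow such chains. It assumes nothing beyond tokens that expose dep_ and head; the Tok class, the function name subjects_with_conjuncts, and the hand-written parse are all hypothetical and exist only so the example runs without a model.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(eq=False)  # eq=False: tokens compare by identity, not by field values
class Tok:
    """Hypothetical stand-in for a parsed token (not a spaCy class)."""
    text: str
    dep_: str
    head: Optional["Tok"] = None


def subjects_with_conjuncts(tokens: Iterable) -> List:
    """Collect subject tokens plus any conjuncts chained onto them."""
    result = []
    for tok in tokens:
        if "subj" in tok.dep_:
            result.append(tok)
        elif tok.dep_ == "conj" and tok.head in result:
            # The head is a subject or an already collected conjunct.
            result.append(tok)
    return result


# Hand-written parse of "John, Jenny and Mary were accused" (an assumption,
# not real parser output): Mary hangs off Jenny, and Jenny hangs off John.
john = Tok("John", "nsubjpass")
comma = Tok(",", "punct", john)
jenny = Tok("Jenny", "conj", john)
and_ = Tok("and", "cc", jenny)
mary = Tok("Mary", "conj", jenny)
accused = Tok("accused", "ROOT")
were = Tok("were", "auxpass", accused)
john.head = accused
accused.head = accused  # the root points at itself, as in the parse above

tokens = [john, comma, jenny, and_, mary, were, accused]
print([t.text for t in subjects_with_conjuncts(tokens)])
# prints ['John', 'Jenny', 'Mary']
```

Because the function only reads dep_ and head and relies on token equality, it should also accept a spaCy Doc, as in subjects_with_conjuncts(nlp(each_sentence3)). The sketch relies on each conjunct appearing after its head in token order, which holds for the parse shown in the question.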