我正在尝试从依赖分析器的输出生成一个树(嵌套字典)。句子是“我在睡梦中射杀了一头大象”。我能够获得链接:How do I do dependency parsing in NLTK?中描述的输出
nsubj(shot-2, I-1)
det(elephant-4, an-3)
dobj(shot-2, elephant-4)
prep(shot-2, in-5)
poss(sleep-7, my-6)
pobj(in-5, sleep-7)要将这个元组列表转换为嵌套字典,我使用了以下链接:How to convert python list of tuples into tree?
def build_tree(list_of_tuples):
all_nodes = {n[2]:((n[0], n[1]),{}) for n in list_of_tuples}
root = {}
print all_nodes
for item in list_of_tuples:
rel, gov,dep = item
if gov is not 'ROOT':
all_nodes[gov][1][dep] = all_nodes[dep]
else:
root[dep] = all_nodes[dep]
return root这使输出结果如下:
{'shot': (('ROOT', 'ROOT'),
{'I': (('nsubj', 'shot'), {}),
'elephant': (('dobj', 'shot'), {'an': (('det', 'elephant'), {})}),
'sleep': (('nmod', 'shot'),
{'in': (('case', 'sleep'), {}), 'my': (('nmod:poss', 'sleep'), {})})})}为了找到根到叶的路径,我使用了以下链接:Return root to specific leaf from a nested dictionary tree
创建树和找到路径是两个独立的thingsThe,第二个目标是找到根到叶节点的路径,就像做了Return root to specific leaf from a nested dictionary tree一样。但是我想得到根对叶(依赖关系路径),因此,例如,当我将调用recurse_category(类别,'an'),其中类别是嵌套的树结构,'an‘是在树中的单词,我应该得到ROOT-nsubj-dobj (依赖关系,直到根)作为输出。
发布于 2018-09-08 00:31:37
这会将输出转换为嵌套的字典表单。如果我能找到路径的话,我会随时通知你的。也许这个是有帮助的。
list_of_tuples = [('ROOT','ROOT', 'shot'),('nsubj','shot', 'I'),('det','elephant', 'an'),('dobj','shot', 'elephant'),('case','sleep', 'in'),('nmod:poss','sleep', 'my'),('nmod','shot', 'sleep')]
nodes={}
for i in list_of_tuples:
rel,parent,child=i
nodes[child]={'Name':child,'Relationship':rel}
forest=[]
for i in list_of_tuples:
rel,parent,child=i
node=nodes[child]
if parent=='ROOT':# this should be the Root Node
forest.append(node)
else:
parent=nodes[parent]
if not 'children' in parent:
parent['children']=[]
children=parent['children']
children.append(node)
print forest输出是一个嵌套字典,
[{'Name': 'shot', 'Relationship': 'ROOT', 'children': [{'Name': 'I', 'Relationship': 'nsubj'}, {'Name': 'elephant', 'Relationship': 'dobj', 'children': [{'Name': 'an', 'Relationship': 'det'}]}, {'Name': 'sleep', 'Relationship': 'nmod', 'children': [{'Name': 'in', 'Relationship': 'case'}, {'Name': 'my', 'Relationship': 'nmod:poss'}]}]}]
下面的函数可以帮助您找到根到叶路径:
def recurse_category(categories,to_find):
for category in categories:
if category['Name'] == to_find:
return True, [category['Relationship']]
if 'children' in category:
found, path = recurse_category(category['children'], to_find)
if found:
return True, [category['Relationship']] + path
return False, []发布于 2018-09-04 07:33:09
首先,如果您只是使用斯坦福CoreNLP依赖解析器的预训练模型,则应该使用来自nltk.parse.corenlp的CoreNLPDependencyParser并避免使用旧的nltk.parse.stanford接口。
下载并运行终端中的Java服务器后,使用Python:
>>> from nltk.parse.corenlp import CoreNLPDependencyParser
>>> dep_parser = CoreNLPDependencyParser(url='http://localhost:9000')
>>> sent = "I shot an elephant with a banana .".split()
>>> parses = list(dep_parser.parse(sent))
>>> type(parses[0])
<class 'nltk.parse.dependencygraph.DependencyGraph'>现在我们看到解析是DependencyGraph类型的,来自nltk.parse.dependencygraph https://github.com/nltk/nltk/blob/develop/nltk/parse/dependencygraph.py#L36。
通过简单地执行DependencyGraph将nltk.tree.Tree对象转换为DependencyGraph.tree()
>>> parses[0].tree()
Tree('shot', ['I', Tree('elephant', ['an']), Tree('banana', ['with', 'a']), '.'])
>>> parses[0].tree().pretty_print()
shot
_________|____________
| | elephant banana
| | | _____|_____
I . an with a 要将其转换为括号内的解析格式,请执行以下操作:
>>> print(parses[0].tree())
(shot I (elephant an) (banana with a) .)如果您正在寻找依赖三胞胎:
>>> [(governor, dep, dependent) for governor, dep, dependent in parses[0].triples()]
[(('shot', 'VBD'), 'nsubj', ('I', 'PRP')), (('shot', 'VBD'), 'dobj', ('elephant', 'NN')), (('elephant', 'NN'), 'det', ('an', 'DT')), (('shot', 'VBD'), 'nmod', ('banana', 'NN')), (('banana', 'NN'), 'case', ('with', 'IN')), (('banana', 'NN'), 'det', ('a', 'DT')), (('shot', 'VBD'), 'punct', ('.', '.'))]
>>> for governor, dep, dependent in parses[0].triples():
... print(governor, dep, dependent)
...
('shot', 'VBD') nsubj ('I', 'PRP')
('shot', 'VBD') dobj ('elephant', 'NN')
('elephant', 'NN') det ('an', 'DT')
('shot', 'VBD') nmod ('banana', 'NN')
('banana', 'NN') case ('with', 'IN')
('banana', 'NN') det ('a', 'DT')
('shot', 'VBD') punct ('.', '.')以CONLL格式:
>>> print(parses[0].to_conll(style=10))
1 I I PRP PRP _ 2 nsubj _ _
2 shot shoot VBD VBD _ 0 ROOT _ _
3 an a DT DT _ 4 det _ _
4 elephant elephant NN NN _ 2 dobj _ _
5 with with IN IN _ 7 case _ _
6 a a DT DT _ 7 det _ _
7 banana banana NN NN _ 2 nmod _ _
8 . . . . _ 2 punct _ _https://stackoverflow.com/questions/52148690
复制相似问题