How do I generate a tree from the output of a dependency parser?

Stack Overflow user
Asked 2018-09-03 11:17:06
2 answers · 3.6K views · 0 followers · 4 votes

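I am trying to generate a tree (a nested dictionary) from the output of a dependency parser. The sentence is "I shot an elephant in my sleep". I am able to get the output described in the link: How do I do dependency parsing in NLTK?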

nsubj(shot-2, I-1)
det(elephant-4, an-3)
dobj(shot-2, elephant-4)
prep(shot-2, in-5)
poss(sleep-7, my-6)
pobj(in-5, sleep-7)

To convert this list of tuples into a nested dictionary, I used the following link: How to convert python list of tuples into tree?

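def build_tree(list_of_tuples):
    # Map each dependent word to ((relation, governor), children_dict).
    all_nodes = {n[2]: ((n[0], n[1]), {}) for n in list_of_tuples}
    root = {}
    print(all_nodes)
    for item in list_of_tuples:
        rel, gov, dep = item
        # Compare string values with !=; `is not` tests identity, not equality.
        if gov != 'ROOT':
            all_nodes[gov][1][dep] = all_nodes[dep]
        else:
            root[dep] = all_nodes[dep]
    return root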

This gives the following output:

{'shot': (('ROOT', 'ROOT'),
  {'I': (('nsubj', 'shot'), {}),
   'elephant': (('dobj', 'shot'), {'an': (('det', 'elephant'), {})}),
   'sleep': (('nmod', 'shot'),
    {'in': (('case', 'sleep'), {}), 'my': (('nmod:poss', 'sleep'), {})})})}

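To find the root-to-leaf path, I used the following link: Return root to specific leaf from a nested dictionary tree

Creating the tree and finding the path are two separate things. The second goal is to find the path from the root to a leaf node, as done in Return root to specific leaf from a nested dictionary tree. But I want to get the root-to-leaf dependency path. So, for example, when I call recurse_category(categories, 'an'), where categories is the nested tree structure and 'an' is a word in the tree, I should get ROOT-nsubj-dobj (the dependency relations up to the root) as output.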

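2 Answers

Stack Overflow user

Accepted answer

Posted 2018-09-08 00:31:37

This converts the output into nested dictionary form. I will keep you posted if I can find the path as well. Maybe this is helpful.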

list_of_tuples = [('ROOT','ROOT', 'shot'),('nsubj','shot', 'I'),('det','elephant', 'an'),('dobj','shot', 'elephant'),('case','sleep', 'in'),('nmod:poss','sleep', 'my'),('nmod','shot', 'sleep')]

nodes={}

for i in list_of_tuples:
    rel,parent,child=i
    nodes[child]={'Name':child,'Relationship':rel}

forest=[]

for i in list_of_tuples:
    rel,parent,child=i
    node=nodes[child]

    if parent == 'ROOT':  # this should be the root node
        forest.append(node)
    else:
        parent = nodes[parent]
        if 'children' not in parent:
            parent['children'] = []
        children = parent['children']
        children.append(node)

print(forest)

The output is a nested dictionary:

[{'Name': 'shot', 'Relationship': 'ROOT', 'children': [{'Name': 'I', 'Relationship': 'nsubj'}, {'Name': 'elephant', 'Relationship': 'dobj', 'children': [{'Name': 'an', 'Relationship': 'det'}]}, {'Name': 'sleep', 'Relationship': 'nmod', 'children': [{'Name': 'in', 'Relationship': 'case'}, {'Name': 'my', 'Relationship': 'nmod:poss'}]}]}]

The following function can help you find the root-to-leaf path:

def recurse_category(categories,to_find):
    for category in categories: 
        if category['Name'] == to_find:
            return True, [category['Relationship']]
        if 'children' in category:
            found, path = recurse_category(category['children'], to_find)
            if found:
                return True, [category['Relationship']] + path
    return False, []
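
As a quick check of how the function above behaves, here is a minimal sketch that runs it on the forest printed earlier and joins the relations with hyphens. The wrapper name dependency_path is hypothetical, introduced only for this illustration. Since 'an' hangs off 'elephant', which hangs off 'shot', the path printed for 'an' is ROOT-dobj-det.

```python
def recurse_category(categories, to_find):
    # Depth-first search; returns (found, list of relations from root to the word).
    for category in categories:
        if category['Name'] == to_find:
            return True, [category['Relationship']]
        if 'children' in category:
            found, path = recurse_category(category['children'], to_find)
            if found:
                return True, [category['Relationship']] + path
    return False, []


def dependency_path(forest, word):
    # Hypothetical helper: joins the relations, or returns None if the word is absent.
    found, path = recurse_category(forest, word)
    return '-'.join(path) if found else None


# The forest produced by the code above for "I shot an elephant in my sleep".
forest = [{'Name': 'shot', 'Relationship': 'ROOT', 'children': [
    {'Name': 'I', 'Relationship': 'nsubj'},
    {'Name': 'elephant', 'Relationship': 'dobj', 'children': [
        {'Name': 'an', 'Relationship': 'det'}]},
    {'Name': 'sleep', 'Relationship': 'nmod', 'children': [
        {'Name': 'in', 'Relationship': 'case'},
        {'Name': 'my', 'Relationship': 'nmod:poss'}]}]}]

print(dependency_path(forest, 'an'))      # ROOT-dobj-det
print(dependency_path(forest, 'my'))      # ROOT-nmod-nmod:poss
print(dependency_path(forest, 'banana'))  # None
```

Note that the lookup is by word form, so if a word occurs twice in the sentence only the first match in depth-first order is returned.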
Votes: 0

Stack Overflow user

Posted 2018-09-04 07:33:09

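First, if you are just using the pre-trained model of the Stanford CoreNLP dependency parser, you should use CoreNLPDependencyParser from nltk.parse.corenlp and avoid the old nltk.parse.stanford interface.

See Stanford Parser and NLTK

After downloading and running the Java server in a terminal, in Python: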

>>> from nltk.parse.corenlp import CoreNLPDependencyParser
>>> dep_parser = CoreNLPDependencyParser(url='http://localhost:9000')
>>> sent = "I shot an elephant with a banana .".split()
>>> parses = list(dep_parser.parse(sent))
>>> type(parses[0])
<class 'nltk.parse.dependencygraph.DependencyGraph'>

Now we see that the parse is of type DependencyGraph, from nltk.parse.dependencygraph https://github.com/nltk/nltk/blob/develop/nltk/parse/dependencygraph.py#L36

To convert the DependencyGraph into an nltk.tree.Tree object, simply call DependencyGraph.tree():

>>> parses[0].tree()
Tree('shot', ['I', Tree('elephant', ['an']), Tree('banana', ['with', 'a']), '.'])

>>> parses[0].tree().pretty_print()
          shot                  
  _________|____________         
 |   |  elephant      banana    
 |   |     |       _____|_____   
 I   .     an    with         a 

To convert it into the bracketed parse format:

>>> print(parses[0].tree())
(shot I (elephant an) (banana with a) .)

If you are looking for the dependency triples:

>>> [(governor, dep, dependent) for governor, dep, dependent in parses[0].triples()]
[(('shot', 'VBD'), 'nsubj', ('I', 'PRP')), (('shot', 'VBD'), 'dobj', ('elephant', 'NN')), (('elephant', 'NN'), 'det', ('an', 'DT')), (('shot', 'VBD'), 'nmod', ('banana', 'NN')), (('banana', 'NN'), 'case', ('with', 'IN')), (('banana', 'NN'), 'det', ('a', 'DT')), (('shot', 'VBD'), 'punct', ('.', '.'))]

>>> for governor, dep, dependent in parses[0].triples():
...     print(governor, dep, dependent)
... 
('shot', 'VBD') nsubj ('I', 'PRP')
('shot', 'VBD') dobj ('elephant', 'NN')
('elephant', 'NN') det ('an', 'DT')
('shot', 'VBD') nmod ('banana', 'NN')
('banana', 'NN') case ('with', 'IN')
('banana', 'NN') det ('a', 'DT')
('shot', 'VBD') punct ('.', '.')
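
Since the question asks for a nested dictionary, the triples above can be folded into one without a running server. The following is only a sketch: it hard-codes the triples printed above, the function name triples_to_tree is hypothetical, and the 'Name'/'Relationship'/'children' keys are borrowed from the other answer rather than from NLTK. It also assumes each (word, tag) pair occurs only once in the sentence, because triples() does not expose token positions.

```python
# The triples printed above, copied verbatim as (governor, relation, dependent).
triples = [
    (('shot', 'VBD'), 'nsubj', ('I', 'PRP')),
    (('shot', 'VBD'), 'dobj', ('elephant', 'NN')),
    (('elephant', 'NN'), 'det', ('an', 'DT')),
    (('shot', 'VBD'), 'nmod', ('banana', 'NN')),
    (('banana', 'NN'), 'case', ('with', 'IN')),
    (('banana', 'NN'), 'det', ('a', 'DT')),
    (('shot', 'VBD'), 'punct', ('.', '.')),
]


def triples_to_tree(triples):
    # Hypothetical helper: builds a list of root nodes as nested dictionaries.
    # Assumption: (word, tag) pairs are unique; repeated tokens would be merged.
    nodes = {}
    dependents = set()
    for governor, rel, dependent in triples:
        gov_node = nodes.setdefault(governor, {'Name': governor[0], 'Tag': governor[1]})
        dep_node = nodes.setdefault(dependent, {'Name': dependent[0], 'Tag': dependent[1]})
        dep_node['Relationship'] = rel
        gov_node.setdefault('children', []).append(dep_node)
        dependents.add(dependent)
    # A root is any node that never appears as a dependent.
    roots = [node for key, node in nodes.items() if key not in dependents]
    for root in roots:
        root['Relationship'] = 'ROOT'
    return roots


forest = triples_to_tree(triples)
print(forest[0]['Name'])                             # shot
print([c['Name'] for c in forest[0]['children']])    # ['I', 'elephant', 'banana', '.']
```

The resulting forest has the same shape as the one in the accepted answer, so a path-finding function written for that structure should work on it unchanged.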

In CONLL format:

>>> print(parses[0].to_conll(style=10))
1   I   I   PRP PRP _   2   nsubj   _   _
2   shot    shoot   VBD VBD _   0   ROOT    _   _
3   an  a   DT  DT  _   4   det _   _
4   elephant    elephant    NN  NN  _   2   dobj    _   _
5   with    with    IN  IN  _   7   case    _   _
6   a   a   DT  DT  _   7   det _   _
7   banana  banana  NN  NN  _   2   nmod    _   _
8   .   .   .   .   _   2   punct   _   _
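
The CONLL rows also make the dependency path from the question easy to compute, because column 7 holds the index of each token's head and column 8 the relation. Below is a minimal sketch in plain Python that parses the text printed above and follows the head pointers up to index 0. The function names read_conll and path_to_root are hypothetical, and the column positions are assumed from the 10-column output shown here.

```python
# The 10-column output printed above, copied verbatim (whitespace separated).
conll = """
1   I   I   PRP PRP _   2   nsubj   _   _
2   shot    shoot   VBD VBD _   0   ROOT    _   _
3   an  a   DT  DT  _   4   det _   _
4   elephant    elephant    NN  NN  _   2   dobj    _   _
5   with    with    IN  IN  _   7   case    _   _
6   a   a   DT  DT  _   7   det _   _
7   banana  banana  NN  NN  _   2   nmod    _   _
8   .   .   .   .   _   2   punct   _   _
"""


def read_conll(text):
    # Hypothetical helper: maps token index -> (word, head index, relation).
    rows = {}
    for line in text.strip().splitlines():
        fields = line.split()
        rows[int(fields[0])] = (fields[1], int(fields[6]), fields[7])
    return rows


def path_to_root(rows, index):
    # Follow head pointers until the artificial root (index 0) is reached,
    # then reverse so the path reads from the root down to the token.
    relations = []
    while index != 0:
        word, head, rel = rows[index]
        relations.append(rel)
        index = head
    return '-'.join(reversed(relations))


rows = read_conll(conll)
print(path_to_root(rows, 3))  # ROOT-dobj-det   (the token 'an')
print(path_to_root(rows, 5))  # ROOT-nmod-case  (the token 'with')
```

Working with token indices rather than word forms avoids ambiguity when the same word appears more than once in a sentence.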
Votes: 2

Page content originally provided by Stack Overflow.
Original link:

https://stackoverflow.com/questions/52148690
