
Converting a nested parenthesized tree into a nested list

Stack Overflow user
Asked 2014-04-12 02:10:55
Answers: 2 | Views: 1.6K | Followers: 0 | Votes: 4

I have a tree-structure file in which parentheses are used to represent the tree. Below is the code that converts it into a Python nested list:

def foo(s):
    def foo_helper(level=0):
        try:
            token = next(tokens)
        except StopIteration:
            if level != 0:
                raise Exception('missing closing paren')
            else:
                return []
        if token == ')':
            if level == 0:
                raise Exception('missing opening paren')
            else:
                return []
        elif token == '(':
            return [foo_helper(level+1)] + foo_helper(level)
        else:
            return [token] + foo_helper(level)
    tokens = iter(s)
    return foo_helper()    

This is the code given in "How to parse a string and return a nested array".

It works fine when every token is a single character, but it gives the wrong result for words or sentences. My sample tree is:

( Satellite (span 69 74) (rel2par Elaboration)
        ( Nucleus (span 69 72) (rel2par span)
          ( Nucleus (span 69 70) (rel2par span)
            ( Nucleus (leaf 69) (rel2par span) (text _!MERRILL LYNCH READY ASSETS TRUST :_!) )
            ( Satellite (leaf 70) (rel2par Elaboration) (text _!8.65 % ._!) )
          )
          ( Satellite (span 71 72) (rel2par Elaboration)
            ( Nucleus (leaf 71) (rel2par span) (text _!Annualized average rate of return_!) )
            ( Satellite (leaf 72) (rel2par Temporal) (text _!after expenses for the past 30 days ;_!) )
          )
        )
        ( Satellite (span 73 74) (rel2par Elaboration)
          ( Nucleus (leaf 73) (rel2par span) (text _!not a forecast_!) )
          ( Satellite (leaf 74) (rel2par Elaboration) (text _!of future returns ._!) )
        )
      )

The output needs to be ['satellite',['span','69','74'].........], but with the given function I get ['s','a','t'...............['s','p','a','n','7','3']..............]

How can this be modified?
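
The character-by-character output described above follows from calling `iter(s)` on a string: iterating over a `str` yields one character at a time. The short sketch below contrasts that with a word-level tokenizer. The regular expression is an assumption chosen for illustration, not part of the original code.

```python
import re

s = "(span 69 74)"

# Iterating over a str yields single characters, spaces included.
char_tokens = list(iter(s))

# Assumed alternative: a token is either one parenthesis or a run of
# characters that are neither whitespace nor parentheses.
word_tokens = re.findall(r"[()]|[^\s()]+", s)

print(char_tokens[:6])  # ['(', 's', 'p', 'a', 'n', ' ']
print(word_tokens)      # ['(', 'span', '69', '74', ')']
```

Passing an iterator over `word_tokens` instead of the raw string would give the parser whole words to work with.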


2 Answers

Stack Overflow user

Accepted answer

Posted 2014-04-12 02:35:18

I assume you want `_!` to mark strings that contain spaces. I then used a regular expression to split the expression:

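import re

# Split on parentheses and on the `_!` marker. The capturing group makes
# re.split keep the delimiters themselves in the result.
resexp = re.compile(r'([()]|_!)')
…
  tokens = iter(resexp.split(s))
…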

My result (using pprint with depth=4) is:

$ python lispparse.py  | head
['\n',
 [' Satellite ',
  ['span 69 74'],
  ' ',
  ['rel2par Elaboration'],
  '\n        ',
  [' Nucleus ',
   ['span 69 72'],
   ' ',
   ['rel2par span'],
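
The whitespace-only strings in the output above come from how `re.split` behaves with a capturing group: it returns the delimiters and also whatever text lies between them, including empty strings and bare whitespace. A small sketch on a made-up input (the `sample` string is hypothetical, not taken from the tree file) shows the raw pieces and the effect of stripping and filtering them:

```python
import re

resexp = re.compile(r'([()]|_!)')  # same pattern as in the answer

sample = "(a (b c))"  # made-up input for illustration

# Raw split: delimiters are kept, and so is the text between them.
raw = resexp.split(sample)

# Strip each piece and drop the ones that end up empty.
cleaned = list(filter(None, (part.strip() for part in raw)))

print(raw)      # ['', '(', 'a ', '(', 'b c', ')', '', ')', '']
print(cleaned)  # ['(', 'a', '(', 'b c', ')', ')']
```

Note that `'b c'` stays a single token, which matches entries such as `['span 69 74']` in the output: this pattern never splits on whitespace inside a group.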

I improved it further with:

tokens = iter(filter(None, (i.strip() for i in resexp.split(s))))

and got:

$ python lispparse.py  
[['Satellite',
  ['span 69 74'],
  ['rel2par Elaboration'],
  ['Nucleus',
   ['span 69 72'],
   ['rel2par span'],
   ['Nucleus', [...], [...], [...], [...]],
   ['Satellite', [...], [...], [...], [...]]],
  ['Satellite',
   ['span 73 74'],
   ['rel2par Elaboration'],
   ['Nucleus', [...], [...], [...]],
   ['Satellite', [...], [...], [...]]]]]
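
For readers who want each word as its own list element, as the question asks, here is one possible end-to-end sketch. It is not the answer's code: `TOKEN_RE`, `tokenize`, and `parse` are hypothetical names, the sketch assumes that `_!..._!` encloses free text that should stay one string, and it replaces the recursion with an explicit stack so that long inputs cannot hit Python's recursion limit.

```python
import re

# Assumed token grammar: `_!...-!`-style quoted text (here `_!` on both
# sides), a single parenthesis, or a bare word.
TOKEN_RE = re.compile(r"_!(.*?)_!|([()])|([^\s()]+)", re.DOTALL)

def tokenize(s):
    """Yield (kind, value) pairs; kind is 'text', 'paren' or 'word'."""
    for match in TOKEN_RE.finditer(s):
        quoted, paren, word = match.groups()
        if quoted is not None:
            yield ("text", quoted)
        elif paren is not None:
            yield ("paren", paren)
        else:
            yield ("word", word)

def parse(s):
    """Return the nested-list form of a parenthesized tree string."""
    root = []
    stack = [root]  # stack[-1] is the list currently being filled
    for kind, value in tokenize(s):
        if kind == "paren" and value == "(":
            child = []
            stack[-1].append(child)
            stack.append(child)
        elif kind == "paren":
            if len(stack) == 1:
                raise ValueError("missing opening paren")
            stack.pop()
        else:
            stack[-1].append(value)
    if len(stack) != 1:
        raise ValueError("missing closing paren")
    return root

sample = """( Satellite (span 73 74) (rel2par Elaboration)
  ( Nucleus (leaf 73) (rel2par span) (text _!not a forecast_!) )
)"""
print(parse(sample))
```

Like the original function, `parse` wraps the result in an outer list, so a file holding a single tree yields a one-element list.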
Votes: 1

Stack Overflow user

Posted 2014-04-12 02:23:02

You are not meant to call this function on the string itself, but on a list of tokens (i.e. the result of calling split on the string).

def parse(s):
    def parse_helper(level=0):
        try:
            token = next(tokens)
        except StopIteration:
            if level:
                raise Exception('Missing close paren')
            else:
                return []
        if token == ')':
            if not level:
                raise Exception('Missing open paren')
            else:
                return []
        elif token == '(':
            return [parse_helper(level+1)] + parse_helper(level)
        else:
            return [token] + parse_helper(level)
    tokens = iter(s)
    return parse_helper()

if __name__ == '__main__':
    with open('tree.thing', 'r') as treefile:
        tree = treefile.read()

    print(parse(tree.split()))

Where treefile contains the data structure you posted, I get the following output:

[['Satellite', '(span', '69', '74)', '(rel2par', 'Elaboration)', ['Nucleus', '(span', '69', '72)', '(rel2par', 'span)', ['Nucleus', '(span', '69', '70)', '(rel2par', 'span)', ['Nucleus', '(leaf', '69)', '(rel2par', 'span)', '(text', '_!MERRILL', 'LYNCH', 'READY', 'ASSETS', 'TRUST', ':_!)'], ['Satellite', '(leaf', '70)', '(rel2par', 'Elaboration)', '(text', '_!8.65', '%', '._!)']], ['Satellite', '(span', '71', '72)', '(rel2par', 'Elaboration)', ['Nucleus', '(leaf', '71)', '(rel2par', 'span)', '(text', '_!Annualized', 'average', 'rate', 'of', 'return_!)'], ['Satellite', '(leaf', '72)', '(rel2par', 'Temporal)', '(text', '_!after', 'expenses', 'for', 'the', 'past', '30', 'days', ';_!)']]], ['Satellite', '(span', '73', '74)', '(rel2par', 'Elaboration)', ['Nucleus', '(leaf', '73)', '(rel2par', 'span)', '(text', '_!not', 'a', 'forecast_!)'], ['Satellite', '(leaf', '74)', '(rel2par', 'Elaboration)', '(text', '_!of', 'future', 'returns', '._!)']]]]
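
Entries such as `'(span'` and `'74)'` in the output above appear because `str.split()` only breaks on whitespace, so a parenthesis that touches a word stays glued to it. One common workaround, sketched here as an assumption rather than as part of the answer, is to pad every parenthesis with spaces before splitting (`tokenize` is a hypothetical helper name):

```python
def tokenize(s):
    # Surround each parenthesis with spaces so that str.split()
    # returns it as a token of its own.
    return s.replace("(", " ( ").replace(")", " ) ").split()

line = "( Satellite (span 69 74) (rel2par Elaboration)"
print(tokenize(line))
# ['(', 'Satellite', '(', 'span', '69', '74', ')', '(', 'rel2par', 'Elaboration', ')']
```

This also breaks the `_!..._!` text into separate words, so it only fits if the text does not need to stay in one piece.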
Votes: 1
Original page content provided by Stack Overflow.
Original link:

https://stackoverflow.com/questions/23025282
