
Stanford NLP Parser: how to split the tree?

Stack Overflow user
Asked 2014-06-24 09:08:06
1 answer · 902 views · 0 followers · score 1

If I take the example sentence from the parser's homepage:

```text
The strongest rain ever recorded in India shut down 
the financial hub of Mumbai, snapped communication 
lines, closed airports and forced thousands of people 
to sleep in their offices or walk home during the night, 
officials said today.
```

and parse it with the Stanford parser:

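```java
import edu.stanford.nlp.parser.lexparser.LexicalizedParser;
import edu.stanford.nlp.trees.Tree;
import edu.stanford.nlp.trees.TreePrint;

String text = "The strongest rain ever recorded in India shut down the financial hub of Mumbai, snapped communication lines, closed airports and forced thousands of people to sleep in their offices or walk home during the night, officials said today.";
LexicalizedParser lexicalizedParser = LexicalizedParser.loadModel("edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz");

Tree parse = lexicalizedParser.parse(text);
// Print the constituency tree ("penn") followed by the typed dependencies.
TreePrint treePrint = new TreePrint("penn, typedDependencies");
treePrint.printTree(parse);
```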

it delivers the following tree:

```text
(ROOT
  (S
    (S
      (NP
        (NP (DT The) (JJS strongest) (NN rain))
        (VP
          (ADVP (RB ever))
          (VBN recorded)
          (PP (IN in)
            (NP (NNP India)))))
      (VP
        (VP (VBD shut)
          (PRT (RP down))
          (NP
            (NP (DT the) (JJ financial) (NN hub))
            (PP (IN of)
              (NP (NNP Mumbai)))))
        (, ,)
        (VP (VBD snapped)
          (NP (NN communication) (NNS lines)))
        (, ,)
        (VP (VBD closed)
          (NP (NNS airports)))
        (CC and)
        (VP (VBD forced)
          (NP
            (NP (NNS thousands))
            (PP (IN of)
              (NP (NNS people))))
          (S
            (VP (TO to)
              (VP
                (VP (VB sleep)
                  (PP (IN in)
                    (NP (PRP$ their) (NNS offices))))
                (CC or)
                (VP (VB walk)
                  (NP (NN home))
                  (PP (IN during)
                    (NP (DT the) (NN night))))))))))
    (, ,)
    (NP (NNS officials))
    (VP (VBD said)
      (NP-TMP (NN today)))
    (. .)))
```

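Now I want to split the tree according to its structure to get the clauses. In this example, I would like to split the tree into the following parts: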

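  • The strongest rain ever recorded in India
  • The strongest rain shut down the financial hub of Mumbai
  • The strongest rain snapped communication lines
  • The strongest rain closed airports
  • The strongest rain forced thousands of people to sleep in their offices
  • The strongest rain forced thousands of people to walk home during the night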

How can I do this?
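A minimal sketch of one way to get these parts, assuming the splits come only from coordination: a node whose children include a `CC` plus two or more conjuncts carrying the parent's own label, like the `VP , VP , VP CC VP` node in the tree above. It uses a hypothetical `Node` class and a small bracket reader instead of CoreNLP's `Tree`, so it runs on the JDK alone, and its input (the hypothetical constant `RAIN_CLAUSE`) is the inner `S` copied from the printed tree. Each coordinated node is replaced by one conjunct at a time, and the alternatives are combined with the rest of the clause.

```java
import java.util.ArrayList;
import java.util.List;

public class ClauseSplitter {
    // Hypothetical minimal tree node standing in for CoreNLP's Tree.
    static final class Node {
        final String label;
        final List<Node> kids = new ArrayList<>();
        Node(String label) { this.label = label; }
    }

    // Inner S of the example parse (everything before ", officials said today").
    static final String RAIN_CLAUSE =
        "(S (NP (NP (DT The) (JJS strongest) (NN rain)) (VP (ADVP (RB ever)) (VBN recorded)"
        + " (PP (IN in) (NP (NNP India)))))"
        + " (VP (VP (VBD shut) (PRT (RP down)) (NP (NP (DT the) (JJ financial) (NN hub))"
        + " (PP (IN of) (NP (NNP Mumbai)))))"
        + " (, ,) (VP (VBD snapped) (NP (NN communication) (NNS lines)))"
        + " (, ,) (VP (VBD closed) (NP (NNS airports))) (CC and)"
        + " (VP (VBD forced) (NP (NP (NNS thousands)) (PP (IN of) (NP (NNS people))))"
        + " (S (VP (TO to) (VP (VP (VB sleep) (PP (IN in) (NP (PRP$ their) (NNS offices)))) (CC or)"
        + " (VP (VB walk) (NP (NN home)) (PP (IN during) (NP (DT the) (NN night))))))))))";

    // Reads Penn Treebank bracket notation; words become childless leaf nodes.
    static Node parse(String text) {
        String[] tokens = text.replace("(", " ( ").replace(")", " ) ").trim().split("\\s+");
        return read(tokens, new int[] {0});
    }

    private static Node read(String[] tok, int[] pos) {
        if (!tok[pos[0]].equals("(")) return new Node(tok[pos[0]++]); // a word
        pos[0]++;                                    // skip "("
        Node node = new Node(tok[pos[0]++]);         // constituent label, e.g. "VP"
        while (!tok[pos[0]].equals(")")) node.kids.add(read(tok, pos));
        pos[0]++;                                    // skip ")"
        return node;
    }

    // Coordination: a CC child plus at least two children with the parent's own label.
    static boolean isCoordination(Node n) {
        long conjuncts = n.kids.stream().filter(k -> k.label.equals(n.label)).count();
        boolean hasCC = n.kids.stream().anyMatch(k -> k.label.equals("CC"));
        return hasCC && conjuncts >= 2;
    }

    // All word sequences for a node, choosing one conjunct at every coordination.
    static List<List<String>> expand(Node n) {
        List<List<String>> out = new ArrayList<>();
        if (n.kids.isEmpty()) {                      // a word
            out.add(List.of(n.label));
            return out;
        }
        if (isCoordination(n)) {                     // each conjunct is its own alternative
            for (Node k : n.kids) if (k.label.equals(n.label)) out.addAll(expand(k));
            return out;
        }
        out.add(new ArrayList<>());                  // otherwise: cartesian product of children
        for (Node k : n.kids) {
            List<List<String>> alternatives = expand(k);
            List<List<String>> next = new ArrayList<>();
            for (List<String> prefix : out) {
                for (List<String> alt : alternatives) {
                    List<String> joined = new ArrayList<>(prefix);
                    joined.addAll(alt);
                    next.add(joined);
                }
            }
            out = next;
        }
        return out;
    }

    static List<String> split(Node clause) {
        List<String> parts = new ArrayList<>();
        for (List<String> words : expand(clause)) parts.add(String.join(" ", words));
        return parts;
    }

    public static void main(String[] args) {
        for (String part : split(parse(RAIN_CLAUSE))) System.out.println(part);
    }
}
```

Run on the example, this prints five parts, each starting with the full subject `The strongest rain ever recorded in India`; using only the subject's inner `NP` would give the shorter `The strongest rain` from the list. The rule deliberately ignores other clause types (relative clauses, the reporting clause `officials said today`), so it is a starting point rather than a general clause splitter.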

Update: the first answer suggests using a recursive algorithm to print all root-to-leaf paths.

Here is the code I tried:

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import edu.stanford.nlp.parser.lexparser.LexicalizedParser;
import edu.stanford.nlp.trees.Tree;

public class PathPrinter {
    public static void main(String[] args) throws IOException {
        LexicalizedParser lexicalizedParser = LexicalizedParser.loadModel("edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz");

        Tree tree = lexicalizedParser.parse("In a ceremony that was conspicuously short on pomp and circumstance at a time of austerity, Felipe, 46, took over from his father, King Juan Carlos, 76.");

        printAllRootToLeafPaths(tree, new ArrayList<String>());
    }

    private static void printAllRootToLeafPaths(Tree tree, List<String> path) {
        if (tree != null) {
            // Only leaves (the words) are ever added to the path...
            if (tree.isLeaf()) {
                path.add(tree.nodeString());
            }

            if (tree.children().length == 0) {
                // ...so every printed path holds exactly one word.
                System.out.println(path);
            } else {
                for (Tree child : tree.children()) {
                    printAllRootToLeafPaths(child, path);
                }
            }

            // Removes the first element equal to this node's string, if any.
            path.remove(tree.nodeString());
        }
    }
}
```

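Of course this code makes no sense: I only add leaves to the path, and since leaves have no children, nothing else is ever added around them. The problem is that all the actual words are leaves, so the algorithm just prints out one word per line, the leaf itself: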

```text
[The]
[strongest]
[rain]
[ever]
[recorded]
[in]
[India]
[shut]
[down]
[the]
[financial]
[hub]
[of]
[Mumbai]
[,]
[snapped]
[communication]
[lines]
[,]
[closed]
[airports]
[and]
[forced]
[thousands]
[of]
[people]
[to]
[sleep]
[in]
[their]
[offices]
[or]
[walk]
[home]
[during]
[the]
[night]
[,]
[officials]
[said]
[today]
[.]
```
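The output shows the issue: only leaves are ever added to `path`. A minimal sketch of the usual fix, using a hypothetical `Node` class in place of CoreNLP's `Tree` so it runs without the parser: add every node's label on the way down, record a copy of the path at each leaf, and on the way back up remove the last element by position rather than with `path.remove(tree.nodeString())`, which deletes the first equal string anywhere in the list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class RootToLeafPaths {
    // Hypothetical stand-in for CoreNLP's Tree: a label plus ordered children.
    static final class Node {
        final String label;
        final List<Node> kids;
        Node(String label, Node... kids) {
            this.label = label;
            this.kids = Arrays.asList(kids);
        }
    }

    // Every root-to-leaf path, each listing the labels from the root down to one word.
    static List<List<String>> paths(Node root) {
        List<List<String>> out = new ArrayList<>();
        collect(root, new ArrayList<>(), out);
        return out;
    }

    private static void collect(Node node, List<String> path, List<List<String>> out) {
        path.add(node.label);                 // add EVERY node on the way down, not only leaves
        if (node.kids.isEmpty()) {
            out.add(new ArrayList<>(path));   // copy, because path keeps changing
        } else {
            for (Node kid : node.kids) collect(kid, path, out);
        }
        path.remove(path.size() - 1);         // backtrack by position (remove(int)), not by value
    }

    public static void main(String[] args) {
        // Simplified fragment of the tree above:
        // (ROOT (S (NP (DT The) (JJS strongest) (NN rain)) (VP (VBD shut) (PRT (RP down)))))
        Node tree = new Node("ROOT",
            new Node("S",
                new Node("NP",
                    new Node("DT", new Node("The")),
                    new Node("JJS", new Node("strongest")),
                    new Node("NN", new Node("rain"))),
                new Node("VP",
                    new Node("VBD", new Node("shut")),
                    new Node("PRT", new Node("RP", new Node("down"))))));
        for (List<String> p : paths(tree)) System.out.println(p);
    }
}
```

For this fragment the first printed path is `[ROOT, S, NP, DT, The]`. With CoreNLP the same change would apply to `printAllRootToLeafPaths` (add `tree.nodeString()` for every node, not only leaves). Each path still ends in a single word, though, so root-to-leaf paths list a word's ancestors rather than the clauses asked for above.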

1 Answer

Stack Overflow user

Accepted answer

Posted 2014-06-24 09:25:57

Take a look at how to print all root-to-leaf paths in a binary tree, or how to split a binary tree:

  • http://math-puzzles-computing.blogspot.nl/2011/02/splitting-binary-search-tree-at-given.html
  • http://www.cs.cmu.edu/afs/cs/academic/class/15210-f11/www/lectures/18/lecture18.pdf
  • http://digital.cs.usu.edu/~allan/DS/Notes/Ch19.pdf
Score: 0
Original page content provided by Stack Overflow; translation support provided by Tencent Cloud Xiaowei's IT-domain engine.
Original link:

https://stackoverflow.com/questions/24382581
