Java 斯坦福NLP解析器。如何劈开这棵树?

Java 斯坦福NLP解析器。如何劈开这棵树?,java,stanford-nlp,Java,Stanford Nlp,如果我从下面的例子: 斯坦福解析器: LexicalizedParser lexicalizedParser = LexicalizedParser.loadModel("edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz"); Tree parse = lexicalizedParser.parse(text); TreePrint treePrint = new TreePrint("penn, typedDependencies");

如果我从下面的例子:

斯坦福解析器:

LexicalizedParser lexicalizedParser = LexicalizedParser.loadModel("edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz");

Tree parse = lexicalizedParser.parse(text);
TreePrint treePrint = new TreePrint("penn, typedDependencies");

treePrint.printTree(parse);
交付以下树:

(ROOT
(S
  (S
    (NP
      (NP (DT The) (JJS strongest) (NN rain))
      (VP
        (ADVP (RB ever))
        (VBN recorded)
        (PP (IN in)
          (NP (NNP India)))))
    (VP
      (VP (VBD shut)
        (PRT (RP down))
        (NP
          (NP (DT the) (JJ financial) (NN hub))
          (PP (IN of)
            (NP (NNP Mumbai)))))
      (, ,)
      (VP (VBD snapped)
        (NP (NN communication) (NNS lines)))
      (, ,)
      (VP (VBD closed)
        (NP (NNS airports)))
      (CC and)
      (VP (VBD forced)
        (NP
          (NP (NNS thousands))
          (PP (IN of)
            (NP (NNS people))))
        (S
          (VP (TO to)
            (VP
              (VP (VB sleep)
                (PP (IN in)
                  (NP (PRP$ their) (NNS offices))))
              (CC or)
              (VP (VB walk)
                (NP (NN home))
                (PP (IN during)
                  (NP (DT the) (NN night))))))))))
  (, ,)
  (NP (NNS officials))
  (VP (VBD said)
    (NP-TMP (NN today)))
  (. .)))
现在,我想将依赖于其结构的树拆分为子句。 因此,在本例中,我希望拆分树以获得以下部分:

  • 印度有史以来最强的降雨
  • 这场最强的雨关闭了孟买的金融中心
  • 最强的雨切断了通讯线路
  • 最强的雨使机场关闭
  • 大雨迫使数千人睡在办公室里
  • 强降雨迫使数千人在夜间步行回家
我该怎么做?
因此,第一个答案是使用递归算法打印所有 从根到叶的路径

以下是我尝试的代码:

public static void main(String[] args) throws IOException {
    LexicalizedParser lexicalizedParser = LexicalizedParser.loadModel("edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz");

    Tree tree = lexicalizedParser.parse("In a ceremony that was conspicuously short on pomp and circumstance at a time of austerity, Felipe, 46, took over from his father, King Juan Carlos, 76.");

    printAllRootToLeafPaths(tree, new ArrayList<String>());
}

private static void printAllRootToLeafPaths(Tree tree, List<String> path) {
    if(tree != null) {
        if(tree.isLeaf()) {
            path.add(tree.nodeString());
        }

        if(tree.children().length == 0) {
            System.out.println(path);
        } else {
            for(Tree child : tree.children()) {
                printAllRootToLeafPaths(child, path);
            }
        }

        path.remove(tree.nodeString());
    }
}
查看或拆分二叉树:


Stanford edu.Stanford.nlp.trees.Tree不是二叉树,但当然,您仍然可以使用这种递归算法打印包含从根到叶的所有可能组合的树。这里的问题是,节点不是单词。节点是标记,单词总是叶子。因此,你只需把所有节点作为标签,而路径(叶子)的最后一个节点就是一个单词。因此,如果你使用这个算法,你会得到:最强、最强的雨……你可以把名词短语和动词短语看作根节点。您还可以通过CC或令牌拆分VP。您正在使用
tree.children().length
检查是否对递归进行新的旋转。我建议检查NP/VP和CC。你能把你的示例代码上传到某个地方吗,这样我就可以启动env了?你是说这个吗?
public static void main(String[] args) throws IOException {
    LexicalizedParser lexicalizedParser = LexicalizedParser.loadModel("edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz");

    Tree tree = lexicalizedParser.parse("In a ceremony that was conspicuously short on pomp and circumstance at a time of austerity, Felipe, 46, took over from his father, King Juan Carlos, 76.");

    printAllRootToLeafPaths(tree, new ArrayList<String>());
}

private static void printAllRootToLeafPaths(Tree tree, List<String> path) {
    if(tree != null) {
        if(tree.isLeaf()) {
            path.add(tree.nodeString());
        }

        if(tree.children().length == 0) {
            System.out.println(path);
        } else {
            for(Tree child : tree.children()) {
                printAllRootToLeafPaths(child, path);
            }
        }

        path.remove(tree.nodeString());
    }
}
[The]
[strongest]
[rain]
[ever]
[recorded]
[in]
[India]
[shut]
[down]
[the]
[financial]
[hub]
[of]
[Mumbai]
[,]
[snapped]
[communication]
[lines]
[,]
[closed]
[airports]
[and]
[forced]
[thousands]
[of]
[people]
[to]
[sleep]
[in]
[their]
[offices]
[or]
[walk]
[home]
[during]
[the]
[night]
[,]
[officials]
[said]
[today]
[.]