Java: Correctly iterating over a Stanford NLP tree


java stanford-nlp


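My goal is to find out whether a given word is a preposition or a subordinating conjunction. The main problem with the Stanford parser is that it uses a single IN tag for both of these word classes. To tell them apart, I implemented the following steps:

I am trying to iterate over the NLP tree generated by the Stanford parser.

Image first:

Here is what I am trying to do: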

if IN is found
{
    parentValue = parent of IN

    if parentValue is SBAR
    {        
      get leaf or child of IN ... (ie word itself)
      mark it as subordinating conjunction
    }


    if parentValue is PP
    {        
      get leaf or child of IN ... (ie word itself)
      mark it as preposition
    }

}
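
The steps above can be sketched without the parser by passing the parent down the recursion as an argument. This is only an illustration under stated assumptions: `Node` is a hypothetical stand-in for `edu.stanford.nlp.trees.Tree`, and the sample tree is built by hand, not produced by the parser.

```java
import java.util.ArrayList;
import java.util.List;

public class InTagClassifier {

    // Hypothetical minimal tree node; it stands in for the parser's Tree class.
    static final class Node {
        final String value;
        final List<Node> children = new ArrayList<>();

        Node(String value, Node... kids) {
            this.value = value;
            for (Node k : kids) {
                children.add(k);
            }
        }
    }

    // Walks the tree, carrying the parent as an argument instead of a shared field.
    static void classify(Node node, Node parent, List<String> out) {
        boolean isInTag = node.value.equals("IN") && !node.children.isEmpty();
        if (isInTag && parent != null) {
            // The first child of the IN tag is taken to be the word itself.
            String word = node.children.get(0).value;
            if (parent.value.equals("SBAR")) {
                out.add(word + ": subordinating conjunction");
            } else if (parent.value.equals("PP")) {
                out.add(word + ": preposition");
            }
            return;
        }
        for (Node child : node.children) {
            classify(child, node, out);
        }
    }

    // Hand-built sample (an assumption, not parser output):
    // (ROOT (SBAR (IN as) (IN if) (S (NN town))) (PP (IN in) (NP (NN town))))
    static Node sample() {
        return new Node("ROOT",
            new Node("SBAR",
                new Node("IN", new Node("as")),
                new Node("IN", new Node("if")),
                new Node("S", new Node("NN", new Node("town")))),
            new Node("PP",
                new Node("IN", new Node("in")),
                new Node("NP", new Node("NN", new Node("town")))));
    }

    public static void main(String[] args) {
        List<String> out = new ArrayList<>();
        classify(sample(), null, out);
        out.forEach(System.out::println);
    }
}
```

With the real `Tree` class, the same lookups would presumably be `t.parent(root)` for the parent and `t.firstChild().value()` for the word, but the sketch itself does not depend on them.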

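Why I check for IN first

Basically, as I understand it, if a sentence has a preposition or a subordinating conjunction, it falls under PP or SBAR respectively. The problem is that a PP or SBAR may not have an IN as a child; the child could be another sentence, an NP, or anything else. That is why I start by looking for IN. (Suggestions and corrections are welcome.)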

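Also, I am assuming here that, in any sentence I come across in the future, there will be no surprises below IN. Please correct me if I am wrong.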
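
One way to test that assumption rather than rely on it is to flag every IN node that is not a plain tag sitting directly over a single word. The sketch below is a guess at what such a check could look like; `Node` is again a hypothetical stand-in for the parser's `Tree` class, and the sample tree in `main` is invented for the demonstration.

```java
import java.util.ArrayList;
import java.util.List;

public class InShapeCheck {

    // Hypothetical minimal tree node; it stands in for the parser's Tree class.
    static final class Node {
        final String value;
        final List<Node> children = new ArrayList<>();

        Node(String value, Node... kids) {
            this.value = value;
            for (Node k : kids) {
                children.add(k);
            }
        }
    }

    // True when the node has exactly one child and that child is a leaf (a word).
    static boolean isPreTerminal(Node n) {
        return n.children.size() == 1 && n.children.get(0).children.isEmpty();
    }

    // Returns every non-leaf IN node whose shape is not "tag over a single word".
    static List<Node> unexpectedIn(Node root) {
        List<Node> bad = new ArrayList<>();
        collect(root, bad);
        return bad;
    }

    private static void collect(Node node, List<Node> bad) {
        boolean isTag = !node.children.isEmpty(); // leaves are words, not tags
        if (isTag && node.value.equals("IN") && !isPreTerminal(node)) {
            bad.add(node);
        }
        for (Node child : node.children) {
            collect(child, bad);
        }
    }

    public static void main(String[] args) {
        // Invented tree: one well-formed IN and one IN with a phrase below it.
        Node root = new Node("SBAR",
            new Node("IN", new Node("if")),
            new Node("IN", new Node("NP", new Node("town"))));
        System.out.println("unexpected IN nodes: " + unexpectedIn(root).size());
    }
}
```

The real `Tree` class has an `isPreTerminal()` method that could serve the same purpose as the helper here.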
I have written the following code:

package com.test.olabs.main;

import java.util.List;

import com.olabs.nlp.OlabsTokenizer;

import edu.stanford.nlp.ling.CoreLabel;
import edu.stanford.nlp.parser.lexparser.LexicalizedParser;
import edu.stanford.nlp.tagger.maxent.MaxentTagger;
import edu.stanford.nlp.trees.Tree;

public class MyTester {

    public static void main(String[] args) {
        MyTester t = new MyTester();
        t.test();
    }

    String sentence = "It seemed as if whole town was mourning his death.";

    private static final String ENG_BI_MODEL = "edu/stanford/nlp/models/pos-tagger/english-bidirectional/english-bidirectional-distsim.tagger";
    private static final String PCG_MODEL = "edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz";

    private static final MaxentTagger mxt = new MaxentTagger(ENG_BI_MODEL);
    private static final LexicalizedParser parser = LexicalizedParser
            .loadModel(PCG_MODEL);

    Tree parentNode = null;

    private void findPro(Tree t) {
        System.out.println("findpro tree value " + t.label().value());

        if (t.label().value().equals("IN")) {
            System.out.println("-----------in IN");

            if (parentNode.value().equals("PP")) {
                System.out.println("found prep " + t.label().value());
            }

            if (parentNode.value().equals("SBAR")) {
                System.out.println("----------in sbar " + t.label().value());
            }
        } else {
            for (Tree child : t.children()) {
                // parent is t and childVar is child, we need
                // to store parent ... so we stored it
                parentNode = t;
                findPro(child);
            }
        }
    }

    public Tree parse(String s) {
        List<CoreLabel> tokens = OlabsTokenizer.tokenizeString(s);
        mxt.tagCoreLabels(tokens);
        Tree tree = parser.apply(tokens);
        return tree;
    }

    void test() {
        MyTester test = new MyTester();
        Tree t = test.parse(sentence);
        findPro(t);
    }
}
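
Basically, the loop enters the IN branch twice, and PP is never printed. I think it should enter only once and give the output as "as if". Is this a bug in the Stanford parser or in my code?

How can I do all of this properly? I need help.

FYI, I have also tried the first part of this, but it was not much help.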

findpro tree value ROOT
findpro tree value S
findpro tree value NP
findpro tree value PRP
findpro tree value It
findpro tree value VP
findpro tree value VBD
findpro tree value seemed
findpro tree value SBAR
findpro tree value IN
-----------in IN
----------in sbar IN
findpro tree value IN
-----------in IN
----------in sbar IN
findpro tree value S
findpro tree value NP
findpro tree value JJ
findpro tree value whole
findpro tree value NN
findpro tree value town
findpro tree value VP
findpro tree value VBD
findpro tree value was
findpro tree value VP
findpro tree value VBG
findpro tree value mourning
findpro tree value NP
findpro tree value PRP$
findpro tree value his
findpro tree value NN
findpro tree value death
findpro tree value .
findpro tree value .