Java 7: Stanford CoreNLP returns the wrong result

Tags: java-7, stanford-nlp, eclipse-3.4, lemmatization


I am trying to lemmatize the word below with Stanford CoreNLP. My environment is:

Java 1.7
Eclipse 3.4.0
Stanford CoreNLP version 3.4.1

My code snippet is:

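    import java.util.List;
    import java.util.Properties;

    import edu.stanford.nlp.ling.CoreAnnotations.LemmaAnnotation;
    import edu.stanford.nlp.ling.CoreAnnotations.SentencesAnnotation;
    import edu.stanford.nlp.ling.CoreAnnotations.TextAnnotation;
    import edu.stanford.nlp.ling.CoreAnnotations.TokensAnnotation;
    import edu.stanford.nlp.ling.CoreLabel;
    import edu.stanford.nlp.pipeline.Annotation;
    import edu.stanford.nlp.pipeline.StanfordCoreNLP;
    import edu.stanford.nlp.util.CoreMap;

    //...........lemmatization starts........................
    Properties props = new Properties();
    props.put("annotators", "tokenize, ssplit, pos, lemma");
    StanfordCoreNLP pipeline = new StanfordCoreNLP(props, false);
    String text = "painting";
    Annotation document = pipeline.process(text);

    List<CoreMap> sentences = document.get(SentencesAnnotation.class);

    for (CoreMap sentence : sentences) {
        for (CoreLabel token : sentence.get(TokensAnnotation.class)) {
            String word = token.get(TextAnnotation.class);   // surface form of the token
            String lemma = token.get(LemmaAnnotation.class); // lemma chosen for the predicted POS tag
            System.out.println("lemmatized version :" + lemma);
        }
    }
    //...........lemmatization ends.........................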

The lemma that comes back is not the one I expect. I am expecting:

lemmatized version :paint

Please advise.

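The problem in this example is that the word painting can be either the present participle of to paint or a noun, and the lemmatizer's output depends on the part-of-speech tag assigned to the original word.

If you run the tagger on the fragment painting alone, there is no context to help the tagger (or a human) decide how the word should be tagged. In this case it chose the tag NN, and the lemma of the noun painting is in fact painting.

If you run the same code on the sentence "I am painting a flower", the tagger should correctly tag painting as VBG, and the lemmatizer should return paint.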

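That's fine. But what if I have a single word like "painting" and I need to get "paint" from it? What other API or tool should I use? I can't send whole sentences to the API.

If the tag depends on context, no tool can infer the correct POS tag from a single word. However, if you know in advance that the word will be a verb, you can tag it manually and then run the lemmatizer.

Sebastian, please help me tag the word manually. I can't find any code for this on the net.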
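One way to apply the manual tagging suggested above is to skip the tagger and pass the tag yourself to the Morphology class in edu.stanford.nlp.process, which is the component CoreNLP uses to compute lemmas. The sketch below is only an illustration and has not been checked against version 3.4.1 specifically. The class name ManualTagLemma and the helper lemmaFor are made up for this example, and it assumes you already know which Penn Treebank tag (here VBG or NN) applies to your word.

```java
import edu.stanford.nlp.process.Morphology;

public class ManualTagLemma {

    // Lemmatize a single word using a POS tag supplied by the caller
    // instead of one predicted by the tagger.
    static String lemmaFor(String word, String tag) {
        Morphology morphology = new Morphology();
        return morphology.lemma(word, tag);
    }

    public static void main(String[] args) {
        // Tagged by hand as a verb (present participle).
        System.out.println("VBG: " + lemmaFor("painting", "VBG"));
        // Tagged as a noun, which is what the tagger chose for the bare word.
        System.out.println("NN:  " + lemmaFor("painting", "NN"));
    }
}
```

With the VBG tag the lemma should come out as paint, and with NN it should stay painting, which matches the behaviour described in the answer. This only helps when the tag really is known in advance. For words whose tag is ambiguous you still need sentence context.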
