Java: How to use Stanford NLP part-of-speech tagging for Spanish?

java nlp stanford-nlp

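I'm working with Stanford CoreNLP and have a question. I want to get the grammatical category (part of speech) of each word. When I process the text from the command line with: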

java -cp "*" -Xmx2g edu.stanford.nlp.pipeline.StanfordCoreNLP -props StanfordCoreNLP-spanish.properties  -annotators tokenize,ssplit,pos, ner  -file entrada.txt -outputFormat conll

the output looks like this:

1       tomar   _       VERB    _       _       _
2       una     _       DET     _       _       _
3       cerveza _       NOUN    _       _       _
4       en      _       ADP     _       _       _
5       Madrid  _       PROPN   _       _       _

But when I run it from NetBeans with the following code:

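import java.util.List;
import java.util.Properties;

import edu.stanford.nlp.ling.CoreAnnotations.LemmaAnnotation;
import edu.stanford.nlp.ling.CoreAnnotations.NamedEntityTagAnnotation;
import edu.stanford.nlp.ling.CoreAnnotations.PartOfSpeechAnnotation;
import edu.stanford.nlp.ling.CoreAnnotations.SentencesAnnotation;
import edu.stanford.nlp.ling.CoreAnnotations.TextAnnotation;
import edu.stanford.nlp.ling.CoreAnnotations.TokensAnnotation;
import edu.stanford.nlp.ling.CoreLabel;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;
import edu.stanford.nlp.util.CoreMap;

// Class name is illustrative; the original snippet had no enclosing class.
public class SpanishPosDemo {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("annotators", "tokenize, ssplit, pos, lemma, ner");
        props.setProperty("tokenize.language", "es");
        // Spanish POS model used here (the command line above used the settings from StanfordCoreNLP-spanish.properties)
        props.setProperty("pos.model", "edu/stanford/nlp/models/pos-tagger/spanish/spanish-distsim.tagger");
        props.setProperty("ner.model", "edu/stanford/nlp/models/ner/spanish.ancora.distsim.s512.crf.ser.gz");
        props.setProperty("ner.applyNumericClassifiers", "true");
        props.setProperty("ner.useSUTime", "false");
        props.setProperty("ner.applyFineGrained", "false");
        props.setProperty("ner.language", "es");

        String text = "Ver una película de miedo, pasear por un parque";
        StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
        Annotation document = new Annotation(text);
        pipeline.annotate(document);

        // Print word, POS tag, named-entity tag and lemma for every token
        List<CoreMap> sentences = document.get(SentencesAnnotation.class);
        for (CoreMap sentence : sentences) {
            for (CoreLabel token : sentence.get(TokensAnnotation.class)) {
                String word = token.get(TextAnnotation.class);
                String pos = token.get(PartOfSpeechAnnotation.class);
                String ne = token.get(NamedEntityTagAnnotation.class);
                String lema = token.get(LemmaAnnotation.class);
                System.out.println(String.format("[%s] [%s] [%s] [%s]", word, pos, ne, lema));
            }
        }
    }
}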
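The command-line run reads its settings from StanfordCoreNLP-spanish.properties, while the NetBeans code above sets pos.model by hand. Since the command-line output contains UD tags such as VERB and PROPN, that properties file apparently selects a different POS model. A minimal sketch for checking this, assuming the CoreNLP Spanish models jar (which should contain StanfordCoreNLP-spanish.properties) is on the classpath: load the file with plain java.util.Properties and print its pos.model. The class ClasspathProps and its load method are hypothetical names.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.Properties;

public class ClasspathProps {
    // Hypothetical helper: loads a .properties resource from the classpath.
    // Returns an empty Properties object if the resource cannot be found.
    static Properties load(String resourceName) {
        Properties props = new Properties();
        try (InputStream in = ClasspathProps.class.getClassLoader().getResourceAsStream(resourceName)) {
            if (in != null) {
                props.load(in);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    public static void main(String[] args) {
        Properties props = load("StanfordCoreNLP-spanish.properties");
        if (props.isEmpty()) {
            System.out.println("StanfordCoreNLP-spanish.properties not found on the classpath");
            return;
        }
        // Shows which POS model the command-line run was using.
        System.out.println("pos.model = " + props.getProperty("pos.model"));
        // These props could then be adjusted and passed to new StanfordCoreNLP(props).
    }
}
```

Reusing the same file keeps the Java pipeline and the command-line pipeline on identical models instead of maintaining two copies of the settings.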

From NetBeans, however, I get tags like "vmn0000" instead of "VERB". So how can I convert a tag like "vmn0000" into "VERB"?

Thanks in advance.
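If switching models is not an option, one rough workaround is to map the tags after tagging. A minimal sketch, assuming the tags follow the AnCora/EAGLES convention where the first letter is the category (v = verb, n = noun, d = determiner, s = adposition, ...) and the second letter refines it (np = proper noun, va = auxiliary verb, cs = subordinating conjunction). The class AncoraToUpos, its toUpos method, and all example tags other than "vmn0000" are illustrative, and the mapping is approximate rather than an official conversion table.

```java
import java.util.Locale;

public class AncoraToUpos {
    // Approximate mapping from AnCora-style tags (e.g. "vmn0000") to UD UPOS tags.
    // Only the first two characters of the tag are inspected.
    static String toUpos(String tag) {
        if (tag == null || tag.isEmpty()) {
            return "X";
        }
        String t = tag.toLowerCase(Locale.ROOT);
        char category = t.charAt(0);
        char type = t.length() > 1 ? t.charAt(1) : '0';
        switch (category) {
            case 'a': return "ADJ";
            case 'c': return type == 's' ? "SCONJ" : "CCONJ";
            case 'd': return "DET";
            case 'f': return "PUNCT";
            case 'i': return "INTJ";
            case 'n': return type == 'p' ? "PROPN" : "NOUN";
            case 'p': return "PRON";
            case 'r': return "ADV";
            case 's': return "ADP";
            case 'v': return type == 'a' ? "AUX" : "VERB";
            case 'z': return "NUM";
            default:  return "X"; // unknown or unmapped category
        }
    }

    public static void main(String[] args) {
        String[] tags = {"vmn0000", "da0000", "nc0s000", "sp000", "np00000"};
        for (String tag : tags) {
            System.out.println(tag + " -> " + toUpos(tag));
        }
    }
}
```

This prints VERB, DET, NOUN, ADP and PROPN for the five sample tags. A table like this cannot reproduce every UD decision (for example, UD Spanish treats copular "ser" as AUX, which the first two letters do not reveal), so using a UD-trained tagger is the cleaner fix.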

Make sure you use Stanford CoreNLP 3.9.2 and the UD model for part-of-speech tagging:

edu/stanford/nlp/models/pos-tagger/spanish/spanish-ud.tagger
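Applied to the question's code, this amounts to changing the pos.model property. A minimal sketch that only builds the Properties object, assuming CoreNLP 3.9.2 and its Spanish models jar are on the classpath when the properties are later passed to new StanfordCoreNLP(props). The class SpanishUdProps is a hypothetical name, and the annotator list is trimmed to tokenize,ssplit,pos for brevity; the NER settings from the question can be added back.

```java
import java.util.Properties;

public class SpanishUdProps {
    // Builds pipeline properties that point the POS annotator at the UD Spanish model.
    static Properties spanishUdProperties() {
        Properties props = new Properties();
        props.setProperty("annotators", "tokenize,ssplit,pos");
        props.setProperty("tokenize.language", "es");
        // UD model from the answer above; it emits tags such as VERB, NOUN, PROPN.
        props.setProperty("pos.model",
                "edu/stanford/nlp/models/pos-tagger/spanish/spanish-ud.tagger");
        return props;
    }

    public static void main(String[] args) {
        Properties props = spanishUdProperties();
        // With CoreNLP on the classpath: StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
        System.out.println("pos.model = " + props.getProperty("pos.model"));
    }
}
```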

I don't know Spanish and don't speak it, but I think removing the lemma and NER properties might solve the problem.